<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
    </journal-meta>
    <article-meta>
      <article-id pub-id-type="doi">10.1080/10447310701360995</article-id>
      <title-group>
        <article-title>Improving Spare Part Search for Maintenance Services using Topic Modelling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Anastasiia Grishina</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milosh Stolikj</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Qi Gao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Milan Petkovic</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Eindhoven University of Technology</institution>
          ,
          <addr-line>Den Dolech 2, 5612 AZ, Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Philips Research</institution>
          ,
          <addr-line>High Tech Campus 34, 5656 AE, Eindhoven</addr-line>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2020</year>
      </pub-date>
      <volume>513</volume>
      <fpage>19</fpage>
      <lpage>20</lpage>
      <abstract>
        <p>To support the decision-making process in various industrial applications, many companies use knowledge management and Information Retrieval (IR). In an industrial setting, knowledge is extracted from data that is often stored in a semi-structured or unstructured format. As a result, Natural Language Processing (NLP) methods have been applied to a number of IR steps. In this work, we explore how NLP, and particularly topic modelling, can be used to improve the relevance of spare part retrieval in the context of maintenance services. The proposed methodology extracts topics from short maintenance service reports that also include part replacement data. The intuition behind the proposed methodology is that every topic should represent a specific root cause. Experiments were conducted for an ad-hoc retrieval system of service case descriptions and spare parts. The results have shown that our modification improves on the baseline system, thus boosting the performance of maintenance service solution recommendation.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity retrieval</kwd>
        <kwd>spare part search</kwd>
        <kwd>decision support</kwd>
        <kwd>maintenance services</kwd>
        <kwd>natural language processing</kwd>
        <kwd>topic modelling</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Information retrieval systems are gaining importance
in various industrial applications. We can observe the
emergence of knowledge-based systems that support
the decision-making process in construction,
aviation, equipment maintenance and other areas [1, 2].
In these settings, knowledge is frequently extracted
from data that is captured in legacy systems using
natural language and stored in a semi-structured or
unstructured format. As a result, linguistic and
statistical NLP methods have been applied to a
number of IR steps, such as document and query
modelling, query expansion and search result
clustering based on semantic similarities [3, 4, 5, 6].</p>
      <p>In this work, we explore how NLP and particularly
topic modelling can be used to improve spare part
retrieval that serves the purpose of medical
equipment maintenance. In particular, we focus on
remote system diagnostics that takes place when the
equipment malfunctions, i.e. stops working according
to its specification. The problem may be resolved in
several ways, one of which is the replacement of one
or more (malfunctioning) parts. We conducted our
research in the context of an ad-hoc entity retrieval
system which helps engineers to search for relevant
historical service reports and identify the most
probable service solution. Therefore, target retrieval
entities are equipment components, i.e. parts to be
replaced. In practice, one case may require multiple
parts to be replaced.</p>
      <p>To address the challenge of spare part retrieval, we
create an NLP pipeline that pre-processes short
textual descriptions of maintenance activities and
apply topic modelling to categorize the descriptions
of past cases. From relevant maintenance service
reports, the proposed methodology extracts topics
each of which may indicate a specific root cause.
Once categorized, cases and parts would be easier to
examine and more relevant to a particular type of
failure. An engineer can address topics sequentially
and choose among parts related to the same topic.
Therefore, we exploit term co-occurrences and their
semantic correspondences using topic modelling to
enhance the relevance of target entity retrieval.
Although the use case assumes that a number of
parts will be ultimately suggested based on past
maintenance records, the problem statement does not
fall under the vastly explored area of recommender
systems that involves user preference modelling.</p>
      <p>To evaluate the difference introduced by the
proposed component, we use IR metrics that are
customized to characterize the relevance and
completeness of a set of retrieved entities. They
measure how far in the list of search results all the
required parts are present, indicate if at least one
required entity is retrieved, and show whether all needed
parts are present among the top K search results.
The main contributions of the work are as follows:</p>
      <p>• we approach the challenge of spare part
retrieval in remote system diagnostics and
maintenance of industrial equipment using
topic modelling to group retrieved historical
cases and parts under topics that should
represent failure root causes;</p>
      <p>• we enhance the performance of an industrial
entity retrieval system by learning semantic
correspondences between short historical
descriptions of events associated with the
entities;</p>
      <p>• we evaluate the proposed method on a real
world dataset using customized information
retrieval metrics.</p>
      <p>The remainder of this paper is organized as
follows. We present the problem formulation and a
baseline part retrieval system in Section 2. The
methodology of combining the text mining pipeline
and the entity retrieval process is described in
Section 3. Section 4 is dedicated to the dataset
description and the implementation of the methods.
We discuss experimental results in Section 5 and
related work in Section 6. The paper is concluded by
Section 7, where we also mention possible directions
for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Description</title>
      <p>In the scope of this work, entity descriptions are
composed of equipment characteristics and are
represented by maintenance case reports registered
in the retrieval system. Entities to be retrieved are the
parts recommended for replacement to troubleshoot
a machine referred to in a new malfunction report.
Queries may contain various characteristics of a new
maintenance case that should be treated by a
maintenance service team. An entity, i.e. a spare part,
is identified with a unique ID and is related to a case
description.</p>
      <p>One historical maintenance case can
have several parts associated with it; similarly, a new
service case may require a set of different parts.</p>
      <p>The knowledge base of maintenance cases is
updated with the help of service engineers. They
submit maintenance reports for every equipment
failure or customer complaint as short technical texts,
often in multiple languages (English and a locally
spoken language). Hence, the reports might contain
abbreviations. Each historical report includes a
number of logs, such as the time of customer complaint
registration, a textual description of maintenance
activities, the IDs of parts used to solve the issue, and
software logs sent by a machine, as well as natural
language descriptions of the machine state at every
step of the maintenance process. Closed cases are
uploaded to the collection of historical cases that can
be mined using the above mentioned ER system.</p>
      <p>To present the setting in a formal way, let q be a
query performed by a service engineer while
working on a case. We will use the term query case to
indicate such cases. Each query is associated with a
single maintenance case. The list of parts replaced in
a case c is denoted by P(c). We use C(q) to denote the list
of cases retrieved for the query q. The set of parts
replaced in all retrieved cases is denoted by
P(q) = ∪_{c ∈ C(q)} P(c), and the list of top K ranked
parts recommended for replacement is expressed by
P_K(q) ⊆ P(q).</p>
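      <p>As a small illustration of this notation, the union of parts over the retrieved cases can be computed as follows; this is a minimal sketch with hypothetical case and part IDs, not data or code from the actual system:</p>
      <preformat>
```python
# P(c): parts replaced in case c; C(q): cases retrieved for query q;
# P(q): union of P(c) over all c in C(q).

def parts_for_query(retrieved_cases, parts_by_case):
    """Compute P(q) as the union of P(c) over c in C(q)."""
    parts = set()
    for case in retrieved_cases:
        parts |= set(parts_by_case.get(case, []))
    return parts

# Hypothetical example: three retrieved cases and their replaced parts.
parts_by_case = {"c1": ["p10", "p11"], "c2": ["p11"], "c3": ["p12"]}
C_q = ["c1", "c2", "c3"]
P_q = parts_for_query(C_q, parts_by_case)  # {"p10", "p11", "p12"}
```
      </preformat>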
    </sec>
    <sec id="sec-4">
      <title>3. Methodology</title>
      <p>The method proposed in this work combines a baseline
entity retrieval setting and an add-on topic modelling
component, as described below.</p>
      <sec id="sec-4-1">
        <title>3.1. Baseline Entity Retrieval System</title>
        <p>The baseline entity search system in question is
empowered with a two-step retrieval mechanism. A
database of entity descriptions lies at the foundation
of the mechanism. The mechanism consists of entity
description retrieval followed by the final entity
retrieval and ranking, as explained in detail below.</p>
        <sec id="sec-4-1-1">
          <title>3.1.1. Retrieval of Entity Descriptions</title>
          <p>At the first step of the entity search, the system
retrieves relevant descriptions using a Vector Space
Model (VSM) with the Okapi BM25 similarity score [7, 8].
VSM is a document and query representation model
that converts texts to N-dimensional vectors of term
weights, where N is the number of words in a
dictionary. Terms are simply the words or groups of
words present in the collection of documents. The
dictionary is built from a text corpus and includes
distinct terms. The intuition behind VSM is that
retrieved documents will be ranked according to a
similarity function computed for a query and a
document, i.e. vectors in a vector space.</p>
          <p>In the context of our problem description, for a
query q containing keywords {k_i}, i = 1..n, and a
maintenance case description c with fields {f_j},
j = 1..m, the Okapi BM25 similarity score can be
expressed as follows:</p>
          <p>BM25(q, c) = Σ_{i=1..n} Σ_{j=1..m} IDF(k_i) · tf(k_i, f_j) · (k1 + 1) / (tf(k_i, f_j) + k1 · (1 − b + b · |f_j| / avgfl_j)). (1)</p>
          <p>Here, tf(k_i, f_j) is the frequency of the keyword k_i in
a field f_j of the case description c, |f_j| is the length of
the field f_j in terms of words, and avgfl_j is the average
length of the field f_j in descriptions of all cases in the
collection. Variables k1 and b are tuning
parameters that control how much every new
occurrence of a term impacts the score and the
document length scaling, correspondingly. Inverse
Document Frequency is calculated as:</p>
          <p>IDF(k_i) = log((N − n(k_i) + 0.5) / (n(k_i) + 0.5)), (2)</p>
          <p>where N is the total number of cases in the collection
and n(k_i) is the number of case descriptions that
contain the query term k_i. Therefore, the case
c1 ∈ C(q) is ranked higher than the case c2 ∈ C(q) if
BM25(q, c1) &gt; BM25(q, c2).</p>
        </sec>
      </sec>
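      <p>For illustration, Equations (1) and (2) can be sketched in a few lines of Python. The function and parameter names below are ours, and the sketch omits any field boosting a production search engine would apply:</p>
      <preformat>
```python
import math

def idf(total_cases, cases_with_term):
    # Eq. (2): inverse document frequency of a query term
    return math.log((total_cases - cases_with_term + 0.5)
                    / (cases_with_term + 0.5))

def bm25_field(tf, field_len, avg_field_len, idf_term, k1=1.2, b=0.75):
    # Eq. (1): contribution of one keyword in one field
    norm = tf + k1 * (1 - b + b * field_len / avg_field_len)
    return idf_term * tf * (k1 + 1) / norm

def bm25(query_terms, case_fields, stats, k1=1.2, b=0.75):
    # Sum over all query keywords and all fields of the case description.
    # stats: {"N": total cases, "df": term document frequencies,
    #         "avg_len": average field lengths} -- illustrative layout.
    score = 0.0
    for term in query_terms:
        idf_term = idf(stats["N"], stats["df"].get(term, 0))
        for name, tokens in case_fields.items():
            tf = tokens.count(term)
            if tf > 0:
                score += bm25_field(tf, len(tokens),
                                    stats["avg_len"][name], idf_term, k1, b)
    return score
```
      </preformat>
      <p>With k1 = 1.2 and b = 0.75 this matches the default tuning parameters reported in Section 4.3.</p>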
      <sec id="sec-4-2">
        <title>3.1.2. Entity Retrieval and Ranking</title>
        <p>The second step realizes the entity retrieval. It ranks
spare parts associated with the retrieved cases based
on the frequency of their occurrence and the rank of
the cases where they occur. Thus, the most frequent
parts that occur in top ranked cases appear higher on
the final list of retrieved parts than a part that
appears the same number of times lower on the case
list. Several proprietary filters are applied as well, but
they do not affect the methodology. The algorithm
for part recommendation is presented in Algorithm 1.</p>
        <p>Algorithm 1 (Part Recommendation). Input: a query q
associated with a maintenance case c and the number of
parts to recommend, K. Output: a list of recommended
parts P_K(q). The algorithm initializes O ← {}, the counter
of occurrences of part combinations, P(q) ← {}, the
retrieved parts, and P_K(q) ← {}, the recommended parts.
For each case c ∈ C(q), it obtains the part IDs P(c); if P(c)
is already in O, it increments O(P(c)) by one, otherwise it
sets O(P(c)) ← 1. It then sorts O by O(P(c)) in descending
order and, for each P(c) ∈ O, appends P(c) to P(q) and
drops duplicates from P(q). Finally, P_K(q) ← top-K(P(q)).</p>
      </sec>
      <sec id="sec-4-3">
        <title>3.2. Topic Modelling Component</title>
        <p>Transformation of the historical case and part retrieval
pipeline is performed by adding a component
that groups retrieved cases under a number of topics
and ranks the parts within the topics. Figure 1 shows
the baseline architecture (a) and the modification that
includes the proposed topic modelling component (b).</p>
        <p>The topic modelling component can be considered
as an individual NLP pipeline with a number of
steps. The pipeline includes tokenization, lemmatization,
removal of stop phrases, building a dictionary of
tokens, term weighting and topic modelling.
Tokenization of the text refers to splitting it into units, or
tokens, that represent individual words or sometimes
groups of words [9]. The process of lemmatization
involves finding the initial forms of the inflected
words, also referred to as root forms or lemmas. A
lemma is a word in its canonical form that exists in
the dictionary of the used language. For example, the
lemma for do, doing, did is the word do. Next, term
weighting refers to assigning weights to tokens. We
utilize term frequency, or bag-of-words weights, as the
term weighting scheme. It associates a term with a
weight proportional to the frequency of the term
occurrence in the corpus of documents.</p>
      </sec>
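      <p>As we read it, the part recommendation of Algorithm 1 (Section 3.1.2) can be sketched as follows. Function and variable names are ours, and the tie-breaking behaviour and the proprietary filters mentioned in the text are omitted:</p>
      <preformat>
```python
# Sketch of Algorithm 1: count part combinations over retrieved cases,
# then emit parts from the most frequent combinations first.

def recommend_parts(retrieved_cases, parts_by_case, top_k):
    # O: occurrences of each part combination over the retrieved cases
    occurrences = {}
    for case in retrieved_cases:
        combo = tuple(sorted(parts_by_case.get(case, [])))
        occurrences[combo] = occurrences.get(combo, 0) + 1
    # Visit combinations from most to least frequent, collecting parts
    ranked_parts = []
    for combo, _count in sorted(occurrences.items(),
                                key=lambda kv: kv[1], reverse=True):
        for part in combo:
            if part not in ranked_parts:  # drop duplicates, keep order
                ranked_parts.append(part)
    return ranked_parts[:top_k]           # P_K(q)
```
      </preformat>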
      <p>For topic modelling, we use Latent Dirichlet Allocation (LDA),
one of the most popular algorithms for automatically
extracting topics. LDA is based on a generative
probabilistic language model [10]. The purpose of
LDA is to learn the representation of a fixed number
of topics and derive the topic distribution for every
document in a collection. Every maintenance service
case is assigned a topic according to the maximum
probability of the case belonging to a topic.</p>
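      <p>The pipeline steps above can be sketched as follows. The lemma table, stop list and topic distribution here are illustrative stand-ins, not the Gensim and spaCy components actually used in the paper:</p>
      <preformat>
```python
import re
from collections import Counter

# Toy stand-ins for real lemmatization and stop-word resources
LEMMAS = {"doing": "do", "did": "do", "replaced": "replace"}
STOP_WORDS = {"the", "a", "is", "was"}

def preprocess(text):
    tokens = re.findall(r"[a-z0-9]+", text.lower())    # tokenization
    lemmas = [LEMMAS.get(t, t) for t in tokens]        # lemmatization
    return [t for t in lemmas if t not in STOP_WORDS]  # stop removal

def bag_of_words(doc_tokens):
    # Term-frequency (bag-of-words) weights for one document
    return Counter(doc_tokens)

def assign_topic(topic_distribution):
    # A case is assigned the topic with the maximum probability
    return max(range(len(topic_distribution)),
               key=lambda t: topic_distribution[t])

bow = bag_of_words(preprocess("The tube was replaced"))
topic = assign_topic([0.1, 0.7, 0.2])  # hypothetical LDA output
```
      </preformat>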
    </sec>
    <sec id="sec-5">
      <title>4. Evaluation</title>
      <p>In this section, we describe the real world dataset that
is extracted from the baseline part retrieval system.
We also discuss the metrics used to evaluate the
performance of the baseline system and compare it to
the configuration with the integrated topic modelling
component.</p>
      <sec id="sec-5-1">
        <title>4.1. Dataset Description</title>
        <p>For our experiments, we use a proprietary dataset
composed of historical maintenance cases. Textual
fields of case descriptions have been aggregated into
one field per maintenance case and serve as input to
LDA during training and testing stages. The majority
of cases are written in mixed languages. Figure 2
presents the distribution of the number of queries
over their characteristics: the number of retrieved
service cases, retrieved ranked parts to replace, and
parts replaced in the query case. The majority of
queries retrieved up to 200 similar case descriptions;
however, this number could reach 1000 cases. The
number of unique recommended parts retrieved from
these cases was below 350 in general, while the
majority of queries retrieved 0-10 parts. The number
of parts required to treat a maintenance case
associated with the query was equal to 5 or less in
most of the query cases.</p>
        <p>For building the LDA model, we use a subset of
historical cases written in English. The training set
contains data from 101,026 different maintenance
cases. For the test set, we use a sample of 1,564
queries performed by service engineers, together
with the corresponding cases returned as search
results: (q, C(q)). Cases returned for the queries may
have a non-empty intersection with the training
dataset; however, the cases for which the queries had
been created were excluded from the training set.</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Evaluation metrics</title>
        <p>Top K ranked parts are used to estimate the
AllParts@K, AnyPart@K and PartRatio@K metrics. The
operator | · | applied to a set defines the count of set
elements. The metrics are calculated for the queries
and the parts retrieved in response to them.
Metric@K is computed for a set of retrieved parts
as follows:</p>
        <p>AllParts@K(q) = 1 if P(c) ⊆ P_K(q), and 0 if P(c) ⊈ P_K(q);</p>
        <p>AnyPart@K(q) = 1 if |P(c) ∩ P_K(q)| &gt; 0, and 0 if |P(c) ∩ P_K(q)| = 0;</p>
        <p>PartRatio@K(q) = |P(c) ∩ P_K(q)| / |P(c)|.</p>
        <p>AllParts@K measures whether all the used parts
were suggested for a troubleshooting report,
AnyPart@K shows if any consumed part was listed
among the retrieved parts, and PartRatio@K indicates
the ratio of retrieved parts that were consumed to the
total number of consumed parts. An additional metric,
min_all_parts@K, is used to estimate how far in the
list of retrieved parts one could find the full list of
consumed parts in the query case: it returns the
minimum K such that AllParts@K(q) = 1, and null if
such K does not exist.</p>
        <p>As a baseline, we use the initial part retrieval
strategy and its statistics for the whole set of
retrieved and ranked parts P(q). Once topics
are computed, the metrics are estimated for the parts
associated with the cases in every topic t, i.e. the subset
P_K(q)(t) = {P(c) | c ∈ C(q) and c ∈ t} instead of P_K(q).</p>
        <p>We discard query cases that did not include
information on whether some parts were consumed or
not (i.e. missing data). If a case did not require any
part replacement, we utilize an artificial part called
“No parts” and assign an ID to it. In this way, for
query cases that were solved without part replacement
it is possible to evaluate the performance of part
retrieval. The top ranked part in this situation should
be “No parts”.</p>
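        <p>Under our reading of these definitions, the metrics can be sketched as follows; the names and signatures are ours:</p>
        <preformat>
```python
# consumed: P(c), the parts actually used in the query case;
# retrieved: the ranked list of recommended parts.

def all_parts_at_k(consumed, retrieved, k):
    # 1 if every consumed part appears among the top K retrieved parts
    return 1 if set(consumed).issubset(retrieved[:k]) else 0

def any_part_at_k(consumed, retrieved, k):
    # 1 if at least one consumed part appears among the top K
    return 1 if set(consumed).intersection(retrieved[:k]) else 0

def part_ratio_at_k(consumed, retrieved, k):
    # Share of consumed parts found among the top K retrieved parts
    hits = set(consumed).intersection(retrieved[:k])
    return len(hits) / len(set(consumed))

def min_all_parts(consumed, retrieved):
    # Smallest K with AllParts@K = 1; None if the full list never appears
    for k in range(1, len(retrieved) + 1):
        if all_parts_at_k(consumed, retrieved, k):
            return k
    return None
```
        </preformat>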
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Implementation</title>
        <p>The first step of the initial ER system is powered by
Elasticsearch [11]. It performs indexing of the
documents in the knowledge base and retrieves them
according to Okapi BM25 ranking with default tuning
parameters k1 = 1.2 and b = 0.75.</p>
        <p>For the add-on topic modelling component, we
utilize Python NLP libraries: Gensim [12] for all the
steps including topic modelling, and spaCy [13] for
lemmatization. One step that is also customized to
the maintenance application is the removal of stop
phrases. We use a collection of English stop words
pre-defined by Gensim and corpus-specific common
phrases, such as questionnaire forms repeated across
the majority of cases, since question formulations do
not characterize individual cases.</p>
        <p>One characteristic of the LDA model is that it provides
different topic distributions depending on the random
seed used in its initialization. Therefore, every LDA
model with the same set of parameters, except for the
random seed, should be computed several times; these
computations will be referred to as runs further in the
text. Afterwards, all the metrics should be averaged
over several runs to get consistent results and minimize
the influence of the algorithm’s stochastic behavior.</p>
        <p>Another control parameter is the number of topics.
The algorithm is set up to learn a symmetric α, a
document-topic prior, from data, as well as η, a
topic-word prior. The number of iterations is fixed at
100. All the metrics presented in the paper are
evaluated at top K retrieved parts with K = 5, 10.</p>
        <p>In addition, we set an empirical parameter for the
ratio of English words appearing in the case
description, equal to 30%. A topic will be derived by
LDA trained on the entirely English corpus in case
the description contains at least this ratio of English
words; otherwise the maintenance case will be marked
as “topic undefined”.</p>
      </sec>
      <sec id="sec-5-3-2">
        <title>5. Results and Discussion</title>
        <p>In this section, we compare the results of the initial
ER architecture evaluation to the results of the
modified architecture with the topic modelling
component, as well as to the best possible results for
the dataset of maintenance cases. We group queries
by levels of generalization, which stands for the
number of matched cases and retrieved parts in our
setting.</p>
        <p>Moreover, since the number of topics is a
hyper-parameter that is not learned via training, we
discuss the estimation of a possible number of topics
using NLP coherence metrics and compare it with
observations of the retrieval system’s performance.</p>
      </sec>
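      <p>The averaging of metric values over runs described in Section 4.3 can be sketched as follows. The normal-approximation confidence interval is an assumption on our side, since the paper does not specify the exact procedure, and the run values are hypothetical:</p>
      <preformat>
```python
import statistics

def mean_with_ci(values, z=1.96):
    # Mean of a metric over runs with a 95%-level confidence interval
    # of the mean under a normal approximation (z = 1.96).
    mean = statistics.mean(values)
    half_width = z * statistics.stdev(values) / len(values) ** 0.5
    return mean, (mean - half_width, mean + half_width)

# Hypothetical metric values from 5 runs with different random seeds
runs = [0.71, 0.73, 0.72, 0.70, 0.74]
mean, ci = mean_with_ci(runs)
```
      </preformat>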
      <sec id="sec-5-4">
        <title>5.1. Retrieval Performance at Top K Parts</title>
        <p>The performance of maintenance case and part
retrieval in the initial configuration of the part
retrieval system (Baseline) and in the configuration with
the LDA topic modelling component (LDA) is evaluated
using the above described metrics at different values
of K. These results are also compared to the best possible
results on the test dataset computed at K = ∞. We
report a 95%-level confidence interval of the mean
values of 5 runs with different random seeds for LDA
initialization in Figure 3. In addition, we show the
ratio of test queries for which the metrics improved
with the topic modelling component in comparison
to the baseline implementation in Figure 4.</p>
        <p>Comparing baseline results at different numbers of top
retrieved parts, it can be seen that the values of
AllParts@K, AnyPart@K and PartRatio@K increase
with higher K and achieve the maximum at K = ∞.</p>
        <p>The value of min_all_parts@∞ is not the target
value for this metric, since it is higher than the values
of min_all_parts@K for any K ≠ ∞, while the goal is to
minimize it. Since we target the lowest
min_all_parts@K possible, this metric is improved
when the average value decreases.</p>
        <p>Overall improvement is observed for the
experimental configuration with the topic modelling
component. For metrics evaluated at K = 10, the
improvement reached 54.5%, 52.6% and 51.8% of the
maximum possible improvement for AllParts,
AnyPart and PartRatio, respectively. It indicates that the
introduced component effectively captures similar cases and,
therefore, parts too. The performance improvement
influenced by topic modelling is more prominent at
smaller values of K, as can be seen from the difference
between the average baseline values of the metrics
and those of LDA in Figure 3.</p>
        <p>Figure 3: Comparison of different metrics computed for
LDA and baseline results in a part retrieval task. A confidence
interval of 95% is shown as a box around LDA values.</p>
        <p>There is an increase in the ratio of improved
queries for AllParts, AnyPart and PartRatio
calculated at smaller K, as depicted in Figure 4: for
example, from less than 4% of queries at K = 10
to around 5.45% at K = 5. Turning now to the
ratio of queries with improved min_all_parts@K, it is
higher for larger K, since the set of top ranked parts
increases with greater K, and likewise the probability of
finding all of the necessary parts among the top K parts.</p>
        <p>Yet, it is the metric with the most prominent progress
according to the ratio of queries that were improved
using topic modelling: 10.49% to 11.20% for the LDA
configuration.</p>
        <p>While for some queries the metrics were improved
by the introduction of the LDA component, 0.007% to
0.5% of queries experienced deterioration of the
metrics AllParts, AnyPart and PartRatio at different K, and
0.8% to 3.2% of queries for min_all_parts@K. This
happens, for example, when a number of documents
with the right part suggestions do not appear in the
same group. A possible solution (as well as a future
work direction) is to integrate domain knowledge
into the system and pre-define the number of topics
and their characteristic terms to always appear in the
same topic.</p>
        <p>Figure 4: Ratio of queries for which the performance
metrics improved with the topic modelling component. A
confidence interval of 95% is shown as a box around LDA
values.</p>
        <p>5.1.1. Performance Evaluation for Queries
Grouped Based on the Number of
Retrieved Cases and Parts</p>
        <p>The queries are grouped by the number of parts used
in the query case as well as by the number of
retrieved service cases, as demonstrated in
Figure A in the Appendix. Similarly to Figure 3, the
results are reported with a 95%-level confidence
interval of the average over the runs. We
distinguish the queries made for service cases that
did not require any part replacement and mark them
as |P(c)| = 0. The groups of queries that benefited the
most from the topic modelling integration are the
following:</p>
        <p>1. queries with the number of retrieved cases |C(q)| &gt; 100,</p>
        <p>2. queries associated with cases that required 1 ≤ |P(c)| ≤ 10 parts,</p>
        <p>3. queries with the number of retrieved and ranked parts 10 &lt; |P(q)| ≤ 100.</p>
        <p>Therefore, the topic modelling has a positive effect on
the queries that result in extensive lists of cases and,
thus, of parts appearing in those cases. Comparing this
result to the distribution of queries in our
experimental setting (Figure 2), the positive effect
concerns the largest groups of queries.</p>
        <p>Industries have been adopting process planning and
knowledge-based systems for machine manufacturing
and maintenance in recent years [1, 2, 19]. The
literature review on spare part demand forecasting
[20] has found that a large part of research
work has been dedicated to the analysis of historical
demand using installed base information and reports.</p>
        <p>The work on technical support that utilizes a
historical case base is particularly relevant to our
research [21, 22, 23]. The goal of [21] is to
aid telecom technical support teams with a fast and
accurate search over the solution base for previously
registered cases and solutions from other technical
texts. A method of populating an existing ontology
has been proposed using text segmentation and
scoring to serve the use case of Telecom Hardware
remote user assistance. The authors in [24] propose a
two-step method for spare part demand forecasting
that predicts the number of repairs and the number
of parts needed for a repair. Our work also involves
processing of a historical case base, but is not focused
on spare part demand forecasting for general
planning. It rather considers individual maintenance
cases and addresses a lower level of granularity.</p>
        <p>Processing of Technical Documents. Studies
apply NLP as a tool for extracting knowledge from
natural texts in industrial log mining [25, 22, 26].</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Related Work</title>
      <p>Areas related to our research span across entity
retrieval and knowledge management in industrial
applications, which correspond to the scope of our work,
while the use of topic modelling in IR is related to the
methodology used in this paper.</p>
      <p>6.1. Entity Retrieval Overview</p>
      <p>Entity retrieval (ER) is defined in [16] as "the task of
answering queries with a ranked list of entities." The
area of entity retrieval is closely connected to IR.</p>
      <p>Related studies also mine technical documentation [27]
and classify system failures and preventive maintenance
[28, 23]. The study [22] applies an NLP approach to
maintenance data concerning a part of the Swedish
railway system and identifies frequent failure cases
on the railways. Text mining and NLP techniques are
applied in [23] to analyze and classify
construction site accidents using the data from the
Occupational Safety and Health Administration. In
this setting, an ensemble method was used to obtain a
Tfidf matrix and a sequential quadratic programming
method to assign weights to 5 classifiers.</p>
      <p>The work [29] focuses on building Machine
Learning (ML) models to estimate the future duration of
maintenance activities by identifying problem,
solution and item features via text mining for
pre-processing, followed by neural networks and
decision trees for prediction. NLP is used in [30] to mine
electronic documents composed of free-form text to
extract terms of interest and the hierarchy of their
contexts, and to form a set of normalized terms, including
multi-word terms, for further data analysis.</p>
      <p>Therefore, problems addressed in the maintenance
services application domain are diverse in nature.
However, to the best of our knowledge, the current
paper is the first attempt to use entity retrieval
techniques for spare part management.</p>
      <p>6.3. Use of NLP and Topic Modelling in
IR Systems</p>
      <p>The effectiveness of IR systems could be improved by
topic modelling that mines term associations in a
collection of documents. Topic modelling could be
integrated into IR tasks to smooth the document model
with a document term prior estimated using term
distributions over topics [31]. The work [32] explores
the possibilities of modelling term associations as a
way of integrating related terms into document
models and proposes a model of probabilistic term
association using the joint probability of terms. A
combination of term indexing and topic modelling
approaches is proposed in [33]. In the proposed
model, every query term in a document is weighted
using the LDA algorithm and IR indexing methods.
The best experimental results were obtained with the
LDA-BM25 version. In this paper, however, the
similarity is computed using a vector space model
and the retrieval results are combined using topic
relations mined from a historical case base.
Therefore, topic modelling is used as a clustering or
grouping method on top of an ER system.</p>
      <p>In a number of research works, a combination of
topic modelling and IR is applied to short texts [34].
For instance, the paper [35] describes a method that
first pools similar tweets using an IR approach,
merges relevant short texts into a larger document and
trains an LDA model on the concatenated documents, thus
obtaining richer topics. By contrast, our method
addresses a domain-specific collection of short texts
written in so-called telegraph style with spelling
mistakes and domain-related abbreviations.</p>
      <p>Search Results Clustering. To date, several
studies have investigated document and language models
based on topics and clusters. The work [36] explored
cluster-based retrieval of documents, a mechanism
that returns a relevant cluster of documents, and
proposed two language models for ranking the
clusters of documents and smoothing the documents
using clusters. By contrast, some works cluster
search results using traditional ML, graph-based and
rank-based clustering techniques [6, 37]. For
instance, the Lingo algorithm [38] focuses on learning
phrases to represent clusters in a human-readable
way; it then discovers topics using Tfidf weighting,
performs term-document matrix reduction with
SVD and matches the extracted phrases with topics.
In comparison to these approaches, our work aims at
retrieving entities rather than documents, and the
user can explore all the retrieved parts within all the
clusters instead of only one cluster.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Conclusion</title>
      <p>In this work, we explored a way of improving a spare
part retrieval system for remote diagnostics and
maintenance of medical equipment by applying topic
modelling to search results. The topic modelling
component was used to cluster the results of a
baseline retrieval system and improve the relevance
of the search results. We aimed to support the
decision-making process of maintenance service
teams that searched in a historical collection of
troubleshooting reports and retrieved parts needed
for a new similar issue.</p>
      <p>The experimental dataset was constructed from
query-result pairs pointing at the historical case base
and parts used in the cases. We adjusted several IR
metrics to evaluate the results of spare part retrieval
in the baseline architecture and in its modification
with the topic modelling component. The major
enhancement was observed for the metric that
estimates the minimum number of top ranked parts
that are sufficient for the full treatment of the service
case associated with a performed query.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>The authors would like to acknowledge the gracious
support of this work by the local authorities
under grant agreement “ITEA-2018-17030-Daytime”.</p>
      <p>A natural progression of this work is to apply
online topic learning and automatically recommend the
topic that performs best for a given query. An input
from domain experts would help fix the number of
topics and characteristic terms that should appear
under one topic. Furthermore, additional domain
knowledge could be combined with the entity
retrieval system under consideration to suggest actions
beyond part replacement, such as troubleshooting
tests for remote and on-site diagnostics.</p>
    </sec>
  </body>
  <back>
    <ref-list />
  </back>
</article>