Information Retrieval Boosted by Category for Troubleshooting Search System

Bin Tong, Toshihiko Yanase, Hiroaki Ozaki, Makoto Iwayama
Research & Development Group, Hitachi, Ltd.
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
{bin.tong.hh, toshihiko.yanase.gm}@hitachi.com
{hiroaki.ozaki.yu, makoto.iwayama.nw}@hitachi.com

ABSTRACT
A troubleshooting search system aims at extracting relevant information to solve the problem at hand. It is often the case that the documents in a troubleshooting system include an abundant amount of domain-specific categories. However, the useful information about the domain-specific categories, such as the relationship between words and categories and the relationship between categories, is not fully utilized in simple query search and faceted search. In this paper, we propose an information retrieval method boosted by the domain-specific categories. Given a problem query and categories, the troubleshooting search system is able to retrieve the relevant information of interest with respect to the selected categories. The experimental results show that our proposal improves the recall.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models

Keywords
Troubleshooting, Category, Co-occurrence Graph

1. INTRODUCTION
Troubleshooting [13] is a form of problem solving, and is often applied to repair malfunctioning facilities or equipment. The maintenance log [10, 5, 2] is one of the important documents for troubleshooting; it is generated during conversations between customers and engineers in equipment maintenance.
The maintenance log often includes entries for problem titles and documents that describe the problem in detail and give instructions for solving it. To ease the management of the huge number of maintenance logs, domain-specific categories, such as machine code, trouble code, and countermeasure code, are used to tag both the problem titles and the documents. The goal of information retrieval for troubleshooting [11] based on the maintenance logs is to help engineers examine the similar situations of a problem within a certain period of time, which facilitates finding an appropriate solution. The troubleshooting search system requires the engineers to input a short problem query to search for the relevant information in the documents. This may cause the lexical gap problem [8, 9], because it is difficult for the engineers to compose a succinct and precise problem query that represents their information needs.

Moreover, information about the domain-specific categories is not fully utilized in the problem query search. One way to use the category information is faceted search, in which the selected categories are used to filter the ranking results. For example, if a machine code is selected, all search results are restricted to that machine code. However, faceted search has two potential problems in the troubleshooting search system. First, information that is related to the selected categories but not tagged with them cannot be retrieved, since the retrieved information is limited to the selected categories. It is natural, however, that system engineers tend to check relevant problems to facilitate their decision making. For example, given a selected machine code, the information about another machine code might be informative for solutions if the two machine codes belong to the same machine series and have similar problems. Second, the ranking of search results depends only on the problem query and not on the selected categories. For example, a trouble code corresponds to a number of specific countermeasure codes. Given a selected trouble code, the information about its frequent countermeasures is expected to be placed higher in the ranked list of results.

To mitigate the lexical gap problem and the above-mentioned retrieval problems, we propose an information retrieval method using a scoring technique. Our proposal extends the word co-occurrence graph of the QSB method [9], which aims at solving the lexical gap problem. In our proposal, besides using word co-occurrence to score words in documents, each word's score is also weighted by a boosting term based on the domain-specific categories. More specifically, the boosting term considers the relationship between categories and words and the relationship between categories. These are utilized to alleviate the above two retrieval problems with respect to the categories.

2. RELATED WORK
Information retrieval for troubleshooting is related to question answering [15] and query-biased document summarization [14, 1, 3]. Most of the work on question answering has focused on factoid question answering [6, 12, 7]. In the troubleshooting system, however, the answer to a question is a set of relevant sentences or phrases. As for query-biased document summarization, there seems to be no work that leverages other auxiliary information, such as categories. In addition, it is worth mentioning the work [12] on non-factoid question answering. In this work, Surdeanu et al. proposed a framework for answer ranking that exploits various linguistic features generated by popular techniques, such as syntactic parsing, Named Entity Recognition (NER), and Semantic Role Labeling (SRL). However, besides regular sentences, a large number of typos and short phrases exist in maintenance logs. In such a case, those techniques might not perform well due to the irregularities in the texts and the lack of training data in the troubleshooting domain.

Query Snowball (QSB) [9] is a method for multi-document summarization that extracts the relevant sentences from multiple documents with respect to a given query. The basic idea of this method is to fill the lexical gap between the query and the relevant sentences by enriching the representation of the information need. To achieve this, a co-occurrence graph for the words in the queries and the documents is built. The words in the co-occurrence graph consist of three layers, which are Q words, R1 words, and R2 words. Q is the set of query terms. R1 is the set of words that co-occur with a query term in the same sentence. R2 is the set of words that co-occur with a word from R1, excluding those that are already in R1.
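To make the three layers concrete, the following is a minimal Python sketch of how Q, R1, and R2 could be derived from tokenized sentences. The function and variable names are illustrative assumptions, not taken from the QSB implementation in [9].

    def build_word_layers(query_terms, sentences):
        # query_terms: the set of query words (Q).
        # sentences: an iterable of token lists, one per document sentence.
        q_words = set(query_terms)
        r1_words = set()
        # R1: words that co-occur with a query term in the same sentence.
        for sentence in sentences:
            words = set(sentence)
            if words & q_words:
                r1_words |= words - q_words
        r2_words = set()
        # R2: words that co-occur with an R1 word, excluding Q and R1 words.
        for sentence in sentences:
            words = set(sentence)
            if words & r1_words:
                r2_words |= words - q_words - r1_words
        return q_words, r1_words, r2_words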
3. QUERY SNOWBALL WITH CATEGORY INFORMATION
To extract relevant information with respect to the selected categories, we extend the co-occurrence graph in QSB by integrating two types of relations: the relationship between words and categories and the relationship between categories. The reason for extending QSB is that the co-occurrence graph is flexible enough to integrate the two relations. The relationship between words and categories represents the distribution over words with respect to categories. If the probabilities of words with respect to a given category are high, the information about these words is more likely to be retrieved given that category. The fundamental idea is that the distributions of co-occurrence probabilities of words with respect to different categories might differ. It is assumed that words with higher probabilities with respect to a category are treated as more important in that category. For example, if a word appears more often in documents of one category than in those of other categories, the word is more important for that category than for the others. Similarly, the relationship between categories represents the co-occurrences of categories. The information about categories whose occurrence frequencies with respect to a specific category are high is more likely to be retrieved. For example, a given category may appear more often with certain categories than with others; the information about those categories with respect to the given category is then treated as more important.

3.1 Co-occurrence Graph Extension
In the first step, we build two co-occurrence graphs for the words and category values in the problem queries and the documents, respectively. If a category value is associated with a document, the words in the document have edges with the category value. In other words, this category value is associated with all the words in the sentences of the document. Similarly, if a category value is associated with a problem query, the words in the problem query have edges with the category value. In other words, this category value is associated with all the words in the problem query.

In the second step, we build the co-occurrence graph for the category values in both the problem queries and the documents. Suppose that a problem query corresponds to a document. In this graph, the category values associated with the problem query have edges with the category values associated with the document.

[Figure 1: The co-occurrence graph among the words and the categories]

As illustrated in Figure 1, the left side represents the co-occurrence graph for the R1 and R2 words and the category values associated with the documents; the middle part represents the co-occurrence graph for the Q words and the category values associated with the queries; the right side represents the co-occurrence graph for the category values in the queries and the category values in the documents. We define C_M as the set of category values in the query set that are selected by the end-user, and C_N as the set of category values in the query set that are not selected by the end-user. Similarly, we define C_J as the set of category values in the document set that are selected by the end-user, and C_K as the set of category values in the document set that are not selected by the end-user. We also define C_{MN} = C_M \cup C_N as the set of category values for the queries and C_{JK} = C_J \cup C_K as the set of category values for the documents.

3.2 Score Boosted by Category Information
In order to integrate the two new relationships about categories into the QSB method, we introduce a score with respect to a query, C_M, and C_J, which is boosted by the category information. The new score, which we call cqsb, can be formulated as follows:

    cqsb(w) = qsb(w) \cdot \exp(\lambda \cdot s_{ctg}(w))    (1)

where qsb(w) is the score for a word w in the QSB method, and s_{ctg}(w) is the boosting term for the word w, which includes the two new relationships about categories. More specifically, the probabilities between words and categories and the probabilities between categories, which are calculated through the co-occurrence graph, are used to boost the score qsb(w). \lambda is a weight for the term s_{ctg}(w). It can be seen from Eq. (1) that when s_{ctg}(w) is larger than 0, \exp(\cdot) is larger than 1; when multiplied with qsb(w), it gives the word w a higher degree of importance. If the value of \lambda is set to 0, s_{ctg}(w) has no effect. Note that s_{ctg}(w) is always larger than or equal to zero. The score of a sentence is the summation of the scores of all combinations of two words, where the score of a word pair is simply calculated by multiplying the cqsb scores of the two words.
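As a concrete illustration of Eq. (1) and the sentence score described above, the following Python sketch computes cqsb and a pairwise sentence score. The callables qsb_score and s_ctg stand in for the QSB word score and the boosting term, whose computation for R1 and R2 words is detailed in Sections 3.3 and 3.4; their names are our own assumptions.

    import math
    from itertools import combinations

    def cqsb(word, qsb_score, s_ctg, lam=1.0):
        # Eq. (1): cqsb(w) = qsb(w) * exp(lambda * s_ctg(w)), s_ctg(w) >= 0.
        return qsb_score(word) * math.exp(lam * s_ctg(word))

    def sentence_score(sentence_words, qsb_score, s_ctg, lam=1.0):
        # The sentence score sums, over all pairs of distinct words in the
        # sentence, the product of the two words' cqsb scores.
        scores = {w: cqsb(w, qsb_score, s_ctg, lam) for w in set(sentence_words)}
        return sum(scores[w1] * scores[w2] for w1, w2 in combinations(scores, 2))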
3.3 Score for R1 Words
The relevance score of a word r_1 (r_1 \in R_1) with respect to the category values can be formulated as follows:

    s_{ctg}(r_1) = s_{wc}(r_1, Q_{C_M}^{r_1}) + s_{cc}(r_1, Q_{C_M}^{r_1})    (2)

where s_{wc}(r_1, Q_{C_M}^{r_1}) measures the relationship between the words and the categories, and s_{cc}(r_1, Q_{C_M}^{r_1}) measures the relationship between the categories. Q_{C_M}^{r_1} is the set of top-k query terms that co-occur most frequently with the word r_1. In addition, the words in Q_{C_M}^{r_1} follow the constraint that they should have edges with both the word r_1 and the category values in C_M.

The term s_{wc}(r_1, Q_{C_M}^{r_1}) in Eq. (2) can be calculated as:

    s_{wc}(r_1, Q_{C_M}^{r_1}) = \sum_{q \in Q_{C_M}^{r_1}} \sum_{i=1}^{|C_{MN}^q|} \theta \cdot \frac{freq(c_i, q)}{freq(c_i)}    (3)

where C_{MN}^q is the set of category values for the queries that also have edges with q. Let \theta be \beta if c_i \in C_M and c_i \in C_{MN}^q, and 1 - \beta if c_i \notin C_M. freq(c_i, q) is the frequency of sentences that include both c_i and q, which can also be interpreted as the distribution over c_i with respect to q. It can be seen that Eq. (3) measures the degree of closeness between the word r_1 and the category values in C_M through the words in Q_{C_M}^{r_1}, since the word r_1 does not directly have edges with the category values in C_M in the co-occurrence graph.

The term s_{cc}(r_1, Q_{C_M}^{r_1}) in Eq. (2) can be calculated as:

    s_{cc}(r_1, Q_{C_M}^{r_1}) = \sum_{q \in Q_{C_M}^{r_1}} \sum_{\theta \in \Gamma} \sum_{c_i, c_j} \theta \cdot \frac{freq(c_i, c_j)}{freq(c_i)}    (4)

where \Gamma = \{\beta, 1 - \beta\}. When \theta = \beta, c_i \in C_J^{r_1} and c_j \in C_M^q. Note that C_J^{r_1} is the set of categories in C_J that also have edges with the word r_1, and C_M^q is the set of categories in C_M that also have edges with the word q (q \in Q_{C_M}^{r_1}). When \theta = 1 - \beta, c_i \in C_{JK}^{-r_1} and c_j \in C_{MN}^{-q}, where C_{JK}^{-r_1} = C_{JK}^{r_1} - C_J^{r_1} and C_{MN}^{-q} = C_{MN}^q - C_M^q. C_{JK}^{r_1} is the set of categories in C_{JK} that also have edges with the word r_1, and C_{MN}^q is the set of categories in C_{MN} that also have edges with the word q (q \in Q_{C_M}^{r_1}). It can be seen that Eq. (4) measures the degree of closeness between the category values in C_J^{r_1} and those in C_M^q.

3.4 Score for R2 Words
Similarly, the relevance score of a word r_2 (r_2 \in R_2) with respect to the category values can be formulated as follows:

    s_{ctg}(r_2) = s_{wc}(r_2, Q_{C_M}^{r_2}) + s_{cc}(r_2, Q_{C_M}^{r_2})    (5)

where s_{wc}(r_2, Q_{C_M}^{r_2}) measures the degree of closeness between the word r_2 and the category values that have edges with Q words, and s_{cc}(r_2, Q_{C_M}^{r_2}) measures the degree of closeness between the category values in the query set and the category values in the document set with respect to the word r_2. Q_{C_M}^{r_2} represents a set of query terms q (q \in Q) that have a close relationship with the word r_2. Since the word r_2 does not have edges with the Q words in the co-occurrence graph of the word-word relation, the relation can be measured through the R1 words by using frequencies such as freq(r_1, r_2) and freq(r_1, q). An intuitive example of this measurement is the multiplication of freq(r_1, r_2) and freq(r_1, q) for the word r_2 and the word q. The word q (q \in Q_{C_M}^{r_2}) also satisfies two constraints: the word q must be able to reach the word r_2 in the co-occurrence graph through a specific word r_1, and the word q should have edges with the category values in C_M.

The term s_{wc}(r_2, Q_{C_M}^{r_2}) in Eq. (5) is calculated as:

    s_{wc}(r_2, Q_{C_M}^{r_2}) = \sum_{r_1 \in R_1^{r_2}} \frac{freq(r_1, r_2)}{sum_{R_1^{r_2}}} \cdot s_{wc}(r_1, Q_{C_M}^{r_2})    (6)

where R_1^{r_2} represents the set of R1 words that have the top-k highest frequencies with the word r_2, and sum_{R_1^{r_2}} = \sum_{r_1 \in R_1^{r_2}} freq(r_1, r_2). The term s_{wc}(r_1, Q_{C_M}^{r_2}) can be calculated by Eq. (3). The term s_{cc}(r_2, Q_{C_M}^{r_2}) in Eq. (5) can be calculated through Eq. (4) by substituting Q_{C_M}^{r_1} with Q_{C_M}^{r_2}.
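To illustrate how the boosting term of Eq. (2) could be assembled, the sketch below implements Eq. (3) for an R1 word. The co-occurrence counts and neighbor sets are assumed to be precomputed from the graphs of Section 3.1; every name here (top_k_query_terms, c_mn_of, and so on) is an assumption made for the example.

    def s_wc_r1(top_k_query_terms, c_mn_of, c_m, freq_pair, freq_cat, beta):
        # Eq. (3): closeness between an R1 word and the selected query
        # categories C_M, measured through the word's top-k query terms.
        # top_k_query_terms: Q^{r1}_{C_M}, the top-k query terms co-occurring
        #   with r1 that have edges with both r1 and the values in C_M.
        # c_mn_of[q]: C^q_{MN}, the query category values with an edge to q.
        # c_m: the selected query category values C_M.
        # freq_pair[(c, q)]: number of sentences containing both c and q.
        # freq_cat[c]: number of sentences containing c.
        score = 0.0
        for q in top_k_query_terms:
            for c in c_mn_of[q]:
                theta = beta if c in c_m else 1.0 - beta
                score += theta * freq_pair.get((c, q), 0) / freq_cat[c]
        return score

Eq. (4) can be sketched analogously by summing theta * freq(c_i, c_j) / freq(c_i) over the category pairs defined above, and Eq. (6) reuses this function inside a frequency-weighted average over the R1 neighbors of r_2.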
4. EXPERIMENTS
In this experiment, we use maintenance reports from a leading construction machinery company in Japan. We collect a part of the maintenance reports for 4 dominant trouble codes out of 19 in total. Note that the equipment code is the category for the problem queries, while the phenomenon code and the countermeasure code are the categories for the documents. In each data set, one query consists of a problem query, a model code, a phenomenon code, and a countermeasure code. We also manually label the important sentences in the documents. For each query, we search the sentences in the documents and evaluate the performance by checking whether the top-ranked sentences match the labelled sentences. Precision, recall, and F-score are used as criteria. Among the three metrics, recall is the most important criterion. The reason is that, in a troubleshooting system, system engineers prefer to examine all similar cases until they feel confident enough to solve the problem at hand. In this experiment, the training data and the test data are the same, since system engineers look for similar past cases in the troubleshooting system. Note that building and updating the co-occurrence graph can be done periodically in an unsupervised way.

For the comparisons, we select four baseline methods: cqsb, qsb, lexsim, and lexsim+qsb. We name our proposal lexsim+cqsb. lexsim represents the lexical text similarity between the problem query and a sentence in the comments, which can be simply calculated by the cosine similarity between two vectors of bag-of-words features. One example of the bag-of-words features is the counts of frequent words in documents. Note that stop words are removed when counting the word frequencies. lexsim+qsb aggregates lexsim and qsb to obtain the final ranking list, which is an instance of the rank aggregation problem [4]. As a preliminary step, a simple rank aggregation method is used. Due to the different scales of the lexsim and qsb scores, we simply sum up the rank positions in the two ranking lists produced by lexsim and qsb, respectively. In other words, the smaller the sum of the two rank positions, the higher the sentence is ranked. Similarly, the rank positions given by the lexsim score and the cqsb score are aggregated in lexsim+cqsb.
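The rank aggregation just described admits a very small sketch; it assumes both input lists rank the same set of sentences, and the names are illustrative.

    def aggregate_ranks(list_a, list_b):
        # Order sentences by the sum of their rank positions in the two lists;
        # a smaller sum means a higher final rank. Using positions instead of
        # raw scores sidesteps the scale difference between lexsim and (c)qsb.
        rank_a = {sent: i for i, sent in enumerate(list_a)}
        rank_b = {sent: i for i, sent in enumerate(list_b)}
        return sorted(rank_a, key=lambda sent: rank_a[sent] + rank_b[sent])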
In this experiment, we set two weights. One is the weight between the qsb (or cqsb) score and the lexsim score. The other is \lambda, which weights the category relation term in the cqsb score. The tuning space of the two weights is [0, 0.01, 0.1, 1, 10]. We use Macro Recall, Mean Average Precision (MAP), and F3 score as recall, precision, and F-score, respectively.

As recall is the most important criterion in a troubleshooting search system, we calculate the Macro Recall for each data set at the top l (l \in {5, 10, ..., 195, 200}) ranked sentences. Figures 2, 3, 4, and 5 show the Macro Recalls of all the methods as the number of top-l sentences changes from 5 to 200. It can be seen that lexsim+cqsb performs better than the other baseline methods. It is also noticeable from Figure 5 that the performance differences between qsb and cqsb are not obvious as the number of top sentences increases. The reason might be that the probabilities of words in informative sentences with respect to categories and the co-occurrence probabilities of categories are evenly distributed; therefore, the second term on the right side of Eq. (1) does not differ much over words.

[Figure 2: Recall for trouble code 03 (Macro Recall vs. top l for qsb, cqsb, lexsim, lexsim+qsb, and lexsim+cqsb)]
[Figure 3: Recall for trouble code 05]
[Figure 4: Recall for trouble code 14]
[Figure 5: Recall for trouble code 18]

We also investigate MAP and F3 score. For simplicity, we show the results for the cases in which the best and worst Macro Recall of lexsim+cqsb are achieved. The results are given in Table 1 and Table 2, which are from the trouble code 03 data set and the trouble code 05 data set, respectively. Note that \beta and the number of top-l sentences are set to 1 and 100, respectively. The tables show that lexsim+cqsb outperforms lexsim+qsb, cqsb, and qsb in both cases. It is also noticeable that, in the worst case, even though the Macro Recall of lexsim+cqsb is better than that of the others, its F3 score is lower than that of lexsim. Note that F1 shows the same trend as F3 in this experiment.

Table 1: F3 scores in the trouble code 03 data set (best case)
Methods        Macro Recall   MAP     F3 score
lexsim+cqsb    .5513          .0690   .3244
lexsim+qsb     .3665          .0407   .2036
lexsim         .2849          .0957   .2379
cqsb           .5200          .0609   .2964
qsb            .3503          .0312   .1731

Table 2: F3 scores in the trouble code 05 data set (worst case)
Methods        Macro Recall   MAP     F3 score
lexsim+cqsb    .4645          .0557   .2679
lexsim+qsb     .3893          .0383   .2031
lexsim         .3431          .1346   .2971
cqsb           .4128          .0370   .2049
qsb            .3746          .0299   .1739
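For reference, the Macro Recall at top l used in Figures 2-5 and Table 3 can be sketched as below: recall is computed per query against the manually labelled sentences and then averaged over queries. The data layout is an assumption made for the example.

    def macro_recall_at_l(ranked_lists, labelled_sets, l):
        # ranked_lists[i]: the ranked sentences returned for query i.
        # labelled_sets[i]: the labelled important sentences for query i.
        recalls = []
        for ranked, labelled in zip(ranked_lists, labelled_sets):
            hits = len(set(ranked[:l]) & set(labelled))
            recalls.append(hits / len(labelled) if labelled else 0.0)
        return sum(recalls) / len(recalls)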
We also check the effect of the parameter \beta, as it influences the score for the relationship between the categories. Table 3 shows the Macro Recalls of cqsb at different values of \beta and l (l \in {100, 200}) in the data sets of trouble code 03 and trouble code 05. The results imply that 0.8 and 0.6 might be good values of \beta for cqsb to improve the recalls.

Table 3: The recalls of cqsb at different values of \beta
\beta   tr03, l=100   tr03, l=200   tr05, l=100   tr05, l=200
1       .5200         .7752         .4128         .7013
0.8     .5225         .7865         .4133         .7057
0.6     .5560         .8020         .4276         .6935
0.4     .5456         .7809         .3901         .7045

5. CONCLUSION
An information retrieval method using a scoring technique boosted by domain-specific categories is proposed for the troubleshooting search system. The knowledge about the category information, which includes the relationship between words and categories and the relationship between categories, is integrated into a co-occurrence graph. The experiments on the maintenance logs showed an improvement in recall, demonstrating the effectiveness of using the category information.

6. REFERENCES
[1] A. Celikyilmaz and D. Hakkani-Tur. A Hybrid Hierarchical Model for Multi-document Summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 815-824, 2010.
[2] A. Chandramouli, G. Subramanian, and D. Bal. Unsupervised Extraction of Part Names from Service Logs. In Proceedings of the World Congress on Engineering and Computer Science (WCECS), pages 826-828, 2013.
[3] H. Daumé III and D. Marcu. Bayesian Query-focused Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL), pages 305-312, 2006.
[4] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank Aggregation Methods for the Web. In Proceedings of the 10th International Conference on World Wide Web (WWW), pages 613-622, 2001.
[5] B. Edwards, M. Zatorsky, and R. Nayak. Clustering and Classification of Maintenance Logs using Text Data Mining. In Data Mining and Analytics 2008, Proceedings of the Seventh Australasian Data Mining Conference (AusDM), pages 193-199, 2008.
[6] Z. Ji, F. Xu, B. Wang, and B. He. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), pages 2471-2474, 2012.
[7] J. Ko, L. Si, and E. Nyberg. A Probabilistic Framework for Answer Selection in Question Answering. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL-HLT), pages 524-531, 2007.
[8] J.-T. Lee, S.-B. Kim, Y.-I. Song, and H.-C. Rim. Bridging Lexical Gaps Between Queries and Questions on Large Online Q&A Collections with Compact Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 410-418, 2008.
[9] H. Morita, T. Sakai, and M. Okumura. Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 223-229, 2011.
[10] E. Mustafaraj, M. Hoof, and B. Freisleben. Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles. In Natural Language Processing and Text Mining (NLPT), pages 46-67, 2007.
[11] F. Roulland, S. Castellani, A. N. Kaplan, M. A. Grasso, and J. O'Neill. Real-time Query Suggestion in a Troubleshooting Context. Xerox, US8510306 B2, 2014.
[12] M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to Rank Answers to Non-factoid Questions from Web Collections. Computational Linguistics, 37(2):351-383, June 2011.
[13] I. Sutton. Process Risk and Reliability Management: Operational Integrity Management. Elsevier, 2010.
[14] A. Tombros and M. Sanderson. Advantages of Query Biased Summaries in Information Retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 2-10, 1998.
[15] E. M. Voorhees. Overview of the TREC 2003 Question Answering Track. In The Text Retrieval Conference Proceedings (TREC), pages 54-68, 2003.