Information Retrieval Boosted by Category for Troubleshooting Search System

Bin Tong, Toshihiko Yanase, Hiroaki Ozaki, Makoto Iwayama
Research & Development Group, Hitachi, Ltd.
1-280, Higashi-koigakubo, Kokubunji-shi, Tokyo 185-8601, Japan
{bin.tong.hh, toshihiko.yanase.gm}@hitachi.com
{hiroaki.ozaki.yu, makoto.iwayama.nw}@hitachi.com

ABSTRACT
A troubleshooting search system aims at extracting relevant information to solve the problem at hand. It is often the case that the documents in a troubleshooting system include an abundant amount of domain-specific categories. However, the useful information about the domain-specific categories, such as the relationship between words and categories and the relationship between categories, is not fully utilized in simple query search and faceted search. In this paper, we propose an information retrieval method boosted by the domain-specific categories. Given a problem query and categories, the troubleshooting search system is able to retrieve the relevant information of interest with respect to the selected categories. The experimental results show that our proposal improves the recall.

Categories and Subject Descriptors
H.3.3 [Information Search and Retrieval]: Retrieval Models

Keywords
Troubleshooting, Category, Co-occurrence Graph

1. INTRODUCTION
Troubleshooting [13] is a form of problem solving, and is often applied to repair malfunctioning facilities or equipment. The maintenance log [10, 5, 2] is one of the important documents for troubleshooting; it is generated during conversations between customers and engineers in equipment maintenance.
The maintenance log often includes entries for problem titles and documents that describe the problem in detail and give instructions for solving it. To ease the management of the huge number of maintenance logs, domain-specific categories, such as machine code, trouble code, and countermeasure code, are used to tag both the problem titles and the documents. The goal of information retrieval for troubleshooting [11] based on the maintenance logs is to help engineers examine the similar situations of a problem within a certain period of time, which facilitates finding an appropriate solution. The troubleshooting search system requires the engineers to input a short problem query to search for the relevant information in the documents. This may cause the lexical gap problem [8, 9], because it is difficult for the engineers to compose a succinct and precise problem query that represents their information needs.

Moreover, information about the domain-specific categories is not fully utilized in the problem query search. One way to use the category information is faceted search, in which the selected categories are used to filter the ranking results. For example, if a machine code is selected, all search results are restricted to that machine code. However, faceted search has two potential problems in the troubleshooting search system. First, information that is related to the selected categories but not tagged with them cannot be retrieved, since the retrieved information is limited to the selected categories. It is natural, however, that system engineers tend to check relevant problems to facilitate their decision making. For example, given a selected machine code, the information about another machine code might be informative for solutions if the two machine codes belong to the same machine series and have similar problems. Second, the ranking of search results depends only on the problem query and not on the selected categories. For example, a trouble code corresponds to a number of specific countermeasure codes. Given a selected trouble code, the information about its frequent countermeasures is expected to be placed higher in the ranked list of results.

To mitigate the lexical gap problem and the above-mentioned retrieval problems, we propose an information retrieval method using a scoring technique. Our proposal extends the word co-occurrence graph of the QSB method [9], which aims at solving the lexical gap problem. In our proposal, besides using word co-occurrence to score words in documents, each word's score is also weighted by a boosting term based on the domain-specific categories. More specifically, the boosting term considers the relationship between categories and words and the relationship between categories. These are utilized to alleviate the above two retrieval problems with respect to the categories.

2. RELATED WORK
Information retrieval for troubleshooting is related to question answering [15] and query-biased document summarization [14, 1, 3]. Most of the work on question answering has focused on factoid question answering [6, 12, 7]. In the troubleshooting system, however, the answer to a question is a set of relevant sentences or phrases. As for query-biased document summarization, there seems to be no work that leverages other auxiliary information, such as categories. In addition, it is worth mentioning the work [12] on non-factoid question answering. In this work, Surdeanu et al. proposed a framework for answer ranking that exploits various linguistic features generated by popular techniques, such as syntactic parsing, Named Entity Recognition (NER), and Semantic Role Labeling (SRL). However, besides regular sentences, a large number of typos and short phrases exist in maintenance logs. In such a case, those techniques might not perform well due to the irregularities in the texts and the lack of training data in the troubleshooting domain.

Query Snowball (QSB) [9] is a method for multi-document summarization that extracts the relevant sentences from multiple documents with respect to a given query. The basic idea of this method is to fill the lexical gap between the query and the relevant sentences by enriching the representation of the information need. To achieve this, a co-occurrence graph for the words in the queries and the documents is built. The words in the co-occurrence graph consist of three layers, which are Q words, R1 words, and R2 words. Q is the set of query terms. R1 is the set of words that co-occur with a query term in the same sentence. R2 is the set of words that co-occur with a word from R1, excluding those that are already in R1.
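To make the three layers concrete, the following is a minimal Python sketch of how Q, R1, and R2 could be derived from tokenized sentences. The function and variable names are illustrative assumptions, not taken from the QSB implementation in [9].

    def build_word_layers(query_terms, sentences):
        # query_terms: the set of query words (Q).
        # sentences: an iterable of token lists, one per document sentence.
        q_words = set(query_terms)
        r1_words = set()
        # R1: words that co-occur with a query term in the same sentence.
        for sentence in sentences:
            words = set(sentence)
            if words & q_words:
                r1_words |= words - q_words
        r2_words = set()
        # R2: words that co-occur with an R1 word, excluding Q and R1 words.
        for sentence in sentences:
            words = set(sentence)
            if words & r1_words:
                r2_words |= words - q_words - r1_words
        return q_words, r1_words, r2_words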
3. QUERY SNOWBALL WITH CATEGORY INFORMATION
To extract relevant information with respect to the selected categories, we extend the co-occurrence graph in QSB by integrating two types of relations: the relationship between words and categories and the relationship between categories. The reason for extending QSB is that the co-occurrence graph is flexible enough to integrate the two relations. The relationship between words and categories represents the distribution over words with respect to categories. If the probabilities of words with respect to a given category are high, the information about these words is more likely to be retrieved given that category. The fundamental idea is that the distributions of co-occurrence probabilities of words with respect to different categories might differ. It is assumed that words with higher probabilities with respect to a category are treated as more important in that category. For example, if a word appears more often in documents of one category than in those of other categories, the word is more important for that category than for the others. Similarly, the relationship between categories represents the co-occurrences of categories. The information about categories whose occurrence frequencies with respect to a specific category are high is more likely to be retrieved. For example, a given category may appear more often with certain categories than with others; the information about those categories with respect to the given category is then treated as more important.

3.1 Co-occurrence Graph Extension
In the first step, we build two co-occurrence graphs for the words and category values in the problem queries and the documents, respectively. If a category value is associated with a document, the words in the document have edges with the category value. In other words, this category value is associated with all the words in the sentences of the document. Similarly, if a category value is associated with a problem query, the words in the problem query have edges with the category value. In other words, this category value is associated with all the words in the problem query.

In the second step, we build the co-occurrence graph for the category values in both the problem queries and the documents. Suppose that a problem query corresponds to a document. In this graph, the category values associated with the problem query have edges with the category values associated with the document.

[Figure 1: The co-occurrence graph among the words and the categories]

As illustrated in Figure 1, the left side represents the co-occurrence graph for the R1 and R2 words and the category values associated with the documents; the middle part represents the co-occurrence graph for the Q words and the category values associated with the queries; the right side represents the co-occurrence graph for the category values in the queries and the category values in the documents. We define C_M as the set of category values in the query set that are selected by the end-user, and C_N as the set of category values in the query set that are not selected by the end-user. Similarly, we define C_J as the set of category values in the document set that are selected by the end-user, and C_K as the set of category values in the document set that are not selected by the end-user. We also define C_{MN} = C_M \cup C_N as the set of category values for the queries and C_{JK} = C_J \cup C_K as the set of category values for the documents.

3.2 Score Boosted by Category Information
In order to integrate the two new relationships about categories into the QSB method, we introduce a score with respect to a query, C_M, and C_J, which is boosted by the category information. The new score, which we call cqsb, can be formulated as follows:

    cqsb(w) = qsb(w) \cdot \exp(\lambda \cdot s_{ctg}(w))    (1)

where qsb(w) is the score for a word w in the QSB method, and s_{ctg}(w) is the boosting term for the word w, which includes the two new relationships about categories. More specifically, the probabilities between words and categories and the probabilities between categories, which are calculated through the co-occurrence graph, are used to boost the score qsb(w). \lambda is a weight for the term s_{ctg}(w). It can be seen from Eq. (1) that when s_{ctg}(w) is larger than 0, \exp(\cdot) is larger than 1; when multiplied with qsb(w), it gives the word w a higher degree of importance. If the value of \lambda is set to 0, s_{ctg}(w) has no effect. Note that s_{ctg}(w) is always larger than or equal to zero. The score of a sentence is the summation of the scores of all combinations of two words, where the score of a word pair is simply calculated by multiplying the cqsb scores of the two words.
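As a concrete illustration of Eq. (1) and the sentence score described above, the following Python sketch computes cqsb and a pairwise sentence score. The callables qsb_score and s_ctg stand in for the QSB word score and the boosting term, whose computation for R1 and R2 words is detailed in Sections 3.3 and 3.4; their names are our own assumptions.

    import math
    from itertools import combinations

    def cqsb(word, qsb_score, s_ctg, lam=1.0):
        # Eq. (1): cqsb(w) = qsb(w) * exp(lambda * s_ctg(w)), s_ctg(w) >= 0.
        return qsb_score(word) * math.exp(lam * s_ctg(word))

    def sentence_score(sentence_words, qsb_score, s_ctg, lam=1.0):
        # The sentence score sums, over all pairs of distinct words in the
        # sentence, the product of the two words' cqsb scores.
        scores = {w: cqsb(w, qsb_score, s_ctg, lam) for w in set(sentence_words)}
        return sum(scores[w1] * scores[w2] for w1, w2 in combinations(scores, 2))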
3.3 Score for R1 Words
The relevance score of a word r_1 (r_1 \in R_1) with respect to the category values can be formulated as follows:

    s_{ctg}(r_1) = s_{wc}(r_1, Q_{C_M}^{r_1}) + s_{cc}(r_1, Q_{C_M}^{r_1})    (2)

where s_{wc}(r_1, Q_{C_M}^{r_1}) measures the relationship between the words and the categories, and s_{cc}(r_1, Q_{C_M}^{r_1}) measures the relationship between the categories. Q_{C_M}^{r_1} is the set of top-k query terms that co-occur most frequently with the word r_1. In addition, the words in Q_{C_M}^{r_1} follow the constraint that they should have edges with both the word r_1 and the category values in C_M.

The term s_{wc}(r_1, Q_{C_M}^{r_1}) in Eq. (2) can be calculated as:

    s_{wc}(r_1, Q_{C_M}^{r_1}) = \sum_{q \in Q_{C_M}^{r_1}} \sum_{i=1}^{|C_{MN}^q|} \theta \cdot \frac{freq(c_i, q)}{freq(c_i)}    (3)

where C_{MN}^q is the set of category values for the queries that also have edges with q. Let \theta be \beta if c_i \in C_M and c_i \in C_{MN}^q, and 1 - \beta if c_i \notin C_M. freq(c_i, q) is the frequency of sentences that include both c_i and q, which can also be interpreted as the distribution over c_i with respect to q. It can be seen that Eq. (3) measures the degree of closeness between the word r_1 and the category values in C_M through the words in Q_{C_M}^{r_1}, since the word r_1 does not directly have edges with the category values in C_M in the co-occurrence graph.

The term s_{cc}(r_1, Q_{C_M}^{r_1}) in Eq. (2) can be calculated as:

    s_{cc}(r_1, Q_{C_M}^{r_1}) = \sum_{q \in Q_{C_M}^{r_1}} \sum_{\theta \in \Gamma} \sum_{c_i, c_j} \theta \cdot \frac{freq(c_i, c_j)}{freq(c_i)}    (4)

where \Gamma = \{\beta, 1 - \beta\}. When \theta = \beta, c_i \in C_J^{r_1} and c_j \in C_M^q. Note that C_J^{r_1} is the set of categories in C_J that also have edges with the word r_1, and C_M^q is the set of categories in C_M that also have edges with the word q (q \in Q_{C_M}^{r_1}). When \theta = 1 - \beta, c_i \in C_{JK}^{-r_1} and c_j \in C_{MN}^{-q}, where C_{JK}^{-r_1} = C_{JK}^{r_1} - C_J^{r_1} and C_{MN}^{-q} = C_{MN}^q - C_M^q. C_{JK}^{r_1} is the set of categories in C_{JK} that also have edges with the word r_1, and C_{MN}^q is the set of categories in C_{MN} that also have edges with the word q (q \in Q_{C_M}^{r_1}). It can be seen that Eq. (4) measures the degree of closeness between the category values in C_J^{r_1} and those in C_M^q.

3.4 Score for R2 Words
Similarly, the relevance score of a word r_2 (r_2 \in R_2) with respect to the category values can be formulated as follows:

    s_{ctg}(r_2) = s_{wc}(r_2, Q_{C_M}^{r_2}) + s_{cc}(r_2, Q_{C_M}^{r_2})    (5)

where s_{wc}(r_2, Q_{C_M}^{r_2}) measures the degree of closeness between the word r_2 and the category values that have edges with Q words, and s_{cc}(r_2, Q_{C_M}^{r_2}) measures the degree of closeness between the category values in the query set and the category values in the document set with respect to the word r_2. Q_{C_M}^{r_2} represents a set of query terms q (q \in Q) that have a close relationship with the word r_2. Since the word r_2 does not have edges with the Q words in the co-occurrence graph of the word-word relation, the relation can be measured through the R1 words by using frequencies such as freq(r_1, r_2) and freq(r_1, q). An intuitive example of this measurement is the multiplication of freq(r_1, r_2) and freq(r_1, q) for the word r_2 and the word q. The word q (q \in Q_{C_M}^{r_2}) also satisfies two constraints: the word q must be able to reach the word r_2 in the co-occurrence graph through a specific word r_1, and the word q should have edges with the category values in C_M.

The term s_{wc}(r_2, Q_{C_M}^{r_2}) in Eq. (5) is calculated as:

    s_{wc}(r_2, Q_{C_M}^{r_2}) = \sum_{r_1 \in R_1^{r_2}} \frac{freq(r_1, r_2)}{sum_{R_1^{r_2}}} \cdot s_{wc}(r_1, Q_{C_M}^{r_2})    (6)

where R_1^{r_2} represents the set of R1 words that have the top-k highest frequencies with the word r_2, and sum_{R_1^{r_2}} = \sum_{r_1 \in R_1^{r_2}} freq(r_1, r_2). The term s_{wc}(r_1, Q_{C_M}^{r_2}) can be calculated by Eq. (3). The term s_{cc}(r_2, Q_{C_M}^{r_2}) in Eq. (5) can be calculated through Eq. (4) by substituting Q_{C_M}^{r_1} with Q_{C_M}^{r_2}.
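To illustrate how the boosting term of Eq. (2) could be assembled, the sketch below implements Eq. (3) for an R1 word. The co-occurrence counts and neighbor sets are assumed to be precomputed from the graphs of Section 3.1; every name here (top_k_query_terms, c_mn_of, and so on) is an assumption made for the example.

    def s_wc_r1(top_k_query_terms, c_mn_of, c_m, freq_pair, freq_cat, beta):
        # Eq. (3): closeness between an R1 word and the selected query
        # categories C_M, measured through the word's top-k query terms.
        # top_k_query_terms: Q^{r1}_{C_M}, the top-k query terms co-occurring
        #   with r1 that have edges with both r1 and the values in C_M.
        # c_mn_of[q]: C^q_{MN}, the query category values with an edge to q.
        # c_m: the selected query category values C_M.
        # freq_pair[(c, q)]: number of sentences containing both c and q.
        # freq_cat[c]: number of sentences containing c.
        score = 0.0
        for q in top_k_query_terms:
            for c in c_mn_of[q]:
                theta = beta if c in c_m else 1.0 - beta
                score += theta * freq_pair.get((c, q), 0) / freq_cat[c]
        return score

Eq. (4) can be sketched analogously by summing theta * freq(c_i, c_j) / freq(c_i) over the category pairs defined above, and Eq. (6) reuses this function inside a frequency-weighted average over the R1 neighbors of r_2.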
4. EXPERIMENTS
In this experiment, we use maintenance reports from a leading construction machinery company in Japan. We collect a part of the maintenance reports for 4 dominant trouble codes out of 19 in total. Note that the equipment code is the category for the problem queries, while the phenomenon code and the countermeasure code are the categories for the documents. In each data set, one query consists of a problem query, a model code, a phenomenon code, and a countermeasure code. We also manually label the important sentences in the documents. For each query, we search the sentences in the documents and evaluate the performance by checking whether the top-ranked sentences match the labelled sentences. Precision, recall, and F-score are used as criteria. Among the three metrics, recall is the most important criterion. The reason is that, in a troubleshooting system, system engineers prefer to examine all similar cases until they feel confident enough to solve the problem at hand. In this experiment, the training data and the test data are the same, since system engineers look for similar past cases in the troubleshooting system. Note that building and updating the co-occurrence graph can be done periodically in an unsupervised way.

For the comparisons, we select four baseline methods: cqsb, qsb, lexsim, and lexsim+qsb. We name our proposal lexsim+cqsb. lexsim represents the lexical text similarity between the problem query and a sentence in the comments, which can be simply calculated by the cosine similarity between two vectors of bag-of-words features. One example of the bag-of-words features is the counts of frequent words in documents. Note that stop words are removed when counting the word frequencies. lexsim+qsb aggregates lexsim and qsb to obtain the final ranking list, which is an instance of the rank aggregation problem [4]. As a preliminary step, a simple rank aggregation method is used. Due to the different scales of the lexsim and qsb scores, we simply sum up the rank positions in the two ranking lists produced by lexsim and qsb, respectively. In other words, the smaller the sum of the two rank positions, the higher the sentence is ranked. Similarly, the rank positions given by the lexsim score and the cqsb score are aggregated in lexsim+cqsb.
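The rank aggregation just described admits a very small sketch; it assumes both input lists rank the same set of sentences, and the names are illustrative.

    def aggregate_ranks(list_a, list_b):
        # Order sentences by the sum of their rank positions in the two lists;
        # a smaller sum means a higher final rank. Using positions instead of
        # raw scores sidesteps the scale difference between lexsim and (c)qsb.
        rank_a = {sent: i for i, sent in enumerate(list_a)}
        rank_b = {sent: i for i, sent in enumerate(list_b)}
        return sorted(rank_a, key=lambda sent: rank_a[sent] + rank_b[sent])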
In this experiment, we set two weights. One is the weight between the qsb (or cqsb) score and the lexsim score. The other is \lambda, which weights the category relation term in the cqsb score. The tuning space of the two weights is [0, 0.01, 0.1, 1, 10]. We use Macro Recall, Mean Average Precision (MAP), and F3 score as recall, precision, and F-score, respectively.

As recall is the most important criterion in a troubleshooting search system, we calculate the Macro Recall for each data set at the top l (l \in {5, 10, ..., 195, 200}) ranked sentences. Figures 2, 3, 4, and 5 show the Macro Recalls of all the methods as the number of top-l sentences changes from 5 to 200. It can be seen that lexsim+cqsb performs better than the other baseline methods. It is also noticeable from Figure 5 that the performance differences between qsb and cqsb are not obvious as the number of top sentences increases. The reason might be that the probabilities of words in informative sentences with respect to categories and the co-occurrence probabilities of categories are evenly distributed; therefore, the second term on the right side of Eq. (1) does not differ much over words.

[Figure 2: Recall for trouble code 03 (Macro Recall vs. top l for qsb, cqsb, lexsim, lexsim+qsb, and lexsim+cqsb)]
[Figure 3: Recall for trouble code 05]
[Figure 4: Recall for trouble code 14]
[Figure 5: Recall for trouble code 18]

We also investigate MAP and F3 score. For simplicity, we show the results for the cases in which the best and worst Macro Recall of lexsim+cqsb are achieved. The results are given in Table 1 and Table 2, which are from the trouble code 03 data set and the trouble code 05 data set, respectively. Note that \beta and the number of top-l sentences are set to 1 and 100, respectively. The tables show that lexsim+cqsb outperforms lexsim+qsb, cqsb, and qsb in both cases. It is also noticeable that, in the worst case, even though the Macro Recall of lexsim+cqsb is better than that of the others, its F3 score is lower than that of lexsim. Note that F1 shows the same trend as F3 in this experiment.

Table 1: F3 scores in the trouble code 03 data set (best case)
Methods        Macro Recall   MAP     F3 score
lexsim+cqsb    .5513          .0690   .3244
lexsim+qsb     .3665          .0407   .2036
lexsim         .2849          .0957   .2379
cqsb           .5200          .0609   .2964
qsb            .3503          .0312   .1731

Table 2: F3 scores in the trouble code 05 data set (worst case)
Methods        Macro Recall   MAP     F3 score
lexsim+cqsb    .4645          .0557   .2679
lexsim+qsb     .3893          .0383   .2031
lexsim         .3431          .1346   .2971
cqsb           .4128          .0370   .2049
qsb            .3746          .0299   .1739
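For reference, the Macro Recall at top l used in Figures 2-5 and Table 3 can be sketched as below: recall is computed per query against the manually labelled sentences and then averaged over queries. The data layout is an assumption made for the example.

    def macro_recall_at_l(ranked_lists, labelled_sets, l):
        # ranked_lists[i]: the ranked sentences returned for query i.
        # labelled_sets[i]: the labelled important sentences for query i.
        recalls = []
        for ranked, labelled in zip(ranked_lists, labelled_sets):
            hits = len(set(ranked[:l]) & set(labelled))
            recalls.append(hits / len(labelled) if labelled else 0.0)
        return sum(recalls) / len(recalls)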
We also check the effect of the parameter \beta, as it influences the score for the relationship between the categories. Table 3 shows the Macro Recalls of cqsb at different values of \beta and l (l \in {100, 200}) in the data sets of trouble code 03 and trouble code 05. The results imply that 0.8 and 0.6 might be good values of \beta for cqsb to improve the recalls.

Table 3: The recalls of cqsb at different values of \beta
\beta   tr03, l=100   tr03, l=200   tr05, l=100   tr05, l=200
1       .5200         .7752         .4128         .7013
0.8     .5225         .7865         .4133         .7057
0.6     .5560         .8020         .4276         .6935
0.4     .5456         .7809         .3901         .7045

5. CONCLUSION
An information retrieval method using a scoring technique boosted by domain-specific categories is proposed for the troubleshooting search system. The knowledge about the category information, which includes the relationship between words and categories and the relationship between categories, is integrated into a co-occurrence graph. The experiments on the maintenance logs showed an improvement in recall, demonstrating the effectiveness of using the category information.

6. REFERENCES
[1] A. Celikyilmaz and D. Hakkani-Tur. A Hybrid Hierarchical Model for Multi-document Summarization. In Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics (ACL), pages 815-824, 2010.
[2] A. Chandramouli, G. Subramanian, and D. Bal. Unsupervised Extraction of Part Names from Service Logs. In Proceedings of the World Congress on Engineering and Computer Science (WCECS), pages 826-828, 2013.
[3] H. Daumé III and D. Marcu. Bayesian Query-focused Summarization. In Proceedings of the 21st International Conference on Computational Linguistics and the 44th Annual Meeting of the Association for Computational Linguistics (ACL), pages 305-312, 2006.
[4] C. Dwork, R. Kumar, M. Naor, and D. Sivakumar. Rank Aggregation Methods for the Web. In Proceedings of the 10th International Conference on World Wide Web (WWW), pages 613-622, 2001.
[5] B. Edwards, M. Zatorsky, and R. Nayak. Clustering and Classification of Maintenance Logs using Text Data Mining. In Data Mining and Analytics 2008, Proceedings of the Seventh Australasian Data Mining Conference (AusDM), pages 193-199, 2008.
[6] Z. Ji, F. Xu, B. Wang, and B. He. In Proceedings of the 21st ACM International Conference on Information and Knowledge Management (CIKM), pages 2471-2474, 2012.
[7] J. Ko, L. Si, and E. Nyberg. A Probabilistic Framework for Answer Selection in Question Answering. In Human Language Technology Conference of the North American Chapter of the Association of Computational Linguistics (NAACL-HLT), pages 524-531, 2007.
[8] J.-T. Lee, S.-B. Kim, Y.-I. Song, and H.-C. Rim. Bridging Lexical Gaps Between Queries and Questions on Large Online Q&A Collections with Compact Translation Models. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 410-418, 2008.
[9] H. Morita, T. Sakai, and M. Okumura. Query Snowball: A Co-occurrence-based Approach to Multi-document Summarization for Question Answering. In The 49th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies (ACL-HLT), pages 223-229, 2011.
[10] E. Mustafaraj, M. Hoof, and B. Freisleben. Mining Diagnostic Text Reports by Learning to Annotate Knowledge Roles. In Natural Language Processing and Text Mining (NLPT), pages 46-67, 2007.
[11] F. Roulland, S. Castellani, A. N. Kaplan, M. A. Grasso, and J. O'Neill. Real-time Query Suggestion in a Troubleshooting Context. Xerox, US8510306 B2, 2014.
[12] M. Surdeanu, M. Ciaramita, and H. Zaragoza. Learning to Rank Answers to Non-factoid Questions from Web Collections. Computational Linguistics, 37(2):351-383, June 2011.
[13] I. Sutton. Process Risk and Reliability Management: Operational Integrity Management. Elsevier, 2010.
[14] A. Tombros and M. Sanderson. Advantages of Query Biased Summaries in Information Retrieval. In Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval (SIGIR), pages 2-10, 1998.
[15] E. M. Voorhees. Overview of the TREC 2003 Question Answering Track. In The Text Retrieval Conference Proceedings (TREC), pages 54-68, 2003.