Designing of Information System for Semantic Analysis and
Classification of Issues in Service Desk System
Ksenia Lokhacheva (a), Denis Parfenov (a) and Maria Lapina (b)
(a) Orenburg State University, Prospekt Pobedy, 13, Orenburg, 460018, Russia
(b) North-Caucasus Federal University, Pushkin St., 1, Stavropol, 355017, Russia


                 Abstract
                 This paper describes the design of an information system for semantic analysis and
                 classification of issues in a Service Desk system. The concept of a Service Desk system and
                 the problems of its use are described; several mathematical models and methods of text
                 analysis and text classification are studied; system usage options are analyzed, and a
                 system scheme and a class diagram are constructed.

                 Keywords
                 text classification, semantic analysis, machine learning, word embedding, Service Desk

1. Introduction
   Text classification (also known as text tagging or text categorization) is the process of automatically
analyzing a text and assigning it one or more pre-defined tags or categories based on its content.
Automatic text classification can be based on production (rule-based) models with pre-defined rules, or
on machine learning methods. Classifying user requests (issues) in a Service Desk system is one example
of a text classification task.
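To make the distinction concrete, the following minimal Python sketch shows a production (rule-based) classifier: each rule pairs a category with trigger keywords, and the first rule whose keywords appear in the request assigns the category. The categories, keywords, and the function name `classify_by_rules` are hypothetical illustrations, not part of the described system.

```python
import re

# Hypothetical production rules: (category, trigger keywords).
# Rules are checked in order; the first rule whose keywords occur in the text fires.
RULES = [
    ("network", {"vpn", "wifi", "internet", "router"}),
    ("printing", {"printer", "print", "toner"}),
    ("accounts", {"password", "login", "locked"}),
]
DEFAULT_CLASS = "unclassified"


def classify_by_rules(text: str) -> str:
    """Return the category of the first production rule that matches the text."""
    words = set(re.findall(r"\w+", text.lower()))
    for category, keywords in RULES:
        if words & keywords:  # IF any trigger keyword is present ...
            return category   # ... THEN assign this rule's category
    return DEFAULT_CLASS


print(classify_by_rules("My password expired and the login fails"))  # accounts
```

A request such as "I cannot sign in" falls through to `unclassified`, because none of its words appear in the rules; this is one reason hand-written rules struggle with the varied wording of real requests.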
   Service Desk is a system that manages incidents (service disruptions) and service requests (routine
service-related tasks) and handles user communications about events such as outages and planned
changes to services. The Service Desk operator handles such requests: the operator either gives
instructions for troubleshooting or fixes the problem personally via remote access. To solve the problem
effectively, the operator must classify it correctly, that is, determine the area to which the issue
belongs. To do so, the operator can:
   1) use the database;
   2) use the built-in Service Desk search (for similar user issues);
   3) classify the issue based on personal professional experience;
   4) search the Internet (Google, Yandex, etc.).
   The main difficulty of content-based search for issues in Service Desk systems is that issues of the
same class are described in different words. In addition, descriptions often contain slang, grammatical
errors and e-mail formalities (such as greetings and signatures), since most requests are received by
e-mail. All these factors significantly complicate issue classification and therefore increase the time
needed to find the correct matches and solutions.
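One way to reduce the e-mail noise described above is to strip common e-mail elements before analysis. The Python sketch below is a heuristic under stated assumptions: quoted reply lines start with `>`, a line consisting of `-- ` marks the start of a signature, and the greeting and closing phrases come from a small hypothetical English list. A real deployment on Russian-language tickets would need patterns collected from its own mail.

```python
import re

# Hypothetical English greeting and closing lines; real tickets would need
# patterns collected from the organization's own (e.g. Russian) e-mails.
GREETING_RE = re.compile(
    r"^(hi|hello|good (morning|afternoon)|dear\s+\w+(\s+\w+)?)[,!.]?$", re.IGNORECASE
)
CLOSING_RE = re.compile(
    r"^(thanks|thank you|regards|best regards|kind regards)[,!.]?$", re.IGNORECASE
)


def strip_email_boilerplate(body: str) -> str:
    """Keep only the informative lines of an e-mail request (heuristic)."""
    kept = []
    for raw_line in body.splitlines():
        line = raw_line.strip()
        # "-- " is the conventional signature delimiter; strip() turns it into "--".
        if line == "--" or CLOSING_RE.match(line):
            break  # treat everything from here on as signature
        if not line or line.startswith(">") or GREETING_RE.match(line):
            continue  # skip blank lines, quoted replies, and greeting lines
        kept.append(line)
    return " ".join(kept)


email = "Hello,\nThe printer on floor 2 does not print.\n> earlier message\nRegards,\nIvan"
print(strip_email_boilerplate(email))  # The printer on floor 2 does not print.
```

Greeting patterns only match whole lines, so a line such as "Dear colleagues, the VPN is down" is kept rather than discarded with its useful content.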


YRID-2020: International Workshop on Data Mining and Knowledge Engineering, October 15-16, 2020, Stavropol, Russia
EMAIL: ksenia.lohacheva.97@mail.ru (Ksenia Lokhacheva); parfenovdi@mail.ru (Denis Parfenov); norra7@yandex.ru (Maria Lapina);
ORCID: 0000-0002-8073-0710 (Ksenia Lokhacheva); 0000-0002-1146-1270 (Denis Parfenov); 0000-0001-8117-9142 (Maria Lapina);
            ©️ 2020 Copyright for this paper by its authors.
            Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
            CEUR Workshop Proceedings (CEUR-WS.org)


2. Related works
    Most companies work with clients in one way or another and provide a user support service. In
addition, technical support of internal processes is of great importance for successful company
management.
    Work [3] describes the negative consequences of a poorly organized Technical Support Department,
namely:
    • lack of fixed areas of competence, which creates a misunderstanding of the importance of the
functions performed;
    • risk of losing a particular user request among the total volume of requests and managers' orders
as a result of an unregulated request form;
    • high dependence of the company's work on a "key" specialist, which occurs when a certain type of
work is regularly performed by one employee.
    Service Desk systems can ensure high-quality interaction between all participants in the business
process. The main tasks of a Service Desk system are receiving and processing requests: the client
creates a request (ticket), and service operators process it. A Service Desk system makes it possible
to improve the work of all service operators in the company.
    Service Desk processes address the difficulties that arise in the work of the IT department [4]:
    • Incident Management
    • Problem Management
    • Change Management
    • Release Management
    • Service Level Management
    • Financial Management
    • Availability Management
    • Capacity Management
    • Continuity Management
    • Information Security Management
    Given these functions and tasks of a Service Desk system, it seems worthwhile to automate some
processes using semantic analysis and request classification, in order to predict the most likely
solution to a problem without additional involvement of specialists.
    Natural language text analysis involves two stages:
    1. text preprocessing (parsing, part-of-speech tagging, removal of stop words and digits, stemming
or lemmatization) followed by vectorization (word embedding);
    2. model training on pre-labeled data and text classification.
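The preprocessing part of the first stage can be sketched in Python as below. This is a simplified illustration, assuming English text, a tiny hypothetical stop-word list, and naive suffix stripping in place of a real stemmer; part-of-speech tagging is omitted because it needs a trained tagger. Russian text would require a proper stemmer (for example, the Snowball stemmer for Russian) or a morphological lemmatizer.

```python
import re

# Tiny hypothetical stop-word list; real systems use a full language-specific list.
STOP_WORDS = {"the", "a", "an", "is", "are", "on", "my", "and", "not", "does", "of", "to", "in"}
# Naive suffix stripping: a crude stand-in for a real stemmer or lemmatizer.
SUFFIXES = ("ing", "ed", "es", "s")


def naive_stem(word: str) -> str:
    for suffix in SUFFIXES:
        # Keep at least three characters so short words are not mangled.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def preprocess(text: str) -> list:
    # Tokenization: runs of letters or runs of digits.
    tokens = re.findall(r"[^\W\d_]+|\d+", text.lower())
    # Drop digits and stop words, then stem what remains.
    return [naive_stem(t) for t in tokens if not t.isdigit() and t not in STOP_WORDS]


print(preprocess("The printers on floor 2 are not printing documents"))
# ['printer', 'floor', 'print', 'document']
```

Note that the naive stemmer leaves "printer" and "print" as different tokens; a real stemmer or lemmatizer decides such cases with language-specific rules.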
    Because automatic processing of text information is increasingly relevant and in demand, many
studies on model training methods have appeared.
    In [1] and [2], text classification methods are compared. Both papers give a formal statement of
the text classification problem, describe classification methods, and compare classifier training
methods based on machine learning, including the Bayes method, the k-nearest neighbors algorithm, the
least squares method, the support vector machine, and methods based on artificial neural networks.
Classification quality was evaluated mainly by a combination of precision and recall. Study [1]
concluded that the best balance of these characteristics is achieved by the support vector machine and
the convolutional neural network. The Bayes method is among the fastest, but its accuracy varies
across experiments. In study [2], the least squares method showed the best recall, while the support
vector machine showed the best precision. Table 1 compares the considered classification methods based
on studies [1] and [2].
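For reference, the Bayes method in text classification is commonly implemented as multinomial naive Bayes. The Python sketch below is a minimal version of that variant with add-one (Laplace) smoothing; it assumes documents are already preprocessed into token lists, and the training examples are hypothetical. It is not the exact configuration evaluated in [1] or [2].

```python
import math
from collections import Counter, defaultdict


class NaiveBayesClassifier:
    """Multinomial naive Bayes over token lists with add-one (Laplace) smoothing."""

    def fit(self, documents, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # class -> token frequencies
        for tokens, label in zip(documents, labels):
            self.word_counts[label].update(tokens)
        self.vocabulary = {t for tokens in documents for t in tokens}
        self.total_docs = len(labels)
        return self

    def predict(self, tokens):
        vocab_size = len(self.vocabulary)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_counts.items():
            counts = self.word_counts[label]
            total = sum(counts.values())
            # log P(class) + sum over tokens of log P(token | class)
            score = math.log(n_docs / self.total_docs)
            for t in tokens:
                if t in self.vocabulary:  # tokens unseen in training are ignored
                    score += math.log((counts[t] + 1) / (total + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical, already preprocessed training data.
docs = [["printer", "jam"], ["printer", "toner"], ["vpn", "connect"],
        ["password", "reset"], ["vpn", "slow"]]
labels = ["printing", "printing", "network", "accounts", "network"]
model = NaiveBayesClassifier().fit(docs, labels)
print(model.predict(["printer", "not", "work"]))  # printing
```

Working in log-probabilities avoids numeric underflow when a request contains many tokens; training reduces to counting, which is why the method is fast.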


Table 1
Analysis of the effectiveness of text classification methods
 Method                          Average precision, p    Average recall, r
 Support vector machine          91.3%                   84.2%
 k-nearest neighbors algorithm   79.5%                   77.1%
 Convolutional neural network    91.5%                   81.3%
 Bayes method                    74.6%                   82.5%
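The precision and recall figures in Table 1 rest on the standard per-class definitions: precision = TP / (TP + FP) and recall = TP / (TP + FN). The source does not state how the averages were taken, so the Python sketch below assumes macro-averaging (the unweighted mean over classes); the labels in the example are hypothetical.

```python
def precision_recall(y_true, y_pred):
    """Per-class precision and recall plus their macro (unweighted) averages."""
    classes = sorted(set(y_true) | set(y_pred))
    pairs = list(zip(y_true, y_pred))
    per_class = {}
    for c in classes:
        tp = sum(1 for t, p in pairs if t == c and p == c)  # correctly assigned to c
        fp = sum(1 for t, p in pairs if t != c and p == c)  # wrongly assigned to c
        fn = sum(1 for t, p in pairs if t == c and p != c)  # issues of class c that were missed
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        per_class[c] = (precision, recall)
    macro_precision = sum(p for p, _ in per_class.values()) / len(classes)
    macro_recall = sum(r for _, r in per_class.values()) / len(classes)
    return per_class, macro_precision, macro_recall


y_true = ["network", "network", "printing", "printing"]
y_pred = ["network", "printing", "printing", "printing"]
per_class, p, r = precision_recall(y_true, y_pred)
print(round(p, 3), r)  # 0.833 0.75
```

Macro-averaging gives every class equal weight, so rare issue categories influence the score as much as frequent ones; a micro-average would instead be dominated by the most common categories.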

    One of the most popular directions in text analysis is word embedding: a family of natural language
processing techniques that map words to vectors of real numbers. The article [5] describes the concept
of such a mapping, presents the word2vec model, one of the most popular models at the moment, and
explains how the word2vec implementation in Python can be used for classification tasks (identifying a
user by the sequence of visited sites) and regression tasks (predicting the popularity of articles).
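Word2vec itself is not reproduced here. The Python sketch below assumes word vectors already exist (the toy 3-dimensional values are hypothetical) and shows one common way to turn them into a vector for a whole request, averaging the vectors of its known words, and to compare requests by cosine similarity. Averaging is an assumption for illustration, not necessarily the approach used in [5].

```python
import math

# Toy 3-dimensional word vectors (hypothetical values); a trained word2vec
# model would supply vectors with hundreds of dimensions.
EMBEDDINGS = {
    "printer": [0.9, 0.1, 0.0],
    "toner": [0.8, 0.2, 0.1],
    "vpn": [0.0, 0.9, 0.3],
    "network": [0.1, 0.8, 0.4],
}


def document_vector(tokens):
    """Average the vectors of known tokens; return None if no token is known."""
    vectors = [EMBEDDINGS[t] for t in tokens if t in EMBEDDINGS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


printing_doc = document_vector(["printer", "toner", "broken"])  # "broken" is unknown
network_doc = document_vector(["vpn", "network"])
print(round(cosine(printing_doc, network_doc), 2))
```

Because related words receive nearby vectors, two requests can be similar even when they share no words, which addresses the vocabulary-mismatch problem described in the introduction.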
    Several papers describe full-fledged systems for analyzing text documents and processing them
further. Thus, the article [6] describes the Stanford CoreNLP framework for natural language
processing. The framework runs on the JVM and covers most of the main stages of natural language
processing, from tokenization to coreference resolution. The article describes the system architecture,
simple usage patterns for the framework components, and the principles of annotation (attaching
linguistic metadata to text) in Stanford CoreNLP. The framework supports English, French, German,
Chinese, and Arabic, but does not support Russian.
    Another article [2] describes a software complex that includes an open system for automated text
analysis, an “Automated text analysis” portal, and a set of services and mobile applications. The paper
presents the architecture of a tool set for automated text analysis in Russian, which links a data
management system, an open system for automatic text analysis, an Internet portal, and a mobile
application. The paper does not study the accuracy, reliability, or adequacy of the proposed solution.
In addition, the system cannot classify texts comparable in size to user requests in Service Desk
systems.
    Given the relevance of the problem and the lack of adequate analogues that support Russian, we
decided to develop an information system for semantic analysis and classification of issues in a
Service Desk system.

3. Problem statement
   The goal is to design an information system for semantic analysis and classification of issues in a
Service Desk system. Service Desk systems typically use a three-tier client-server architecture, in
which the client (user interface), application (hardware and software), and data (DB and DBMS) tiers
are physically separated.
   Figure 1 shows the options for using the Service Desk system.
   We assume that each request submitted to the Service Desk system is pre-processed before it is
added to the list of requests to be executed. Pre-processing consists of semantic analysis of the
semi-structured data extracted from the issue, classification of the issue (finding the most
appropriate executing department or team within the Technical Support Department), and selection of a
possible solution based on the solutions of previously closed issues of the same category.
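The three pre-processing steps can be sketched in Python as a single function. This is an illustration under stated assumptions: the classifier is passed in as any callable, the category-to-department mapping is hypothetical, and the solution is chosen by a majority vote among the k most similar closed issues of the same category, with similarity measured by Jaccard word overlap. All names here are hypothetical, not taken from the described system.

```python
import re
from collections import Counter
from dataclasses import dataclass


@dataclass
class ClosedIssue:
    tokens: set     # words of the closed issue
    category: str   # class confirmed when the issue was closed
    solution: str   # solution recorded by the executor


# Hypothetical mapping from issue category to executing team.
CATEGORY_TO_DEPARTMENT = {"printing": "Office Equipment Team", "network": "Network Team"}


def tokenize(text):
    return set(re.findall(r"[^\W\d_]+", text.lower()))


def jaccard(a, b):
    """Share of common words between two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def preprocess_request(text, classify, history, k=3):
    """Classify a request, route it to a department, and suggest a solution."""
    tokens = tokenize(text)
    category = classify(tokens)  # any classifier: rules, naive Bayes, SVM, ...
    department = CATEGORY_TO_DEPARTMENT.get(category, "General open issues")
    # Solution: majority vote among the k most similar closed issues of the same category.
    same_category = [h for h in history if h.category == category]
    nearest = sorted(same_category, key=lambda h: jaccard(tokens, h.tokens), reverse=True)[:k]
    votes = Counter(h.solution for h in nearest)
    solution = votes.most_common(1)[0][0] if votes else None
    return {"category": category, "department": department, "solution": solution}


def stand_in_classify(tokens):
    """Hypothetical placeholder for a trained classifier."""
    return "network" if "vpn" in tokens else "printing"


history = [
    ClosedIssue({"printer", "jam", "paper"}, "printing", "Remove the jammed paper"),
    ClosedIssue({"printer", "toner", "empty"}, "printing", "Replace the toner cartridge"),
    ClosedIssue({"vpn", "disconnect"}, "network", "Reinstall the VPN client"),
]
print(preprocess_request("Printer says toner is empty", stand_in_classify, history, k=1))
```

Restricting the search to closed issues of the predicted category keeps the solution consistent with the routing decision; if no closed issue of that category exists yet, no solution is suggested.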


Figure 1: Service Desk system usage options

   After pre-processing, the request is added to the list of requests to be executed by a specific
department. Employees of this department can assign any request to themselves. If, after first
reviewing the issue, the technical service operator agrees with the classification results, the
operator can review the suggested solution and try to apply it; if the suggested solution does not
help, the operator notes this in the issue description and offers a new one. If at some point during
execution it becomes clear that the classification was incorrect, the operator can unassign the issue
and move it to the list of general open issues. An employee who executes and closes an issue from the
list of general open issues must leave comments on it stating the executing department and the correct
solution.
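The issue lifecycle described above can be sketched as a small state machine in Python. The state names, action names, and the `Issue` class are hypothetical; the sketch only encodes the rules stated in the text: routed issues can be assigned, a misclassified issue can be detached to the general list, and an issue taken from the general list cannot be closed without comments naming the department and the solution.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical issue states and the allowed (state, action) -> new state transitions.
TRANSITIONS = {
    ("department_queue", "assign"): "in_progress",  # employee takes a routed issue
    ("general_open", "assign"): "in_progress",      # employee takes an unrouted issue
    ("in_progress", "detach"): "general_open",      # classification turned out wrong
    ("in_progress", "close"): "closed",
}


@dataclass
class Issue:
    state: str = "department_queue"  # pre-processing has routed it to a department
    assignee: Optional[str] = None
    from_general_list: bool = False
    feedback: List[Tuple[str, str]] = field(default_factory=list)

    def apply(self, action, employee=None, department=None, solution=None):
        new_state = TRANSITIONS.get((self.state, action))
        if new_state is None:
            raise ValueError(f"cannot {action} an issue in state {self.state!r}")
        if action == "assign":
            self.from_general_list = self.state == "general_open"
            self.assignee = employee
        elif action == "detach":
            self.assignee = None
        elif action == "close" and self.from_general_list:
            # Issues taken from the general list must be closed with comments
            # naming the executing department and the correct solution.
            if not (department and solution):
                raise ValueError("department and solution comments are required")
            self.feedback.append((department, solution))
        self.state = new_state


issue = Issue()
issue.apply("assign", employee="operator-1")
issue.apply("detach")  # misclassified: back to the general list
issue.apply("assign", employee="operator-2")
issue.apply("close", department="Network Team", solution="Reinstall the VPN client")
print(issue.state, issue.feedback)  # closed [('Network Team', 'Reinstall the VPN client')]
```

The collected feedback pairs are the corrected labels that the text relies on for improving future classification and solution suggestions.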
   The resulting options for interacting with the proposed system are shown in figure 2.


Figure 2: Options for interacting with the proposed system

4. System design
   The scheme of the system under development is shown in figure 3. Issues Data, Vocabulary, and
Marked Data Storage are components of the Data Storage.


Figure 3: Scheme of the proposed system

   When a user submits a request to the Service Desk system, its data is stored in the Issues Data
storage, and the entire request is then vectorized using Russian-language vocabularies (databases). The
resulting marked data is sent to the Marked Data Storage, from which the Issue Classification Module
retrieves it. After classification, the current issue is indexed so that the information in the Marked
Data Storage is updated, and a possible solution is proposed. As output, the system transforms the
original request by adding the assigned issue class, the executing department, and the possible
solution.
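The paper does not specify the vectorization method. As an assumption for illustration only, the Python sketch below uses the simplest vocabulary-based scheme, a bag-of-words count vector: each vocabulary word has a fixed index, and the vector counts how often each word occurs in the request. The function names and the tiny Russian vocabulary are hypothetical; word2vec-style embeddings could replace this step.

```python
import re


def build_vocabulary(corpus_tokens):
    """Assign an index to each distinct word, in order of first appearance."""
    vocab = {}
    for tokens in corpus_tokens:
        for t in tokens:
            vocab.setdefault(t, len(vocab))
    return vocab


def vectorize(text, vocab):
    """Bag-of-words count vector; words missing from the vocabulary are ignored."""
    vector = [0] * len(vocab)
    for token in re.findall(r"[^\W\d_]+", text.lower()):
        index = vocab.get(token)
        if index is not None:
            vector[index] += 1
    return vector


# Hypothetical vocabulary: {"принтер": 0, "печатает": 1, "работает": 2, "vpn": 3}
vocab = build_vocabulary([["принтер", "печатает"], ["работает", "vpn"]])
print(vectorize("Не работает принтер, принтер не печатает", vocab))  # [2, 1, 1, 0]
```

An inflected form such as "принтера" is not in the vocabulary and is silently ignored, which is why lemmatization before vectorization matters for Russian text.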


Figure 4: Class diagram of the main entities of the proposed system

    Figure 4 shows a class diagram of the main entities of the system being designed. The entities are:
    • Department, which represents a department of the company where employees work. The
“Departments” DTO is linked to the Department entity by an aggregation relationship. It stores
information about all departments of the company, both those where issue authors work (the
“authorDepartments” attribute) and those where issue executors work (the “actorDepartments” attribute).
    • InitialOrder, which holds the initial information about a received issue. This entity contains the
following attributes: the issue identification number (orderId), the issue body (orderBody),
information about the issue author (author), information about the department where the issue author
works (authorDepartment), through which the entity is linked to the “Departments” DTO by an
association relationship, and a list of tags that the author can add to the issue description to
specify the problem (tags).
    • TransformedOrder, which holds information about the transformed request. This entity inherits the
attributes of the InitialOrder entity and also has:
     a) the transformed issue identification number (newOrderId);
     b) the vector representation of the issue body (wordVec);
     c) the system-selected request type (class) (recomendedOrderType) and the actual request type
          (class) (actualOrderType); these attributes link the TransformedOrder entity to the
          “OrderTypes” DTO;
     d) the system-selected department whose employees could solve the issue
          (recomendedActorDepartment) and the actual department whose employees solved the issue
          (actualActorDepartment); these attributes associate the TransformedOrder entity with the
          “Departments” DTO;
     e) the system-selected issue solution (recomendedSolution) and the actual issue solution
          (actualSolution); these attributes link the TransformedOrder entity to the “Solutions” DTO.
    • OrderType, which represents the classification type of an issue. The “OrderTypes” DTO is
associated with the OrderType entity by an aggregation relationship and stores all possible order
types (the “types” attribute).
    • Solution, which represents a type of issue solution. The “Solutions” DTO is associated with the
Solution entity by an aggregation relationship and stores all possible solution types (the
“solutions” attribute).
    These entities are the main components of Issues Data and the Marked Data Storage (figure 3).
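The entities of the class diagram can be sketched as Python dataclasses. The attribute names follow the diagram (including its spelling "recomended" and camelCase style) so they can be matched against figure 4; the attributes `name` and `description` of Department, OrderType, and Solution, and all attribute types, are hypothetical, because the text does not specify them. The planned implementation is a Jira plug-in, so this sketch only mirrors the structure, not the actual code.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Department:
    name: str  # hypothetical attribute


@dataclass
class Departments:  # DTO aggregating Department entities
    authorDepartments: List[Department] = field(default_factory=list)
    actorDepartments: List[Department] = field(default_factory=list)


@dataclass
class OrderType:
    name: str  # hypothetical attribute


@dataclass
class OrderTypes:  # DTO aggregating OrderType entities
    types: List[OrderType] = field(default_factory=list)


@dataclass
class Solution:
    description: str  # hypothetical attribute


@dataclass
class Solutions:  # DTO aggregating Solution entities
    solutions: List[Solution] = field(default_factory=list)


@dataclass
class InitialOrder:
    orderId: int
    orderBody: str
    author: str
    authorDepartment: Department  # association with the Departments DTO
    tags: List[str] = field(default_factory=list)


@dataclass
class TransformedOrder(InitialOrder):  # inherits all InitialOrder attributes
    newOrderId: Optional[int] = None
    wordVec: List[float] = field(default_factory=list)
    recomendedOrderType: Optional[OrderType] = None
    actualOrderType: Optional[OrderType] = None
    recomendedActorDepartment: Optional[Department] = None
    actualActorDepartment: Optional[Department] = None
    recomendedSolution: Optional[Solution] = None
    actualSolution: Optional[Solution] = None


order = TransformedOrder(orderId=1, orderBody="VPN does not connect", author="user-1",
                         authorDepartment=Department("Accounting"), newOrderId=101,
                         recomendedActorDepartment=Department("Network Team"))
print(order.recomendedActorDepartment.name)  # Network Team
```

Keeping the "recomended" and "actual" attributes side by side means each closed issue directly records whether the system's suggestion was correct.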
    The system being designed will be implemented as a plug-in for Jira, one of the best-known Service
Desk systems.

5. Conclusion
    This paper describes the design of an information system for semantic analysis and classification
of issues in a Service Desk system. In particular:
    1. the concept of a Service Desk system and the problems of its use are described;
    2. several mathematical models and methods of text analysis and text classification are studied;
    3. the information system for semantic analysis and classification of issues in a Service Desk
system is designed: system usage options are analyzed, and a system scheme and a class diagram are
constructed.
    Implementing this system requires further research into vectorization, classification, and solution
recommendation methods that are compatible with the Atlassian SDK.

6. Acknowledgments
   The study was carried out with the financial support of a grant from the President of the Russian
Federation for state support of leading scientific schools of the Russian Federation (NSh-2502.2020.9).

7. References
[1] A. I. Kadhim, "Survey on supervised machine learning techniques for automatic text
    classification," Artificial Intelligence Review 52.1 (2019) 273-292.
[2] A. K. Abasi, A. T. Khader, M. A. Al-Betar, S. Naim, S. N. Makhadmeh, Z. A. A. Alyasseri,
    "Link-based multi-verse optimizer for text documents clustering," Applied Soft Computing 87
    (2020) 106002.
[3] J. Kilpeläinen, "Automating knowledge work of service desk: Machine learning model for
    software robot" (2019).
[4] S. P. Paramesh, K. S. Shreedhara, "Automated IT service desk systems using machine learning
    techniques," in: Data Analytics and Learning, Springer, Singapore, 2019, 331-346.
[5] M. Younas, K. Wakil, M. Arif, A. Mustafa, "An automated approach for identification of
    non-functional requirements using Word2Vec model," Int. J. Adv. Comput. Sci. Appl. 10 (2019).
[6] C. D. Manning, "The Stanford CoreNLP Natural Language Processing Toolkit," in: Proceedings of
    the 52nd Annual Meeting of the Association for Computational Linguistics: System Demonstrations
    (2014) 55-60.
[7] M. Dli, "Application of Fuzzy Decision Trees for Rubricating Unstructured Electronic Text
    Documents," in: Proceedings of the IS-2019 Conference (2019) 108-118.
[8] K. Hu, "Understanding the topic evolution of scientific literatures like an evolving city: Using
    Google Word2Vec model and spatial autocorrelation analysis," Information Processing &
    Management 56.4 (2019) 1185-1203.
[9] M. J. Kusner, "From word embeddings to document distances," in: ICML'15: Proceedings of the
    32nd International Conference on Machine Learning (2015) 957-966.

