<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An NLP-based Chatbot to Facilitate RE Activities: An Experience Paper on Human Resources Application</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Pedram Khatamino</string-name>
          <email>pedram.khatamino@westerops.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mustafa B. Çamlı</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bilgehan Öztekin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Umutcan Gözümoğlu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emre Tortumlu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hüseyin M. Gezer</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <kwd-group>
          <kwd>Chatbot</kwd>
          <kwd>Natural Language Processing</kwd>
          <kwd>Requirements Engineering</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>In: F.B. Aydemir</institution>
          ,
          <addr-line>C. Gralha, S. Abualhaija, T. Breaux, M. Daneva, N. Ernst, A. Ferrari, X. Franch, S. Ghanavati, E. Groen, R. Guizzardi, J. Guo, A. Herrmann, J. Horkoff, P. Mennig</addr-line>
          ,
          <institution>E. Paja, A. Perini, N. Seyff, A. Susi, A. Vogelsang (eds.): Joint Proceedings of REFSQ-2021 Workshops, OpenRE</institution>
          ,
          <addr-line>Posters and Tools Track, and Doctoral Symposium, Essen, Germany, 12-04-2021</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Westerops Software and Information Technologies Company</institution>
          ,
          <addr-line>Istanbul</addr-line>
          ,
          <country country="TR">Turkey</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Companies use multiple applications and platforms on a daily basis to manage their operations, and the context switch between these applications distracts employees. We propose a chatbot integrated into the main workspace of a company as a quick interface to multiple applications, so that employees do not need to switch to other applications. This paper presents our experience building the prototype of this chatbot, demonstrates its usage for a human resources application, and provides a use case for requirements engineering activities.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
    </sec>
    <sec id="sec-2">
      <title>2. CHATBOT REQUIREMENTS</title>
      <p>In this section, we briefly present the requirements for the chatbot. The requirements were
elicited through semi-structured elicitation interviews with other employees of our
company. The system-to-be is a chatbot that acts as a quick interface to the multiple
applications used within the company on a daily basis, in order to prevent
frequent context switches by employees. We list the requirements below.</p>
      <p>• R1: The Chatbot shall allow the user to type the kind of application they want to use.
• R2: The Chatbot shall terminate the process when an unsupported application is chosen 3 times.
• R3: The Chatbot shall allow the user to type specific parameters according to the application type.
• R4: The Chatbot shall detect dates written in natural language.
• R5: The Chatbot shall calculate dates based on the dates detected in the conversation.
• R6: The Chatbot shall allow the user to specify dates from a calendar drop-down when the application type is human resources and the dates cannot be detected from the conversation.
• R7: The Chatbot shall allow the user to confirm the Chatbot’s responses.
• R8: The Chatbot shall warn the user if the user’s message is misunderstood or missing.
• R9: The Chatbot shall show all parameters of the request at the end.</p>
      <p>• R10: The Chatbot shall pass the user’s responses to the API and pass the response from the API to the user.</p>
      <p>Although we aim for a chatbot that is connected to multiple applications through their APIs,
we choose a human resources application as our pilot application. In Figure 1, we provide the
use case diagram of the chatbot only for the cases related to the human resources application.
The user can type messages via the Chatbot GUI in order to communicate with the Chatbot. The
purpose of these messages can be extended in the future. Our current focus is creating a leave
request using the chatbot. The user can create a leave request by providing her/his name,
defining the type of the leave, and specifying the start and end dates of the request. The user
needs to approve the request at the end. If the start and end dates are not understood, a
calendar shall be shown by the Chatbot.</p>
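      <p>R4 and R5 ask the chatbot to detect and calculate dates written in natural language. A minimal sketch of the idea for a few relative phrases (illustrative only, not our production extractor):</p>

```python
from datetime import date, timedelta

def extract_leave_dates(message, today):
    """Map a few relative date phrases to concrete (start, end) dates.

    A deliberately small rule table standing in for the real date and
    time extraction; unknown phrases return None so the chatbot can
    fall back to the calendar drop-down (R6).
    """
    text = message.lower()
    if "next week" in text:
        # start on the Monday after the current week (R5: calculated date)
        start = today + timedelta(days=7 - today.weekday())
        return start, start + timedelta(days=4)
    if "tomorrow" in text:
        start = today + timedelta(days=1)
        return start, start
    if "today" in text:
        return today, today
    return None  # let the GUI show the calendar instead
```

      <p>Phrases outside the rule table fall through to None, which is exactly the case where R6 makes the GUI show a calendar drop-down.</p>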
    </sec>
    <sec id="sec-3">
      <title>3. CHATBOT ARCHITECTURE</title>
      <p>In this project, we plan to use a multi-tenant structure to increase security;
however, the prototype currently runs on SQLite. In addition, we aim to automate the
structure as much as possible with artificial intelligence and rule-based
algorithms.</p>
      <p>Figure 2 shows the architecture diagram. The user interacts directly with the
Chatbot framework via the Chatbot GUI. The Chatbot GUI sends the user’s request and
receives the response through the Message Channel. The router, the NLP Engine, and the
database are located in the AWS Cloud. The router is a REST API that connects
the Message Channel to the NLP Engine. The NLP Engine consists of three
parts: the Application Type Classifier, the Leave Type Detector, and the Date
and Time Extractor. They take input from the Message Channel via the router and
return the appropriate outputs; furthermore, the NLP Engine uses the database for training
the necessary machine learning models.</p>
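      <p>The router’s role can be sketched as a dispatcher that runs the incoming text through the NLP Engine components and bundles their outputs. The function names and keyword rules below are illustrative assumptions, not our actual REST interface:</p>

```python
def classify_application(text):
    # Stand-in for the trained Application Type Classifier:
    # a keyword rule per application label.
    keywords = {
        "Human Resources": ("leave", "holiday", "sick"),
        "Accounting": ("invoice", "balance", "sheet"),
        "Project Management": ("task", "sprint", "activities"),
    }
    lowered = text.lower()
    for label, words in keywords.items():
        if any(word in lowered for word in words):
            return label
    return "Unknown"

def route_message(text):
    """Hypothetical router: one request in, one structured response out."""
    response = {"application": classify_application(text)}
    if response["application"] == "Human Resources":
        # The real engine would also run the Leave Type Detector
        # and the Date and Time Extractor at this point.
        response["leave_type"] = "sick" if "sick" in text.lower() else "annual"
    return response
```
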
      <sec id="sec-3-1">
        <title>3.1. Interaction Flow</title>
        <p>Figure 3 presents the interaction flow. At the very beginning, the Chatbot shows the
welcome message and asks for the user’s name. The Chatbot then offers help to
understand the user’s need and takes input from the user. The received input goes
directly to the API. If the user types all the request parameters at once, the API
responds with a correct detection; otherwise, it responds with an incorrect detection.
If the response is a correct detection, the input goes through to the API. Then, if the
user approves the parameters, the API is called again to create the request and the
Chatbot finishes; otherwise, the user is connected to a human call. However, if the
request parameters are marked as an incorrect detection by the API, the user types each
parameter one by one. First, the Chatbot shows the types of applications so that the user
can choose the application they need. If the selected application is anything other than
the human resources application, the Chatbot terminates, because applications other than
the human resources application are not ready to be used yet. If the selected
application is the human resources application, the Chatbot asks for the type of leave and
the user answers. Second, the Chatbot opens the calendar drop-down twice, with a list of
dates and times to choose from, so that the user can select the start and end dates of
the request. Finally, the parameters are shown to the user for approval. If the user
approves the parameters, the API is called again to create the request and the Chatbot
finishes; otherwise, the user is connected to a human call.</p>
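        <p>The flow above is essentially a small state machine. A compressed sketch (the states are our own simplification of Figure 3, not the actual BotStar flow, which has more branches for greeting, name capture, and retries):</p>

```python
# Happy-path transitions of the simplified Figure 3 flow.
FLOW = {
    "welcome": "ask_application",
    "ask_application": "ask_leave_type",   # only the HR branch continues
    "ask_leave_type": "ask_dates",
    "ask_dates": "confirm",
    "confirm": "done",
}

def run_flow(application, approved):
    """Walk the flow; terminate early for unsupported applications (R2)
    or hand over to a human call when the user rejects the parameters."""
    state, visited = "welcome", []
    while state != "done":
        visited.append(state)
        if state == "ask_application" and application != "Human Resources":
            return visited + ["terminated"]   # unsupported application
        if state == "confirm" and not approved:
            return visited + ["human_call"]   # user rejected the parameters
        state = FLOW[state]
    return visited + ["done"]
```
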
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Technologies</title>
        <p>
          We use Python as the main development language due to our expertise and the available
libraries for NLP, machine learning, and chatbot development. For NLP, we use NLTK [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]
and spaCy [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. For machine-learning-related tasks, we use Scikit-learn [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] and Pandas [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
We select BotStar [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] as the chatbot builder platform due to its model-driven approach,
easy deployment and integration, and detailed documentation. BotStar is
conversational bot software for building chatbots on websites and Facebook Messenger
[
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The most important feature BotStar offers is the visual editor, which allows us to
create flows with many built-in elements without the need to write code. Connections between
elements can include conditions to control the flow, and these conditions can work
with external variables supplied through APIs. This leads to another important
feature: extending the chatbot’s capabilities with developer tools. BotStar
comes with a built-in code editor that lets programmers execute complex code within
the context of the chat flow. BotStar also has a CMS (Content Management System) for
creating and managing digital content for the bots, and it supports behavior
customization, such as mimicking a human reading or typing a message.
BotStar can be integrated with common apps like Google Sheets [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], Zapier [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], Stripe [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], and
Integromat [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. We connect our backend and the chatbot UI with JavaScript. Docker [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]
allows us to develop, ship, and run our backend. We use Nginx as the web server for this
project. Finally, Amazon Web Services [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] is the deployment platform of this project.
        </p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. NLP Pipeline</title>
        <p>We collected text data from open-source websites to create a data set for our project. We
could not benefit from automated techniques such as web scraping, since text data related
to our target was not available on any single website. Instead, we searched with manually
determined keywords for data covering three main labels (Human Resources, Project
Management, Accounting). For instance, for the HR label, our searches included words such
as sickness, sample leave request letters and e-mails, different religious and national
holidays, and corporate positions. In a week, a team of three prepared a data set of
approximately 1500 samples, collected as sentences and paragraphs.</p>
        <p>The NLP pipeline starts with the classification of the user’s message after the
greeting. The chatbot supports English, and we use NLTK, which supports English, to
process the messages before classification. For training, we chose basic and widely used
models from the Scikit-learn library, since the problem is relatively easy to classify:
Gaussian Naïve Bayes, Bernoulli Naïve Bayes, Multinomial Naïve Bayes, Support Vector
Machines (SVM), linear SVM, NuSVC, Logistic Regression, SGD Classifier, and MLP. The
parameters of all models are the default values defined by the Scikit-learn library.
The train set consists of 75% of the data, and the remaining 25% is used as the test set
to validate our models; a manually designed split guarantees the same proportion of each
class in the train and test sets. The train set was trained with each of these classifiers
separately. Table 1 shows the accuracy of the models on the test set, per label and
overall. NuSVC performs best among all models. Furthermore, we tried an ensemble learning
method, the hard voting classifier, to increase accuracy: it combines several classifiers,
every individual classifier votes for a class, and the majority wins. In our work, the
hard voting classifier contains all classifiers in Table 1; it yields the average
accuracy for each class and the same overall accuracy as NuSVC. Therefore, the hard
voting classifier is not needed, since it brings no improvement. Example messages and
their labels: “I want to sick leave for next week.” (Human Resources); “I want to receive
the balance sheet for the last month.” (Accounting).</p>
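        <p>The voting rule itself is easy to state; a pure-Python illustration of hard voting over per-classifier predictions (the classifier names here are only labels, not trained models):</p>

```python
from collections import Counter

def hard_vote(predictions):
    """Majority vote over individual classifier outputs; ties are broken
    by whichever label reaches the top count first."""
    return Counter(predictions).most_common(1)[0][0]

# One (made-up) prediction per classifier for a single message.
votes = {
    "NuSVC": "Human Resources",
    "Logistic Regression": "Human Resources",
    "Multinomial NB": "Accounting",
}
majority = hard_vote(votes.values())
```
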
        <p>
          A third example: “I want to see activities regarding DWS project.” (Project Management).
Term Frequency - Inverse Document Frequency (TF-IDF) is a statistic that aims to better
capture how important a word is for a document, while also taking into account its
relation to the other documents in the same corpus [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. TfidfVectorizer transforms text
into feature vectors that can be used as input to an estimator. We use the Scikit-learn
TfidfVectorizer with some built-in text preprocessing methods and manually designed
operations such as tokenization, stop-word removal, lowercasing, and number removal,
which are core steps of NLP text classification.
        </p>
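        <p>A minimal sketch of this setup, assuming Scikit-learn and a toy corpus in place of our 1500-sample data set; Multinomial Naïve Bayes stands in here for any single classifier from Table 1:</p>

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

# Toy stand-in for the three-label data set.
texts = [
    "I want to sick leave for next week",
    "please approve my annual leave request",
    "I want to receive the balance sheet for the last month",
    "send me the invoice totals for accounting",
    "show activities regarding the DWS project",
    "assign this sprint task to the project team",
]
labels = [
    "Human Resources", "Human Resources",
    "Accounting", "Accounting",
    "Project Management", "Project Management",
]

# TfidfVectorizer lowercases and tokenizes by default; stemming and
# number removal would be added through its parameters.
model = make_pipeline(TfidfVectorizer(stop_words="english"), MultinomialNB())
model.fit(texts, labels)
prediction = model.predict(["I need sick leave tomorrow"])[0]
```
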
        <p>Additionally, stemming is one of our text preprocessing operations. Stemming can be
defined as reducing inflectional and some derivational forms of words, which leads
to better results. By stemming, related words are reduced to their base forms. When
training on our data set, for instance, it is better to remove the -ing or -s
suffixes, since these suffixes do not change the meaning much. NLTK
contains both the Porter and Snowball stemmers; for the English data set, the Snowball
stemmer is used. Likewise, our preprocessing removes stopwords using the NLTK library.
We used the Scikit-learn data split method for splitting the train and test data.</p>
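        <p>For instance, with NLTK’s Snowball stemmer (a sketch; the stop-word list here is abbreviated, whereas we use NLTK’s full English list):</p>

```python
from nltk.stem.snowball import SnowballStemmer

stemmer = SnowballStemmer("english")
stop_words = {"i", "to", "for", "the", "a", "want"}   # abbreviated list

def preprocess(text):
    """Lowercase, drop stop words, and stem the remaining tokens."""
    tokens = text.lower().split()
    return [stemmer.stem(token) for token in tokens if token not in stop_words]

stems = preprocess("I want to request leaves for the following days")
```
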
        <p>Part-of-speech (POS) tagging is categorizing the words in a corpus depending on the
definition of the word and its context. For instance, many English verbs can also be
used as nouns in sentences, and tagging them correctly leads to better results.
So far, only the English data set is tagged. Part-of-speech tagging is mostly used in
date and duration detection. Table 3 shows the output of our NLP for date-duration
detection.</p>
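        <p>In our domain the word “leave” illustrates why this matters: it is a noun in “annual leave” but a verb in “I leave on Friday”. A toy disambiguation rule (our real tagging relies on NLTK’s tagger, not on hand-written rules like this):</p>

```python
def tag_leave(tokens):
    """Toy POS rule for the word 'leave': tag it as a verb right after a
    pronoun, and as a noun otherwise (e.g. after 'annual' or 'sick')."""
    tags = []
    for i, token in enumerate(tokens):
        if token != "leave":
            tags.append((token, "OTHER"))
        elif i > 0 and tokens[i - 1] in {"i", "we", "they", "you"}:
            tags.append((token, "VERB"))
        else:
            tags.append((token, "NOUN"))
    return tags

tagged = tag_leave("i leave on friday".split())
```
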
      </sec>
      <sec id="sec-3-4">
        <title>3.3.1. Named Entity Recognition (NER)</title>
        <p>
          NER is a process where an algorithm takes a string of text (a sentence or paragraph) as input
and identifies the relevant nouns (mainly people, places, and organizations) that are
mentioned in that string. In this project, we use NER to automate which
tasks are assigned to which personnel when teams are assigned tasks, and to determine
the reason personnel give when they create a leave request [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. Figure 4
shows our NER concept.
        </p>
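        <p>The idea can be sketched with a small gazetteer standing in for spaCy’s statistical NER model; the names and leave reasons below are made-up examples, not our actual entity lists:</p>

```python
# Tiny gazetteer standing in for the trained NER model.
KNOWN_PEOPLE = {"pedram", "mustafa", "emre"}
LEAVE_REASONS = {"wedding", "illness", "funeral", "vacation"}

def extract_entities(message):
    """Return the personnel name and leave reason mentioned in a message."""
    entities = {"person": None, "reason": None}
    for token in message.lower().strip(".").split():
        if token in KNOWN_PEOPLE:
            entities["person"] = token
        elif token in LEAVE_REASONS:
            entities["reason"] = token
    return entities

result = extract_entities("Emre requests time off for his wedding.")
```
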
      </sec>
      <sec id="sec-3-5">
        <title>3.4. Use Case for RE Applications</title>
        <p>Collecting and analyzing user feedback are two RE activities that can be facilitated with
our chatbot in the future. Instead of providing customers with forms or interfaces,
which can be intimidating, or assigning a human, which can be costly, we plan to deploy
the chatbot to collect issues related to our software, gathering the necessary
information about bugs and feature requests. In addition, when a message about a relevant
field within the company is entered in our chatbot, that field will be added to our
data; in the future, we will identify which areas our customers access less often
and make improvements in those areas.</p>
      </sec>
      <sec id="sec-4">
        <title>4. RELATED WORKS</title>
        <p>Chatbot Platforms. There are many successful chatbot-building platforms, such as BotStar,
WotNot, Intercom, Drift Chatbot, Landbot.io, LivePerson, Bold360, and Octane AI. Each of
them has special features designed to compete in the marketplace. To choose
between different chatbot platforms, we have to pay attention to certain criteria,
which mainly cover the following topics: identifying the use cases, integrations,
natural language and AI capabilities, training, and pricing.</p>
        <p>In the research phase, the first consideration has to be the use case for the
chatbot in the project. A thorough understanding of the project’s use case helps the
researcher determine what exactly is expected of the chatbot. It is vital to have the
right chatbot integrations in place to get the best results out of the chatbot platform.
The user-chatbot conversation is one of the most critical components that make chatbots
so intriguing for users, so one should consider a platform that supports NLP and has AI
capabilities to expand the use case and the chatbot’s capabilities down the line.
Organizations need a human-independent chatbot solution that supports continuous
learning and gets smarter with each conversation using machine learning and semantic
modeling. Today, most chatbot platforms use a combination of pay-per-call,
monthly license fee, and pay-per-performance pricing models. A researcher should choose
a chatbot pricing plan that is predictable, guarantees savings, and allows them to pay
according to achieved or non-achieved goals.</p>
        <p>
          Experiment Reports. A Review of AI Based Medical Assistant Chatbot aims to bridge the
language divide between consumers and health care professionals by delivering direct
answers to customer inquiries [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Instead of searching through a web-based set of
theoretically relevant texts, it is easier to create question-response forums to discuss
such queries. The main use case of that chatbot’s NLP feature is to act as a diagnosis
and treatment authority: text classification provides the diagnosis, and a
recommendation is sent back by the chatbot. The project uses text classification to
diagnose the patient with a certain illness through multiple sequential questions.
        </p>
        <p>
          Another work presents a chatbot created for ticket reservation which can classify the
entered message [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. It uses NLP concepts, such as POS tagging, to make sense of the entered
text. In the NLP block, POS tagging is combined with rules built from certain
keywords such as departure city, destination city, and flight city.
That study was carried out to let customers buy tickets faster.
        </p>
        <p>In our study, we use both a model-based and a rule-based structure for the
necessary classifications. The rule-based structure contains many keywords related to the
classes we have determined, for detecting dates, times, leave types,
human-like answers, and so on. If no result is found with the model trained on
our own data, we aim to find a result with the rule-based algorithm we created.</p>
      </sec>
      <sec id="sec-5">
        <title>5. CONCLUSIONS</title>
        <p>In today’s technologically advanced business environment, chatbots help a business
stay accessible around the clock without investing heavily in hiring extra customer
support representatives. The key point of our work is to prevent the loss of time,
effort, and resources by routing text as automatically as possible in our Super App
project, which will cover all in-house text, and to provide fast and easy solutions to
employees. In conclusion, we introduce an NLP-based chatbot as a solution to the fatigue
caused by the multiple platforms in business environments. Companies run their operations
on multiple platforms, which their employees and customers may not have mastered. We
propose integrating a chatbot into the Super App of the company. The chatbot is
currently integrated with the human resources application. We also provide an RE-related
use case as part of our on-going work.</p>
        <p>As future work, we aim to extend this concept to all HR operations as well as project
management and accounting platforms (for example, Slack, Paraşüt, Teamwork, Clockify,
etc.) that are supported by our main Super App. For this goal, we will research and
determine the requirements for each app.</p>
        <p>In the future, we want to expand our data set, train deeper models, and advance in
tasks such as classification and text prediction. We plan to collect this data during
beta tests at several companies, including our own. Additionally, we want to contribute
to the literature by preparing manually labeled data sets and pretrained models in both
English and Turkish.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>NLTK</surname>
          </string-name>
          . URL:https://www.nltk.org/
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Spacy</surname>
          </string-name>
          . URL:https://spacy.io/
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Scikit-Learn</surname>
          </string-name>
          . URL:https://scikit-learn.org/stable/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Pandas</surname>
          </string-name>
          . URL:https://pandas.pydata.org/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Botstar</surname>
          </string-name>
          . URL:https://botstar.com/
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Facebook</given-names>
            <surname>Messenger</surname>
          </string-name>
          . URL:https://www.messenger.com/
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Google</given-names>
            <surname>Sheets</surname>
          </string-name>
          . URL:https://www.google.com/sheets/
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Zapier</surname>
          </string-name>
          . URL:https://zapier.com/
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Stripe</surname>
          </string-name>
          . URL:https://stripe.com/
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Integromat</surname>
          </string-name>
          . URL:https://www.integromat.com/en
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Docker</surname>
          </string-name>
          . URL:https://www.docker.com/
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Amazon</given-names>
            <surname>Web</surname>
          </string-name>
          <article-title>Service</article-title>
          . URL:https://aws.amazon.com/
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Ramos</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2003</year>
          , December).
          <article-title>Using tf-idf to determine word relevance in document queries</article-title>
          .
          <source>In Proceedings of the First Instructional Conference on Machine Learning</source>
          (Vol.
          <volume>242</volume>
          , No.
          <issue>1</issue>
          , pp.
          <fpage>29</fpage>
          -
          <lpage>48</lpage>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Bulla</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parushetti</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Teli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Koppad</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2020</year>
          ).
          <source>A Review of AI Based Medical Assistant Chatbot. Research and Applications of Web Development and Design</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Handoyo</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arfan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Soetrisno</surname>
            ,
            <given-names>Y. A. A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Somantri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sofwan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Sinuraya</surname>
            ,
            <given-names>E. W.</given-names>
          </string-name>
          (
          <year>2018</year>
          ,
          <article-title>September)</article-title>
          .
          <article-title>Ticketing chatbot service using serverless NLP technology</article-title>
          .
          <source>In 2018 5th International Conference on Information Technology</source>
          , Computer, and Electrical Engineering (ICITACEE) (pp.
          <fpage>325</fpage>
          -
          <lpage>330</lpage>
          ). IEEE.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>