<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Deep Dive Survey Miner (Extended Abstract)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marwa Elleuch</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oumaima Alaoui Ismaili</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Philippe Legay</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Orange Labs</institution>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Deep Dive Survey Miner (DDSM) is a web-based tool that analyses the verbatim records of business stakeholders' answers to open-ended survey questions. These records are traditionally annotated manually or, in the best cases, analyzed by tools that provide only a coarse analysis. DDSM allows business analysts to go deeper through an unsupervised fine-grained sentiment analysis. It gives them insight not only into the distribution of the sentiments and topics expressed by business stakeholders, but also into the reasons behind these sentiments and topics. This can help them quickly identify the problems and the strong points of an organisation's services from a set of verbatim records.</p>
      </abstract>
      <kwd-group>
        <kwd>Open-ended survey questions</kwd>
        <kwd>Unsupervised fine-grained sentiment analysis</kwd>
        <kwd>Organisation services</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        • Approaches based on supervised learning (e.g. [
        <xref ref-type="bibr" rid="ref1 ref2 ref5 ref6">1, 2, 5, 6</xref>
        ]), which are limited because they
require human effort and cannot detect new labels (an example of such a tool is Erdil).
• Approaches based on unsupervised learning (e.g. [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]), which have been restricted
to characterizing each sentiment category by the set of recurring words appearing in
its context. In the best cases, these words are organized into topics (examples of such tools are
DATAVIV’ and VisualCRM).
      </p>
      <p>In this demo paper, we present the Deep Dive Survey Miner (DDSM) tool. It discovers, in an
unsupervised way, the reasons behind business stakeholders’ sentiments in order to identify negative and
positive performances. The tool performs a fine-grained sentiment analysis that lets users
go deeper in their analysis; it generates results at three levels of granularity: (i) a
coarse level concerning the sentiments expressed in the verbatim records, (ii) a second coarse
level concerning the topics related to each sentiment, and (iii) a finer level for understanding the
reasons related to each topic. In what follows, we describe the main functionalities of the
tool, discuss its maturity and conclude with future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Main features</title>
      <p>DDSM is a web-based tool that allows business experts to quickly analyse the verbatim records of
their surveys. The general architecture of the DDSM tool consists of four key components. Below,
we summarize the functionality of each one.</p>
      <p>A- Load Data: DDSM contains an upload page where users can load their verbatim records
while specifying the column names corresponding to their textual contents, IDs and timestamps.</p>
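      <p>The column mapping described above can be illustrated with a minimal sketch. It assumes the records arrive as a CSV file; the <monospace>VerbatimRecord</monospace> class, the <monospace>load_verbatims</monospace> function and the sample column names are hypothetical and are not taken from the DDSM code base.</p>
      <preformat preformat-type="code">
```python
import csv
import io
from dataclasses import dataclass


@dataclass
class VerbatimRecord:
    record_id: str
    timestamp: str
    text: str


def load_verbatims(handle, text_col, id_col, time_col):
    """Read verbatim records from a CSV stream using user-chosen column names."""
    reader = csv.DictReader(handle)
    missing = {text_col, id_col, time_col} - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"columns not found in file: {sorted(missing)}")
    records = []
    for row in reader:
        text = (row[text_col] or "").strip()
        if not text:  # skip empty answers
            continue
        records.append(VerbatimRecord(row[id_col], row[time_col], text))
    return records


# Hypothetical survey export; the column names are illustrative only.
sample = io.StringIO(
    "resp_id,answered_at,comment\n"
    "r1,2023-01-10,The screen is too small\n"
    "r2,2023-01-11,\n"
    "r3,2023-01-12,My computer is slow but the headset is great\n"
)
records = load_verbatims(
    sample, text_col="comment", id_col="resp_id", time_col="answered_at"
)
print(len(records), records[0].text)
```
      </preformat>
      <p>Validating the column names before reading any row gives the user an immediate error when the mapping does not match the uploaded file.</p>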
      <p>B- Data Analysis: This component uncovers, from the verbatim records, the topics and sub-topics of the most
recurrent sentiment reasons. We show, in Figure 1, the main implemented phases:
- Phase 1: This phase pre-processes each verbatim record and splits it into textual segments. It
applies a pretrained BERT model to associate a positive or negative sentiment with each segment.
- Phase 3: This phase combines the occurrences of topics and sub-topics with the identified
sentiments of their corresponding textual segments (see the example of results at Phase 3 in
Figure 1). This enables the generation of two outputs. The first is a knowledge graph summarizing:
(i) the sentiment distribution around each topic and sub-topic; (ii) the sub-topics around which sentiments are
mostly positive, which are considered the elementary reasons of satisfaction or
positive performances; (iii) the sub-topics around which sentiments are mostly negative, which
are considered the elementary reasons of dissatisfaction or negative performances. The
second output is a structured log of sentiment reasons: it associates with each occurrence of a
sentiment reason the following attributes: (i) the verbatim record ID and the related verbatim
question, (ii) the associated topic, sub-topic and sentiment generated by the tool, and (iii) the
precise textual segment where it occurred and the original words expressing it.</p>
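      <p>The combination step of Phase 3 can be sketched as a simple aggregation. This is an illustration under stated assumptions, not the DDSM implementation: the tuple layout of <monospace>occurrences</monospace>, the function names and the strict-majority rule (with ties labelled "mixed") are all hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter, defaultdict

# Hypothetical Phase 3 input: one tuple per topic/sub-topic occurrence, carrying
# the sentiment already assigned to its textual segment.
occurrences = [
    ("r1", "screen", "small screen", "negative"),
    ("r2", "screen", "small screen", "negative"),
    ("r3", "screen", "big screen", "positive"),
    ("r3", "computer", "slow computer", "negative"),
    ("r4", "headset", "sound quality", "positive"),
    ("r5", "headset", "sound quality", "negative"),
]


def sentiment_distribution(occurrences):
    """Count sentiments per (topic, sub-topic) pair."""
    dist = defaultdict(Counter)
    for _record_id, topic, sub_topic, sentiment in occurrences:
        dist[(topic, sub_topic)][sentiment] += 1
    return dist


def classify_reasons(dist):
    """Label each sub-topic by its majority sentiment (ties are left as 'mixed')."""
    labels = {}
    for key, counts in dist.items():
        pos, neg = counts["positive"], counts["negative"]
        if pos > neg:
            labels[key] = "satisfaction"
        elif neg > pos:
            labels[key] = "dissatisfaction"
        else:
            labels[key] = "mixed"
    return labels


dist = sentiment_distribution(occurrences)
reasons = classify_reasons(dist)
for (topic, sub_topic), label in reasons.items():
    print(f"{topic} / {sub_topic}: {dict(dist[(topic, sub_topic)])} -> {label}")
```
      </preformat>
      <p>Keeping the record ID in each occurrence is what would allow the same data to feed both the aggregated view and a per-occurrence log.</p>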
      <p>C- Results Visualization: This component enables exploring the results generated by the
tool. It summarizes them in a sunburst graph (see the left part of Figure 2) that shows, at the first
level (near the center), all topics discovered by the tool (e.g. screen, computer, etc.). The size of each topic is
proportional to its number of occurrences. For each topic, the associated reasons (i.e. sub-topics)
are presented in the third level of the sunburst and are categorized by sentiment (positive or
negative) in the second level. The tool offers the possibility of displaying all verbatim records
associated with a specific sub-topic by clicking on it (see the right part of Figure 2). It highlights
the positions where the sub-topic occurs in the verbatim records so that they can be located quickly. This enables business
experts to better interpret the business context of each discovered topic/sub-topic.</p>
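      <p>The three-level hierarchy of the sunburst (topic, then sentiment, then sub-topic) can be prepared as parallel lists of node identifiers, parents and values. The sketch below is an assumption about how such data could be shaped; the function name and the sample occurrences are hypothetical.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical occurrences: (topic, sentiment, sub_topic), one per detected reason.
occurrences = [
    ("screen", "negative", "small screen"),
    ("screen", "negative", "small screen"),
    ("screen", "positive", "big screen"),
    ("computer", "negative", "slow computer"),
]


def sunburst_nodes(occurrences):
    """Build ids/labels/parents/values lists for a three-level sunburst."""
    counts = Counter()
    for topic, sentiment, sub_topic in occurrences:
        # Each occurrence contributes to its topic, sentiment and sub-topic nodes.
        counts[(topic,)] += 1
        counts[(topic, sentiment)] += 1
        counts[(topic, sentiment, sub_topic)] += 1
    nodes = {"ids": [], "labels": [], "parents": [], "values": []}
    for path in sorted(counts):
        nodes["ids"].append("/".join(path))
        nodes["labels"].append(path[-1])
        nodes["parents"].append("/".join(path[:-1]))  # "" marks a root node
        nodes["values"].append(counts[path])
    return nodes


nodes = sunburst_nodes(occurrences)
for node_id, value in zip(nodes["ids"], nodes["values"]):
    print(node_id, value)
```
      </preformat>
      <p>These lists have the shape expected by the <monospace>ids</monospace>, <monospace>labels</monospace>, <monospace>parents</monospace> and <monospace>values</monospace> arguments of Plotly's <monospace>go.Sunburst</monospace> trace with <monospace>branchvalues="total"</monospace>, which could be rendered in a Dash front-end; whether DDSM uses this trace is an assumption.</p>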
      <p>D- Results Customization: Using the tool, business experts can customize visualizations
based on their business needs and their own vocabulary. It is possible to: (i) modify
topic and sub-topic labels (globally or at each occurrence) and their degree of granularity. For
instance, experts can reassign a set of sub-topics that share common aspects to
create new topics. In the video demo, we show how easily a "Slowness"
topic can be created from sub-topics automatically associated with other topics related to computer workstation
equipment; (ii) load topics or sub-topics from answers to closed-ended questions to combine
them with the tool results (e.g. visualize sentiment reasons related to each office application);
and (iii) export the log of sentiment reasons obtained from the data analysis component (see the
example provided in Figure 3). This provides explainability concerning the words that led
the tool to assign a specific topic and sub-topic. Such functionality enables business experts to
generate customized visualizations and reports according to their needs (e.g. a Pareto diagram).</p>
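      <p>As an illustration of what an exported log makes possible, the following sketch derives the table behind a Pareto diagram of negative sub-topics. The field names <monospace>sub_topic</monospace> and <monospace>sentiment</monospace> and the sample rows are hypothetical and do not reflect the actual export schema.</p>
      <preformat preformat-type="code">
```python
from collections import Counter

# Hypothetical exported log rows; field names are illustrative, not DDSM's schema.
log = (
    [{"sub_topic": "small screen", "sentiment": "negative"}] * 3
    + [{"sub_topic": "slow computer", "sentiment": "negative"}] * 2
    + [{"sub_topic": "noisy fan", "sentiment": "negative"}]
    + [{"sub_topic": "big screen", "sentiment": "positive"}]
)


def pareto_table(log, sentiment="negative"):
    """Return (sub_topic, count, cumulative %) rows sorted by decreasing count."""
    counts = Counter(
        row["sub_topic"] for row in log if row["sentiment"] == sentiment
    )
    total = sum(counts.values())
    table, cumulative = [], 0
    for sub_topic, count in counts.most_common():
        cumulative += count
        table.append((sub_topic, count, round(100 * cumulative / total, 1)))
    return table


for sub_topic, count, cum_pct in pareto_table(log):
    print(f"{sub_topic}: {count} ({cum_pct}% cumulative)")
```
      </preformat>
      <p>The cumulative percentage column is what lets an analyst see how few sub-topics account for most of the negative feedback.</p>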
    </sec>
    <sec id="sec-3">
      <title>3. Maturity and Experiments</title>
      <p>
        The implemented solution is accessible within our organisation via OpenShift. The front-end of
DDSM is implemented using Dash and the back-end is implemented in Python. In the demo video, we
show an example application of the tool to a survey containing 1349 verbatim
records about employee workstation equipment. The approach that DDSM extends and
implements was validated in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We also conducted several studies covering
additional business domains (e.g. network equipment installation, business applications, etc.).
We validated the results with business experts, who reported that the major advantage of the
tool is its ability to generate the main topics and sub-topics in an unsupervised way (that is,
without human intervention) while keeping them interpretable and adapted to business needs. We show in
Figure 4 two examples of experimental results supporting this observation. The first one compares
the distribution of topics as discovered by DDSM (Graph a) with that of the manually annotated topics
(Graph b). It shows that the top topics obtained from manual annotations (i.e. Computer,
Screen, Headset, Mobile) figure among the ones automatically generated by our tool. Only the
"Slowness" topic does not explicitly appear but, as discussed previously, it can easily be inferred
from the sub-topics of the other topics (e.g. computer, application). The second experiment
shows that the manually defined sub-topics (e.g. size of screen in Graph d) of a selected topic
(i.e. Screen) figure among those automatically generated (e.g. small and big screen).
      </p>
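      <p>The comparison between manual and automatically discovered topics can be expressed as a simple coverage check. The sketch below is only illustrative: the function name is hypothetical and the two lists reuse topic names mentioned in the text rather than the actual experimental data.</p>
      <preformat preformat-type="code">
```python
def topic_coverage(manual_topics, discovered_topics):
    """Return the share of manual topics found among discovered ones, and the misses."""
    discovered = {t.casefold() for t in discovered_topics}
    missing = [t for t in manual_topics if t.casefold() not in discovered]
    covered = len(manual_topics) - len(missing)
    share = covered / len(manual_topics) if manual_topics else 0.0
    return share, missing


# Illustrative lists built from topic names quoted in the text, not real results.
manual = ["Computer", "Screen", "Headset", "Mobile", "Slowness"]
discovered = ["computer", "screen", "headset", "mobile", "application"]

share, missing = topic_coverage(manual, discovered)
print(share, missing)
```
      </preformat>
      <p>A topic reported as missing, such as "Slowness" here, is not necessarily absent from the results; as noted above, it may be recoverable from the sub-topics of other topics.</p>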
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and future work</title>
      <p>In this paper, we presented DDSM, which analyses verbatim records. We intend to leverage
the studied use cases to communicate across all divisions within our organisation and visually
demonstrate potential gains. Currently, we are studying how large language models (LLMs)
could enhance DDSM's performance. With the currently publicly available LLMs, we could
not exceed a limited number of tokens (i.e. 4000), which restricts the analysis to a few dozen
verbatim records per run; a normalization step is needed to cover thousands of
records. Additionally, substantial resources in terms of RAM and GPU (e.g. 140 GB for Llama 2) are
needed, which complicates their integration into our tool. Sending data outside our organisation
to be processed in a data center (as is the case with ChatGPT) is also not feasible for confidentiality
reasons. Instead, we see an opportunity to use these LLMs to generate more learning examples
to improve some key steps (e.g. regrouping similar expressions, generating sub-topic labels).</p>
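      <p>The token limit mentioned above implies that verbatim records must be grouped into batches that each fit the budget. The following sketch shows one greedy way of doing so; it is an assumption, not part of DDSM. Whitespace splitting stands in for a real tokenizer, which would count tokens differently, and the function name is hypothetical.</p>
      <preformat preformat-type="code">
```python
def batch_by_token_budget(records, max_tokens=4000, count_tokens=None):
    """Greedily group texts so that each batch stays within max_tokens."""
    # Assumption: whitespace word count approximates the model's token count.
    count_tokens = count_tokens or (lambda text: len(text.split()))
    batches, current, used = [], [], 0
    for text in records:
        cost = count_tokens(text)
        if cost > max_tokens:
            raise ValueError("a single record exceeds the token budget")
        if current and used + cost > max_tokens:
            # Close the current batch before it would overflow.
            batches.append(current)
            current, used = [], 0
        current.append(text)
        used += cost
    if current:
        batches.append(current)
    return batches


# Tiny budget so the batching is visible on a toy input.
batches = batch_by_token_budget(["a b c", "d e", "f g h i", "j"], max_tokens=5)
print(batches)
```
      </preformat>
      <p>Batching alone does not merge results across runs, which is consistent with the need for a further normalization step when covering thousands of records.</p>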
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Guo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Modeling label-wise syntax for fine-grained sentiment analysis of reviews via memory-based neural model</article-title>
          ,
          <source>Information Processing &amp; Management</source>
          <volume>58</volume>
          (
          <year>2021</year>
          )
          <fpage>102641</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Xiong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Multiple-element joint detection for aspect-based sentiment analysis</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>223</volume>
          (
          <year>2021</year>
          )
          <fpage>107073</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.</given-names>
            <surname>Shams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Khoshavi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baraani-Dastjerdi</surname>
          </string-name>
          ,
          <article-title>Lisa: language-independent method for aspect-based sentiment analysis</article-title>
          ,
          <source>IEEE Access</source>
          <volume>8</volume>
          (
          <year>2020</year>
          )
          <fpage>31034</fpage>
          -
          <lpage>31044</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>J.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Fang</surname>
          </string-name>
          ,
          <article-title>Toward multi-label sentiment analysis: a transfer learning based approach</article-title>
          ,
          <source>Journal of Big Data</source>
          <volume>7</volume>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>26</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <article-title>Aspect-based sentiment analysis with gated alternate neural network</article-title>
          ,
          <source>Knowledge-Based Systems</source>
          <volume>188</volume>
          (
          <year>2020</year>
          )
          <fpage>105010</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Peng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Xu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Si</surname>
          </string-name>
          ,
          <article-title>Knowing what, how and why: A near complete solution for aspect-based sentiment analysis</article-title>
          ,
          <source>in: Proceedings of the AAAI conference on artificial intelligence</source>
          , volume
          <volume>34</volume>
          ,
          <year>2020</year>
          , pp.
          <fpage>8600</fpage>
          -
          <lpage>8607</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Elleuch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Alaoui Ismaili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Laga</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Gaaloul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Benatallah</surname>
          </string-name>
          ,
          <article-title>Discovering activities from emails based on pattern discovery approach</article-title>
          ,
          <source>in: Business Process Management Forum 2020</source>
          , Seville, Spain, September 13-18
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>88</fpage>
          -
          <lpage>104</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>