<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
<article-title>State Machine-Based Multimodal Dialogue System for the Elderly Care Service</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Takatsugu Suzaki</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Masayuki Numao</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>The University of Electro-Communications</institution>
          ,
          <addr-line>1-5-1 Chofugaoka, Chofu, Tokyo</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Dialogue is a useful way for humans and robots to communicate. Dialogue management is important for gathering the user's intent, selecting a task that satisfies the user's request, and conducting a QA session to perform a specific task, all through natural conversation. We propose a multi-scenario task-oriented dialogue system based on a finite state machine (FSM). Our state machine provides common state transitions, which enable users to write scenarios easily and flexibly; FSM-based dialogue scenarios are well suited to such common transitions. Multimodal dialogue is also required in elderly care, both for dementia diagnosis and for daily conversation. We applied the proposed system to dementia diagnosis.</p>
      </abstract>
      <kwd-group>
<kwd>multimodal dialogue</kwd>
        <kwd>dialogue system</kwd>
        <kwd>state machine</kwd>
        <kwd>HDS-R</kwd>
        <kwd>health care</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Dialogue systems are a flexible and easy way for humans
and robots to communicate. Task-oriented dialogue can
be used to solve domain-specific tasks (e.g., hotel
reservation, guidance)[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Spoken dialogue systems such
as Siri and Alexa are widely used. However, human
conversation occurs not only through speech, but also through
gestures and facial expressions[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In addition, in recent
years, end-to-end approaches, which enable flexible
interaction with people, have become mainstream in the
dialogue-system field[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. However, their disadvantages are that they
require a large amount of training data and that it is
difficult to change input/output modules according to the
user's environment. Therefore, we propose a rule-based,
multimodal dialogue system. Our system allows for
natural conversations, although it requires people to write
scenarios. Although our system is a general dialogue
system, the scenarios in this paper focus on elderly care,
specifically dementia diagnosis.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Socially Responsible AI for Well-being</title>
      <p>Recently, the number of dementia patients has been increasing due to the
aging of society. Early detection of dementia is one of the most
important factors in elderly care. Although dementia is currently
diagnosed by specialists, the growing number of patients makes this
increasingly difficult to sustain.</p>
    </sec>
    <sec id="sec-4">
      <title>3. Definition of Dialogue Scenarios</title>
      <p>Domain-specific tasks rely on pre-defined rules based on
finite state machines. Each task is defined by its own XML file.
Each state defines the slots requested from the user, the
system's actions, and the cooperation
with the knowledge base (KB) and external services. Figures 1 and 2 show an
example scenario and dialogue for HDS-R.</p>
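      <p>As a rough sketch of such a task definition (the field names below are illustrative assumptions, not the actual XML schema used by the system), each state declares the slots it requests, the system's action, and an optional external-service hook:</p>

```python
# Hypothetical in-memory form of one HDS-R scenario, mirroring the kind
# of per-task definition described above. All keys and values are
# illustrative assumptions, not the paper's real schema.
hdsr_scenario = {
    "task": "HDS-R",
    "states": [
        {"name": "ask_age",
         "slots": ["age"],                     # slots requested from the user
         "action": "say: How old are you?",    # system's action in this state
         "external": None},                    # optional KB / service call
        {"name": "ask_date",
         "slots": ["year", "month", "day"],
         "action": "say: What is today's date?",
         "external": "kb:check_date"},
    ],
}

def first_unfilled(state, filled_slots):
    """Return the first requested slot not yet filled, or None."""
    for slot in state["slots"]:
        if slot not in filled_slots:
            return slot
    return None
```

      <p>A dialogue manager would keep asking for <code>first_unfilled(...)</code> until every requested slot of the current state is satisfied.</p>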
      <table-wrap id="tbl1">
        <label>Table 1</label>
        <caption>
          <p>Contents and results of the survey (5-point scale).</p>
        </caption>
        <table>
          <thead>
            <tr><th>No</th><th>Contents</th><th>Mean</th><th>STD</th></tr>
          </thead>
          <tbody>
            <tr><td>Q1</td><td>Did you achieve the objectives of the dialogue?</td><td>4.06</td><td>0.97</td></tr>
            <tr><td>Q2</td><td>Were you able to interact with them in a natural, human way?</td><td>3.59</td><td>1.28</td></tr>
            <tr><td>Q3</td><td>Did the dialogue go smoothly?</td><td>4.06</td><td>1.14</td></tr>
            <tr><td>Q4</td><td>Would you like to interact with the robot again?</td><td>3.82</td><td>1.01</td></tr>
            <tr><td>Q5</td><td>Did the robot perform as expected?</td><td>3.59</td><td>1.23</td></tr>
            <tr><td>Q6</td><td>Comments</td><td /><td /></tr>
          </tbody>
        </table>
      </table-wrap>
    </sec>
    <sec id="sec-5">
      <title>4. Common State Transition</title>
      <p>The system normally transitions to the next state when the
required slots of the current state are satisfied. In addition, we
propose special state transitions that give flexibility to the state
machine. For example, "repeat" repeats the
same state, "skip" transitions to the next state regardless
of whether the slots are filled, and "cancel" terminates the task's state
machine.</p>
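      <p>The common transitions above can be sketched as a minimal dialogue FSM; the class, state, and slot names are our illustrative assumptions, not the authors' implementation:</p>

```python
# Minimal sketch of an FSM dialogue manager with the common
# transitions "repeat", "skip", and "cancel" described above.
# Class/state/slot names are illustrative assumptions.

class DialogueFSM:
    def __init__(self, states):
        self.states = states      # ordered list of (name, required_slots)
        self.index = 0
        self.slots = {}
        self.finished = False

    def current_state(self):
        return self.states[self.index][0]

    def step(self, user_input):
        """Fill slots from the utterance, then apply a transition."""
        if user_input == "cancel":        # terminate the task's FSM
            self.finished = True
            return "cancelled"
        if user_input == "repeat":        # stay in the same state
            return self.current_state()
        if user_input != "skip":          # normal slot filling:
            name, required = self.states[self.index]
            for slot in required:         # (crudely) fill open slots
                self.slots.setdefault(slot, user_input)
            if not all(s in self.slots for s in required):
                return name               # required slot still unsatisfied
        # "skip", or all required slots satisfied: advance
        self.index += 1
        if self.index == len(self.states):
            self.finished = True
            return "done"
        return self.current_state()
```

      <p>With a two-state scenario, a filled utterance advances to the next state, "skip" advances unconditionally, and "cancel" ends the task at any point.</p>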
    </sec>
    <sec id="sec-6">
      <title>5. Multimodal Interface</title>
      <p>The role of input events is to perform slot filling, and
the role of output events is to generate actions. Figure 3
shows the graph of input modules. Multimodal processing
is made possible by linking the modules in a graph.</p>
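      <p>One way to realize such a module graph, as a minimal sketch (the module names, event fields, and a linear graph are assumptions made here for illustration), is to let each input module transform an event and pass it downstream until a slot filler consumes it:</p>

```python
# Sketch of chaining multimodal input modules as a small graph:
# each module transforms an event dict and hands it to the next.
# Module names and event fields are illustrative assumptions.

def speech_module(event):
    event["text"] = event.get("audio", "").lower()   # stand-in for ASR
    return event

def gesture_module(event):
    if event.get("gesture") == "nod":
        event["text"] = event.get("text") or "yes"   # gesture can fill a slot
    return event

def slot_filler(event):
    event["slots"] = {"answer": event.get("text")}
    return event

# Edges of the (here linear) input graph: each node feeds the next.
INPUT_GRAPH = [speech_module, gesture_module, slot_filler]

def process(event):
    for module in INPUT_GRAPH:
        event = module(event)
    return event
```

      <p>In this sketch a spoken "yes" and a nod gesture end up filling the same slot, which is the point of merging modalities in one graph.</p>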
    </sec>
    <sec id="sec-7">
      <title>6. Experiments</title>
      <p>Two experiments were carried out to evaluate the success
rate and fluency of the dialogue and the validity of HDS-R
diagnosis. First, a survey was conducted with 18 people to evaluate
the success rate and fluency of the dialogue. The survey
was rated on a 5-point scale. Table 1 shows the contents and results
of the survey. The results show that a high success rate was
achieved, while the fluency needs to be improved.</p>
      <p>Second, our system administered the HDS-R to 15 subjects
pretending to be elderly, and we compared the results of
manual and system scoring. Figure 4 shows a radar chart
of the results for one subject. The RMSE of the total score
was 2 points, confirming that the system can score as well as
a human.</p>
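      <p>For reference, the RMSE between manual and system total scores is computed as follows; the score pairs in the check below are made up for illustration and are not the study's data:</p>

```python
import math

def rmse(manual, system):
    """Root-mean-square error between manual and system HDS-R totals."""
    assert len(manual) == len(system)
    squared_error = sum((m - s) ** 2 for m, s in zip(manual, system))
    return math.sqrt(squared_error / len(manual))
```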
    </sec>
    <sec id="sec-8">
      <title>7. Conclusion</title>
      <p>We proposed a multimodal dialogue system based on a
finite state machine. Our system enables users to write scenarios
easily and flexibly thanks to common state transitions.
However, our system is inferior to the end-to-end
approach in terms of dialogue fluency. In the future, we
also plan to conduct a demonstration test to confirm the
validity of the HDS-R scoring.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>This work was supported by JSPS KAKENHI Grant
Number JP20H04289, "Functional Independence Measurement
System based on ADL Ontology for Aged Person".</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <source>A Survey on Dialogue Systems: Recent Advances and New Frontiers</source>
          , volume
          <volume>19</volume>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Wanner</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>André</surname>
          </string-name>
          ,
          <article-title>Kristina: A knowledge-based virtual conversation agent</article-title>
          ,
          <source>in: Advances in Practical Applications of Cyber-Physical Multi-Agent Systems: The PAAMS Collection</source>
          , Springer International Publishing, Cham,
          <year>2017</year>
          , pp.
          <fpage>284</fpage>
          -
          <lpage>295</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>M.-H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-H.</given-names>
            <surname>Tao</surname>
          </string-name>
          ,
          <article-title>Data fusion methods in multimodal human computer dialog</article-title>
          ,
          <source>Virtual Reality &amp; Intelligent Hardware</source>
          <volume>1</volume>
          (
          <year>2019</year>
          )
          <fpage>21</fpage>
          -
          <lpage>38</lpage>
          . doi:10.3724/SP.J.2096-5796.2018.0010.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>