<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LitSens: An Improved Architecture for Adaptive Music Using Text Input and Sentiment Analysis</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
<string-name>Manuel López Ibáñez</string-name>
          <email>manuel.lopez.ibanez@ucm.es</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nahum Alvarez</string-name>
          <email>nahum.alvarez@aist.go.jp</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federico Peinado</string-name>
          <email>email@federicopeinado.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Artificial Intelligence Research Center, National Institute of Advanced Industrial Science and Technology, 2-3-26 Aomi</institution>
          ,
          <addr-line>Koto-ku, Tokyo 135-0064</addr-line>
          ,
          <country country="JP">Japan</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
<institution>Department of Software Engineering and Artificial Intelligence, Complutense University of Madrid, c/Profesor José García Santesmases 9</institution>
          ,
          <addr-line>28040 Madrid</addr-line>
          ,
          <country country="ES">Spain</country>
        </aff>
      </contrib-group>
      <abstract>
<p>LitSens aims to be a sound system that takes into account the contingencies of real-time decision making in video games. In this article, we present several improvements to an earlier version of our system architecture, which consisted of an emotion manager attached to predesigned, interactive texts. The main additions to that system are the possibility for users to input custom text to a dialogue manager and the automatic tagging of that text using real-time sentiment analysis, thus improving the knowledge base. We conclude that, though lacking validation due to its early development stage, LitSens can constitute an efficient method for generating adaptive music in real time, suitable for interactive experiences like video games.</p>
      </abstract>
      <kwd-group>
<kwd>Affective Computing</kwd>
        <kwd>Sound Design</kwd>
<kwd>Natural Language Processing</kwd>
        <kwd>Video Games</kwd>
        <kwd>Procedural Content Generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
<title>Introduction</title>
      <p>
        Though it is common to use sentiment analysis for processing
big text corpora, such as opinions given on social media by thousands of
individuals [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], there are few cases in which this field of Natural Language Processing is
used to create real-time responses to brief text inputs.
      </p>
      <p>
        In this article we present a revision of a previously designed
audio system architecture [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], to which we have added the ability to input text, as well
as a real-time, lexicon-based sentiment analyser built on Synesketch, by Uros Krcadinac [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ].
After a brief introduction to the current state of the art, our
system architecture is presented in detail, with special focus on the newest
additions, and we end with a discussion of the potential of an adaptive
sound system for soundtrack creation, and some suggestions for future work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Adaptive sound and music</title>
      <p>
        The concept of adaptive music has changed considerably in recent years. Classic
approaches to dynamic and adaptive audio systems [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] consisted mostly of smooth
transitions between scripted events, predesigned narrative interactions or
cinematic sequences. However, the increasing complexity of today's interactive experiences
demands a very different approach. Current adaptive music must
consider player reactions and adapt to them in real time, thus generating an
adequate emotional atmosphere which is unique to each experience. This is where
procedural music generation comes into play. However, in spite of the efforts of
authors like Jewell, Nixon &amp; Prugel-Bennett [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], who add semantic information
to the automatic music generation process, improving its results and bringing
them closer to commercial soundtracks, available technology still constitutes a
big impediment to real-time music generation. Producing realistic music on a
current computer (even a powerful one) is a demanding process, which requires
enormous processing capacity, as well as high read and write speeds from a
hard disk drive.
      </p>
      <p>
        As Collins states [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], available technology is one of the main issues when
generating procedural music, and this problem is not going to be solved soon:
current soundtracks can take minutes to generate when using virtual instruments
(VSTi), while the ideal latency would be a few milliseconds (current gaming
monitors claim latencies of a mere 1 to 5 ms).
      </p>
    </sec>
    <sec id="sec-3">
      <title>The LitSens architecture</title>
      <p>
        Our software architecture is designed to work in Unity, a popular game engine (www.unity3d.com), and functions as follows (see Figure 1):
      </p>
      <list list-type="bullet">
        <list-item>
          <p>LitSens works as an interactive experience in which there is narrative content in the form of text.</p>
        </list-item>
        <list-item>
          <p>The design of our system takes into account three main components: a dialogue system, an audio system and an emotion manager.</p>
        </list-item>
        <list-item>
          <p>At the beginning of the experience, a text with narrative content is shown in a text box. The user can then use a text field to input a response, which produces an output consisting of the next sentence of the narrative.</p>
        </list-item>
        <list-item>
          <p>The input is then processed by a sentiment analyser, which provides several values. The three values with the highest weight are then normalized and stored by the Emotion Manager.</p>
        </list-item>
        <list-item>
          <p>The Audio System reads those values from the Emotion Manager and selects a maximum of three music fragments, which start playing through the game engine after a simple mix process (a minimal code sketch of this loop is given after the list). As shown in Figure 2, all music fragments are previously designed by a human, tagged with an emotion and stored in a database.</p>
        </list-item>
        <list-item>
          <p>Subsequent inputs modify the state of the Emotion Manager, producing different fragment combinations and causing them to change in real time.</p>
        </list-item>
        <list-item>
          <p>
            A total of six emotions are considered, following Ekman's taxonomy of basic emotions [
            <xref ref-type="bibr" rid="ref8">8</xref>
            ]: happiness, sadness, fear, surprise, disgust and anger.
          </p>
        </list-item>
        <list-item>
          <p>
            We have chosen a three-layer system so as to be able to represent multiple emotions at the same time. Jørgensen [
            <xref ref-type="bibr" rid="ref9">9</xref>
            ] states that game audio has a dual role: it supports the general feeling of an environment, but also gives vital information during gameplay. By having three layers, we can include an atmospheric track based on past feelings and common dual emotions like "happiness-sadness", "fear-disgust" or "disgust-anger".
          </p>
        </list-item>
      </list>
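      <p>
        The following is a minimal, illustrative sketch of the update loop described above. It is not the actual Unity (C#) implementation: the names used here (analyse, EmotionManager, AudioSystem, fragment_db) are assumptions introduced only to make the data flow concrete.
      </p>
      <preformat>
# Illustrative Python sketch of the LitSens update loop (not the Unity implementation).
EMOTIONS = ["happiness", "sadness", "fear", "surprise", "disgust", "anger"]

def analyse(text):
    """Placeholder for the Synesketch-style sentiment analyser:
    returns a weight for each of Ekman's six basic emotions."""
    raise NotImplementedError

class EmotionManager:
    """Stores the normalized weights of the three strongest emotions."""
    def __init__(self):
        self.state = {}

    def update(self, text):
        weights = analyse(text)
        # Keep the three emotions with the highest weight and normalize them.
        top3 = sorted(weights, key=weights.get, reverse=True)[:3]
        total = sum(weights[e] for e in top3) or 1.0
        self.state = {e: weights[e] / total for e in top3}

class AudioSystem:
    """Selects at most three human-composed fragments tagged with emotions."""
    def __init__(self, fragment_db):
        # fragment_db maps each emotion tag to a pre-composed music fragment.
        self.fragment_db = fragment_db

    def play(self, emotion_state):
        # One layer per active emotion, mixed according to its normalized weight.
        for emotion, weight in emotion_state.items():
            fragment = self.fragment_db[emotion]
            fragment.set_volume(weight)
            fragment.play()
      </preformat>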
    </sec>
    <sec id="sec-4">
      <title>Discussion and future work</title>
      <p>An audio system like the one we present here has, in our opinion, more potential
for creating truly adaptive soundtracks for interactive experiences than totally
procedural content. As all fragments in the audio database must be created by a
human, they can sound credible and realistic, while maintaining enough
flexibility to adapt to a variety of player responses. This, however, means that some human
work is required behind the scenes for the system to function. Still, as most
user reactions are difficult to plan or design in advance, our approach is to
allow for complex player decisions while producing a fully adaptive soundtrack
at a relatively low cost.</p>
      <p>
        The next natural step for our system is to validate its functioning
with real subjects in an interactive experiment resembling a commercial,
narrative video game. During said experiment, we should be able to gather two
fundamental pieces of information. On the one hand, we intend to retrieve data
regarding the sort of emotions users feel while playing our game, by means of a
tool like the Self-Assessment Manikin (SAM) test [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. On the other hand, we will
evaluate how having an adaptive music system influences presence [
        <xref ref-type="bibr" rid="ref11 ref12 ref13 ref14">11-14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgements</title>
      <p>This article was funded by the Complutense University of Madrid (grant
CT27/16-CT28/16 for predoctoral research), in collaboration with Santander Bank and
the NIL research group.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>A.</given-names>
            <surname>Pak</surname>
          </string-name>
          and
          <string-name>
            <given-names>P.</given-names>
            <surname>Paroubek</surname>
          </string-name>
          , \
          <article-title>Twitter as a Corpus for Sentiment Analysis and Opinion Mining,"</article-title>
          <source>In Proceedings of the Seventh Conference on International Language Resources and Evaluation</source>
          , pp.
          <fpage>1320</fpage>
          -
          <lpage>1326</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>M.</given-names>
            <surname>López Ibáñez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Alvarez</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Peinado</surname>
          </string-name>
          , \
          <article-title>Towards an Emotion-Driven Adaptive System for Video Game Music,"</article-title>
          <source>in Proceedings of the 14th International Conference on Advances in Computer Entertainment Technology (Pending publication)</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>U.</given-names>
            <surname>Krcadinac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>V.</given-names>
            <surname>Devedzic</surname>
          </string-name>
          , \
          <article-title>Synesketch: An Open Source Library for Sentence-Based Emotion Recognition,"</article-title>
          <source>IEEE Transactions on Affective Computing</source>
          , vol.
          <volume>4</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>325</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>U.</given-names>
            <surname>Krcadinac</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Jovanovic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Devedzic</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Pasquier</surname>
          </string-name>
          , \
          <article-title>Textual A ect Communication and Evocation Using Abstract Generative Visuals,"</article-title>
          <source>IEEE Transactions on Human-Machine Systems</source>
          , vol.
          <volume>46</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>370</fpage>
          -
          <lpage>379</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>W.</given-names>
            <surname>Strank</surname>
          </string-name>
          , \
          <article-title>The Legacy of iMuse: Interactive Video Game Music in the 1990s,"</article-title>
          <source>in Music and Game: Perspectives on a Popular Alliance</source>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>91</lpage>
          , Wiesbaden: Springer Fachmedien Wiesbaden,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>M. O.</given-names>
            <surname>Jewell</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. S.</given-names>
            <surname>Nixon</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Prugel-Bennett</surname>
          </string-name>
          ,
          <article-title>\CBS: A concept-based sequencer for soundtrack composition,"</article-title>
          <source>in Proceedings - 3rd International Conference on WEB Delivering of Music, WEDELMUSIC 2003</source>
          , pp.
          <fpage>105</fpage>
          -
          <lpage>108</lpage>
          ,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>K.</given-names>
            <surname>Collins</surname>
          </string-name>
          , \
          <article-title>An Introduction to Procedural Music in Video Games,"</article-title>
          <source>Contemporary Music Review</source>
          , vol.
          <volume>28</volume>
          , pp.
          <fpage>5</fpage>
          -
          <lpage>15</lpage>
          , Feb.
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <given-names>P.</given-names>
            <surname>Ekman</surname>
          </string-name>
          , \
          <article-title>An argument for basic emotions,"</article-title>
          <source>Cognition &amp; Emotion</source>
          , vol.
          <volume>6</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>200</lpage>
          ,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <given-names>K.</given-names>
            <surname>Jørgensen</surname>
          </string-name>
          ,
          <article-title>"What Are These Grunts and Growls Over There?" Computer Game Audio and Player Action</article-title>
          .
          <source>PhD thesis</source>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <given-names>M. M.</given-names>
            <surname>Bradley</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Lang</surname>
          </string-name>
          , \
          <article-title>Measuring emotion: the Self-Assessment Manikin and the Semantic Differential,"</article-title>
          <source>Journal of Behavior Therapy and Experimental Psychiatry</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>49</fpage>
          -
          <lpage>59</lpage>
          ,
          <year>1994</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>W.</given-names>
            <surname>Barfield</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Weghorst</surname>
          </string-name>
          , \
          <article-title>The sense of presence within virtual environments: A conceptual framework,"</article-title>
          <source>Advances in Human Factors Ergonomics</source>
          , vol.
          <volume>19</volume>
          , p.
          <fpage>699</fpage>
          ,
          <year>1993</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <given-names>B. G.</given-names>
            <surname>Witmer</surname>
          </string-name>
          and
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Singer</surname>
          </string-name>
          , \
          <article-title>Measuring presence in virtual environments: A presence questionnaire,"</article-title>
          <source>Presence: Teleoperators and Virtual Environments</source>
          , vol.
          <volume>7</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>240</lpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <given-names>M.</given-names>
            <surname>Lombard</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ditton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Weinstein</surname>
          </string-name>
          , \
          <article-title>Measuring Presence: The Temple Presence Inventory,"</article-title>
          <source>in Proceedings of the 12th Annual International Workshop on Presence</source>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>15</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <given-names>J.</given-names>
            <surname>Lessiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Freeman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Keogh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Davidoff</surname>
          </string-name>
          , \
          <article-title>A cross-media presence questionnaire: The ITC-Sense of Presence Inventory,"</article-title>
          <source>Presence: Teleoperators and Virtual Environments</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>