<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Logic-Guided Neural Utterance Generation from Drone Sensory Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Stefan Borgwardt</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ernie Chang</string-name>
          <email>cychang@coli</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kathryn Chapman</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vera Demberg</string-name>
          <email>demberg@lst</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alisa Kovtunova</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Hui-Syuan Yeh</string-name>
          <email>yehhui@coli</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Chair for Automata Theory, Technische Universität Dresden</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Computer Science, Saarland Informatics Campus</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Drone technology and drone control have advanced rapidly in recent years, to the point that consumer drones with impressive features and capabilities are commonplace [4]. Advanced sensors and improved control algorithms have made flying drones much simpler, and many drone applications have become possible (e.g., aerial surveys, mapping, aerial filming, or selfie drones). As drones are used for an increasing range of tasks, interacting with them becomes more important. To enable these interactions in everyday life, it is essential to devise a natural language generation (NLG) setup that can flexibly process the variety of data collected by the drone and convey only the important information to the user. In this paper, we propose a neural generation model (or drone assistant) that generates messages from sensor data records in order to perform a controlled handover to a human drone pilot (see Figure 1). Recent data-driven methods have achieved good performance on various NLG tasks [2, 3, 6]. However, most studies focus on surface descriptions of simple record sequences, e.g., attribute-value pairs with fixed or very limited schemas, as in E2E [7] and WikiBio [5]. In contrast, our setup involves a large variety of data records, and the content selection task is substantially harder, since only critical information, not all available information, should be mentioned at handover time. Moreover, the system should generalize well to diverse as well as previously unseen environments during operation. To this end, we argue that it is necessary to leverage intermediate content representations to achieve faithful and controllable text generation across environments. In this paper, these intermediate representations are generated using description logic (DL) ontologies [1]. This removes the burden of logical reasoning from the neural model and allows it to produce more flexible, higher-quality utterances. 
We study this approach on a new dataset called DroneParrot that consists of 316 data records derived from real drone footage across 8 environments, such as Urban and Ocean.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>[Figure 1: handover example. The Autonomous Drone tells the Drone Pilot: “A tree is in path within 0.3 meter and the battery is low. Please resume human control.”]</p>
      <p>Each sensor data record is combined with a natural language
utterance and an intermediate representation generated by DL reasoning. The
raw sensor data includes values such as wind speed, altitude, temperature, battery
level, and information about nearby objects. To select the critical information for
a handover, we define a DL ontology that derives additional information about
the current situation. Several high-level concepts like RiskOfPhysicalDamage are
defined to identify critical situations, using axioms like</p>
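      <p><disp-formula><tex-math>\exists \mathit{near}.\mathsf{Object} \sqcap \exists \mathit{environment}.\mathsf{LowVisibility} \sqsubseteq \mathsf{RiskOfPhysicalDamage}.</tex-math></disp-formula>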
For a data record (i.e., an ABox) that entails RiskOfPhysicalDamage(drone), we
compute all ABox justifications of this assertion, that is, all minimal subsets of
the ABox that, together with the TBox, entail it. We do not include TBox axioms
in these justifications, since during a handover there is no time to explain the
whole reasoning process. The union of all obtained justifications, which is a subset
of the original data record, forms the intermediate representation in our
dataset. The neural model uses this shorter representation to generate a
focused handover message. The reduced size of the intermediate representation
enables learning with fewer training samples and better generalization across
different environments.</p>
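      <p>The justification step can be illustrated with a minimal Python sketch under strong simplifying assumptions; it is not the authors' implementation. A data record is modelled as a set of assertions, such as <monospace>("Tree", "tree1")</monospace> for a concept assertion and <monospace>("near", "drone", "tree1")</monospace> for a role assertion. Besides the example axiom, the TBox contains three hypothetical inclusions (Tree ⊑ Object, Building ⊑ Object, Fog ⊑ LowVisibility), and the individuals as well as the HighWind and LowBattery facts are invented. Justifications are found by brute-force search over subsets of the record, which is exponential in its size; for realistic records with over a hundred assertions, a practical pipeline would presumably rely on a DL reasoner's justification support instead.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations

# Hypothetical TBox for illustration only (not the DroneParrot ontology).
# Atomic inclusions A ⊑ B, e.g. Tree ⊑ Object.
SUBSUMPTIONS = {
    "Tree": {"Object"},
    "Building": {"Object"},
    "Fog": {"LowVisibility"},
}
# Inclusions of the form ∃r1.A1 ⊓ ∃r2.A2 ⊑ C; here only the example axiom.
EXISTS_AXIOMS = [
    (("near", "Object"), ("environment", "LowVisibility"), "RiskOfPhysicalDamage"),
]

# Invented data record (ABox): ("Concept", ind) or ("role", subject, object).
example_record = {
    ("near", "drone", "tree1"),
    ("Tree", "tree1"),
    ("near", "drone", "building1"),
    ("Building", "building1"),
    ("environment", "drone", "env1"),
    ("Fog", "env1"),
    ("HighWind", "env1"),
    ("LowBattery", "drone"),
}


def concept_names(abox, ind):
    """Concept names asserted for ind, closed under SUBSUMPTIONS."""
    names = {a[0] for a in abox if len(a) == 2 and a[1] == ind}
    todo = list(names)
    while todo:
        for sup in SUBSUMPTIONS.get(todo.pop(), ()):
            if sup not in names:
                names.add(sup)
                todo.append(sup)
    return names


def has_successor(abox, ind, role, concept):
    """True if the ABox contains role(ind, x) for some x that is an instance of concept."""
    return any(
        len(a) == 3 and a[0] == role and a[1] == ind and concept in concept_names(abox, a[2])
        for a in abox
    )


def entails(abox, concept, ind):
    """Check whether TBox + abox entails concept(ind).

    Simplification: conclusions of EXISTS_AXIOMS are not fed back into further
    rule applications, which suffices for this one-axiom TBox.
    """
    if concept in concept_names(abox, ind):
        return True
    return any(
        c == concept and has_successor(abox, ind, r1, a1) and has_successor(abox, ind, r2, a2)
        for (r1, a1), (r2, a2), c in EXISTS_AXIOMS
    )


def abox_justifications(abox, concept, ind):
    """All minimal subsets of abox that, with the fixed TBox, entail concept(ind).

    Brute-force search over subsets of increasing size; exponential, so only
    suitable for toy records.
    """
    assertions = sorted(abox)
    found = []
    for k in range(1, len(assertions) + 1):
        for subset in combinations(assertions, k):
            candidate = frozenset(subset)
            # Entailment is monotone, so a superset of a justification is not minimal.
            if any(j.issubset(candidate) for j in found):
                continue
            if entails(candidate, concept, ind):
                found.append(candidate)
    return found


def intermediate_representation(abox, concept, ind):
    """Union of all ABox justifications: the reduced record given to the generator."""
    return frozenset().union(*abox_justifications(abox, concept, ind))


if __name__ == "__main__":
    for j in abox_justifications(example_record, "RiskOfPhysicalDamage", "drone"):
        print(sorted(j))
    print(sorted(intermediate_representation(example_record, "RiskOfPhysicalDamage", "drone")))
```
      </preformat>
      <p>On the example record, the search returns two justifications of four assertions each, one for each nearby object. Their union keeps six of the eight assertions and drops HighWind(env1) and LowBattery(drone), which play no role in deriving RiskOfPhysicalDamage(drone).</p>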
      <p>The drone ontology and two example videos, annotated with the objects
present in the corresponding records, the DL intermediate representations, and the
generated utterances, are publicly available at https://cloud.perspicuous-computing.science/s/rPqKAQoWXiq2QSQ. Since the ontology and the queries are fixed, we
did not use a reasoner to perform query rewriting. Instead, all rewritings were
computed manually and hard-coded in Google Apps Script, so that they could already
be used during the time-consuming manual video annotation.</p>
      <p>In our experiments, we observed that the size of the raw data records varies
considerably between environments, with averages ranging from 12.56 assertions in
some environments to 168.85 assertions in others. However, after computing the
justifications, only 1.68 to 3.26 assertions remain on average. Comparing the
performance of several baselines as well as state-of-the-art NLG methods, we observe
that all of them benefit from the additional pre-processing step of computing the
intermediate representation, with gains of up to 37.36 BLEU points (BLEU measures
how close the generated utterances are to the gold standard). A manual evaluation
of 100 randomly selected samples also showed higher quality of the generated text
as well as fewer errors in the form of missing important facts or hallucinated facts.
To evaluate scalability, we also exposed the system to environments not included
in the training dataset. Although performance decreases in this setting, including
DL reasoning again makes a big difference, as it reduces all data records to a
similar form that contains only the facts relevant to the handover.</p>
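      <p>The hard-coded query rewriting mentioned above can be illustrated with a small Python sketch; it is not the authors' Google Apps Script code. Assuming the hypothetical inclusions Tree ⊑ Object, Building ⊑ Object, and Fog ⊑ LowVisibility, the query RiskOfPhysicalDamage(drone) is compiled by hand into a union of conjunctive queries that is evaluated directly on the raw record, without calling a reasoner. The names <monospace>OBJECT_CONCEPTS</monospace>, <monospace>LOW_VISIBILITY_CONCEPTS</monospace>, and <monospace>risk_of_physical_damage_matches</monospace>, as well as the example record, are invented for illustration.</p>
      <preformat preformat-type="code">
```python
from itertools import product

# Hand-compiled, hypothetical TBox knowledge: concept names whose instances
# count as Object or LowVisibility (each set includes the concept itself).
OBJECT_CONCEPTS = {"Object", "Tree", "Building"}
LOW_VISIBILITY_CONCEPTS = {"LowVisibility", "Fog"}

# Invented data record: ("Concept", ind) or ("role", subject, object).
example_record = {
    ("near", "drone", "tree1"),
    ("Tree", "tree1"),
    ("near", "drone", "building1"),
    ("Building", "building1"),
    ("environment", "drone", "env1"),
    ("Fog", "env1"),
    ("HighWind", "env1"),
    ("LowBattery", "drone"),
}


def members(record, concepts):
    """(individual, assertion) pairs for concept assertions using any name in concepts."""
    return [(a[1], a) for a in record if len(a) == 2 and a[0] in concepts]


def successors(record, role, subject):
    """(object, assertion) pairs for role assertions role(subject, object)."""
    return [(a[2], a) for a in record if len(a) == 3 and a[0] == role and a[1] == subject]


def risk_of_physical_damage_matches(record, ind):
    """Evaluate a hard-coded rewriting of the query RiskOfPhysicalDamage(ind).

    The rewriting is a union of conjunctive queries:
      RiskOfPhysicalDamage(ind), or
      near(ind, y), O(y), environment(ind, z), L(z)
    for every O in OBJECT_CONCEPTS and L in LOW_VISIBILITY_CONCEPTS.
    Each match is returned as the set of raw assertions it uses.
    """
    matches = []
    if ("RiskOfPhysicalDamage", ind) in record:
        matches.append(frozenset({("RiskOfPhysicalDamage", ind)}))
    objects = members(record, OBJECT_CONCEPTS)
    low_vis = members(record, LOW_VISIBILITY_CONCEPTS)
    for (y, near_a), (z, env_a) in product(successors(record, "near", ind),
                                            successors(record, "environment", ind)):
        for (y2, type_y), (z2, type_z) in product(objects, low_vis):
            if y2 == y and z2 == z:
                matches.append(frozenset({near_a, type_y, env_a, type_z}))
    return matches


if __name__ == "__main__":
    for m in risk_of_physical_damage_matches(example_record, "drone"):
        print(sorted(m))
```
      </preformat>
      <p>Each match lists the raw assertions it uses. With this small TBox, every match is also a minimal ABox justification of RiskOfPhysicalDamage(drone), so a hard-coded rewriting of this kind could also deliver the assertions for the intermediate representation.</p>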
      <p>In future work, we would like to make this approach more automated by
also learning the TBox (classifying the situations of interest) from the raw data.
Additionally, including TBox axioms in the intermediate representation may
enable us to generate more detailed and naturalistic explanations for situations
that are not time-critical.</p>
      <p>Acknowledgements. This work was supported by the DFG grant 389792660
(TRR 248) (see https://perspicuous-computing.science).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Baader</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Calvanese</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McGuinness</surname>
            ,
            <given-names>D.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nardi</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Patel-Schneider</surname>
            ,
            <given-names>P.F</given-names>
          </string-name>
          . (eds.):
          <source>The Description Logic Handbook: Theory, Implementation, and Applications</source>
          . Cambridge University Press,
          <edition>2nd</edition>
          edn
          . (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eavani</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>W.Y.</given-names>
          </string-name>
          :
          <article-title>Few-shot NLG with pre-trained language model</article-title>
          .
          <source>In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL)</source>
          . pp.
          <fpage>183</fpage>
          -
          <lpage>190</lpage>
          (
          <year>2020</year>
          ). https://doi.org/10.18653/v1/2020.acl-main.18
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Freitag</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roy</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Unsupervised natural language generation with denoising autoencoders</article-title>
          . In:
          <string-name>
            <surname>Riloff</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chiang</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hockenmaier</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tsujii</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (eds.)
          <source>Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing</source>
          , Brussels, Belgium, October 31 - November 4,
          <year>2018</year>
          . pp.
          <fpage>3922</fpage>
          -
          <lpage>3929</lpage>
          . Association for Computational Linguistics (
          <year>2018</year>
          ). https://doi.org/10.18653/v1/d18-1426
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Fuhrman</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schneider</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Altenberg</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nguyen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Blasen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Constantin</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Waibel</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>An interactive indoor drone assistant</article-title>
          .
          <source>In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS)</source>
          . pp.
          <fpage>6052</fpage>
          -
          <lpage>6057</lpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1109/IROS40897.2019.8967587
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Lebret</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Grangier</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auli</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Neural text generation from structured data with application to the biography domain</article-title>
          .
          <source>In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016</source>
          , Austin, Texas, USA, November 1-4,
          <year>2016</year>
          . pp.
          <fpage>1203</fpage>
          -
          <lpage>1213</lpage>
          (
          <year>2016</year>
          ), http://aclweb.org/anthology/D/D16/D16-1128.pdf
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sha</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sui</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Table-to-text generation by structure-aware seq2seq learning</article-title>
          .
          <source>In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18)</source>
          , New Orleans, Louisiana, USA, February 2-7,
          <year>2018</year>
          . pp.
          <fpage>4881</fpage>
          -
          <lpage>4888</lpage>
          (
          <year>2018</year>
          ), https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16599
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Novikova</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dušek</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rieser</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The E2E dataset: New challenges for end-to-end generation</article-title>
          .
          <source>In: Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue</source>
          , Saarbrücken, Germany, August 15-17
          ,
          <year>2017</year>
          . pp.
          <fpage>201</fpage>
          -
          <lpage>206</lpage>
          (
          <year>2017</year>
          ), https://aclanthology.info/papers/W17-5525/w17-5525
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>