Logic-Guided Neural Utterance Generation from Drone Sensory Data (Extended Abstract)*

Stefan Borgwardt², Ernie Chang¹, Kathryn Chapman¹, Vera Demberg¹, Alisa Kovtunova², and Hui-Syuan Yeh¹

¹ Department of Computer Science, Saarland Informatics Campus, Germany
{cychang@coli,s8kachap@teams,demberg@lst,yehhui@coli}.uni-saarland.de
² Chair for Automata Theory, Technische Universität Dresden, Germany
firstname.lastname@tu-dresden.de

Drone technology and drone control have advanced rapidly in recent years, to the point that consumer drones with impressive features and capabilities are commonplace [4]. Advanced sensors and improved control algorithms have made flying drones much simpler, and many drone applications have become possible (e.g. aerial surveys, mapping, aerial movies, or selfie drones). As drones are used for an increasing range of tasks, interacting with them becomes more important. To enable these interactions in everyday life, it is essential to devise a natural language generation (NLG) setup that can flexibly process a variety of data collected by the drone and convey only the important information to the user. In this paper, we propose a neural generation model (or drone assistant) that verbalizes messages from sensor data records in order to perform a controlled handover to a human drone pilot (see Fig. 1).

Recent data-driven methods have achieved good performance on various NLG tasks [2, 3, 6]. However, most studies focus on surface descriptions of simple record sequences, e.g. attribute-value pairs of fixed or very limited schemas, such as E2E [7] and WikiBio [5]. In contrast, our setup involves a large variety of data records, and the content selection task is substantially harder: only critical information, not all available information, should be mentioned at handover time. Moreover, it is desirable that the system generalizes well to diverse and previously unseen environments during its operation.
To this end, we argue that it is necessary to leverage intermediate content representations to achieve faithful and controllable text generation in different environments. In this paper, these intermediate representations are generated using description logic (DL) ontologies [1]. This removes the burden of logical reasoning from the neural model and allows more flexible and higher-quality utterances to be produced.

* Abstract of a manuscript in preparation for submission to ACL 2022. Copyright © 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

[Fig. 1. We focus on the drone handover as the main communicative function. The autonomous drone sends a handover message to the drone pilot, e.g. "A tree is in path within 0.3 meter and the battery is low. Please resume human control."]

Our Contribution. We study this approach on a new dataset called DroneParrot, which consists of 316 data records derived from real drone footage across 8 environments, such as Urban and Ocean. Each sensor data record is combined with a natural language utterance and an intermediate representation generated by DL reasoning. The raw sensor data includes values such as wind speed, altitude, temperature, battery level, and information about nearby objects.

To select the critical information for a handover, we define a DL ontology that derives additional information about the current situation. Several high-level concepts like RiskOfPhysicalDamage are defined to identify critical situations, using axioms like

∃near.Object ⊓ ∃environment.LowVisibility ⊑ RiskOfPhysicalDamage.

For a data record (i.e. an ABox) that entails RiskOfPhysicalDamage(drone), we compute all ABox justifications of this assertion. We do not include TBox axioms in these justifications, since there is no time to explain the whole reasoning process during a handover situation.
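The justification step can be sketched in plain Python. The following is a minimal illustration, not the DL reasoner actually used: it hard-codes the single example axiom ∃near.Object ⊓ ∃environment.LowVisibility ⊑ RiskOfPhysicalDamage as a matching function and enumerates minimal entailing subsets of the ABox by brute force. The triple encoding and all individual names besides those in the example axiom are hypothetical.

```python
from itertools import combinations

def entails_risk(abox):
    """Does this ABox entail RiskOfPhysicalDamage(drone) under the axiom
    ∃near.Object ⊓ ∃environment.LowVisibility ⊑ RiskOfPhysicalDamage?"""
    near_object = any(p == "near" and s == "drone" for p, s, o in abox)
    low_visibility = any(p == "environment" and s == "drone" and o == "LowVisibility"
                         for p, s, o in abox)
    return near_object and low_visibility

def abox_justifications(abox):
    """Enumerate the minimal subsets of the ABox that still entail the assertion."""
    assertions = sorted(abox)
    justifications = []
    for size in range(1, len(assertions) + 1):
        for subset in combinations(assertions, size):
            # keep only subsets with no smaller entailing subset
            if entails_risk(subset) and \
                    not any(set(j) <= set(subset) for j in justifications):
                justifications.append(subset)
    return justifications

record = {                                    # one raw sensor data record (ABox)
    ("near", "drone", "tree"),
    ("environment", "drone", "LowVisibility"),
    ("altitude", "drone", "12m"),             # irrelevant to this entailment
    ("temperature", "drone", "21C"),          # irrelevant to this entailment
}

justs = abox_justifications(record)
# The union of all justifications is the shortened intermediate representation.
intermediate = set().union(*justs)
print(intermediate)
```

Here the union discards the assertions that play no role in the entailment, mirroring the reduction from full data records to the few assertions that matter at handover time. Real justification algorithms avoid this exponential enumeration, e.g. via hitting-set tree methods.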
The union of all obtained justifications is a subset of the original data record and forms the intermediate representation in our dataset. This shorter representation is used by the neural model to generate a focused handover message. The reduced size of the intermediate representation enables learning from fewer training samples and better generalization across different environments. The drone ontology and two example videos, annotated with the objects present in the records, the DL intermediate representations, and the generated utterances, are publicly available at https://cloud.perspicuous-computing.science/s/rPqKAQoWXiq2QSQ. Since the ontology and the queries are fixed, we did not use a reasoner to perform query rewriting. Instead, all rewritings were computed manually and hard-coded in Google Apps Script, so that they were already available during the time-consuming manual video annotation.

In our experiments, we observed that the size of the raw data records varies considerably between environments, with averages ranging from 12.56 assertions for some environments to 168.85 assertions for others. After computing the justifications, however, on average only 1.68–3.26 assertions remain. Comparing the performance of the natural language generation with several baselines as well as state-of-the-art methods, we observe that all of them benefit from the additional pre-processing step of computing the intermediate representation, with differences of up to 37.36 BLEU points (measuring how close the generated utterances are to the gold standard). A manual evaluation of 100 randomly selected samples also revealed an increased quality of the generated text as well as fewer errors in the form of missing important facts or hallucinated facts. To evaluate how well the system generalizes, we also exposed it to environments not included in the training dataset.
While the performance decreases in this setting, including DL reasoning again makes a big difference, as it reduces all data records to a similar form, i.e. one containing only the facts relevant for the handover.

In future work, we would like to make this approach more automated by also learning the TBox (classifying the situations of interest) from the raw data. Additionally, including TBox axioms in the intermediate representation may enable us to generate more detailed and naturalistic explanations for situations that are not time-critical.

Acknowledgements. This work was supported by the DFG grant 389792660 (TRR 248) (see https://perspicuous-computing.science).

References

1. Baader, F., Calvanese, D., McGuinness, D.L., Nardi, D., Patel-Schneider, P.F. (eds.): The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2nd edn. (2007)
2. Chen, Z., Eavani, H., Liu, Y., Wang, W.Y.: Few-shot NLG with pre-trained language model. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics (ACL). pp. 183–190 (2020). https://doi.org/10.18653/v1/2020.acl-main.18
3. Freitag, M., Roy, S.: Unsupervised natural language generation with denoising autoencoders. In: Riloff, E., Chiang, D., Hockenmaier, J., Tsujii, J. (eds.) Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing, Brussels, Belgium, October 31 – November 4, 2018. pp. 3922–3929. Association for Computational Linguistics (2018). https://doi.org/10.18653/v1/d18-1426
4. Fuhrman, T., Schneider, D., Altenberg, F., Nguyen, T., Blasen, S., Constantin, S., Waibel, A.: An interactive indoor drone assistant. In: 2019 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS). pp. 6052–6057 (2019). https://doi.org/10.1109/IROS40897.2019.8967587
5. Lebret, R., Grangier, D., Auli, M.: Neural text generation from structured data with application to the biography domain.
In: Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing, EMNLP 2016, Austin, Texas, USA, November 1–4, 2016. pp. 1203–1213 (2016), http://aclweb.org/anthology/D/D16/D16-1128.pdf
6. Liu, T., Wang, K., Sha, L., Chang, B., Sui, Z.: Table-to-text generation by structure-aware seq2seq learning. In: Proceedings of the Thirty-Second AAAI Conference on Artificial Intelligence (AAAI-18), the 30th Innovative Applications of Artificial Intelligence (IAAI-18), and the 8th AAAI Symposium on Educational Advances in Artificial Intelligence (EAAI-18), New Orleans, Louisiana, USA, February 2–7, 2018. pp. 4881–4888 (2018), https://www.aaai.org/ocs/index.php/AAAI/AAAI18/paper/view/16599
7. Novikova, J., Dušek, O., Rieser, V.: The E2E dataset: New challenges for end-to-end generation. In: Proceedings of the 18th Annual SIGdial Meeting on Discourse and Dialogue, Saarbrücken, Germany, August 15–17, 2017. pp. 201–206 (2017), https://aclanthology.info/papers/W17-5525/w17-5525