Proceedings of the 26th International Workshop on Principles of Diagnosis

Automatic Model Generation to Diagnose Autonomous Systems

Jorge Santos Simón¹ and Clemens Mühlbacher¹ and Gerald Steinbauer¹
¹ Institute for Software Technology
e-mail: {jsantos, cmuehlba, gstein}@ist.tugraz.at

Abstract
Autonomous systems' dependability can be improved by performing diagnosis during run-time. This can be achieved through model-based diagnosis (MBD) techniques. The required models of the system are for the most part handcrafted. This task is time consuming and error prone. To overcome this issue, we propose a framework to generate formal models out of natural language documents, such as technical requirements or FMEA, using natural language processing (NLP) tools and techniques from the knowledge representation and reasoning (KRR) domain. Thus, we aim to enable the use of MBD in autonomous systems with little extra burden. In doing so, we expect a significant increase in the use of MBD techniques on real-world systems.

1 Introduction
Dependability is a key feature of modern autonomous systems. It can be achieved by sound design and implementation, thorough testing and runtime diagnosis. To date, none of these processes is completely automated, and all of them need substantial manual work. However, all these fields can greatly benefit from the use of model-based techniques. Design and implementation can be greatly improved through model-driven engineering, as stated in [1]. Model-based testing (MBT) has been demonstrated [2] to outperform traditional testing techniques in both invested time and number of errors found. Model-based diagnosis (MBD) is the main target of this work. It has been successfully used in industrial settings [3], reducing the need for human intervention. Although it has been increasingly adopted in recent years, we believe that its full potential is still to be developed.

All model-based techniques require appropriate models of the system. As stated in [4; 5], creating these models is the most prevalent limiting factor for their adoption. To overcome this barrier, we propose a method that automates model creation from the documents used during the system design. These comprise requirements documents, architectural designs, FMEA and FTA, among others. The content of these documents is often given in natural language and in semi-structured form, and it lacks a common semantics. Thus, the contained information is not accessible to a computer. However, advances in natural language processing (NLP) and the availability of common-sense and domain-specific knowledge bases (e.g. Cyc [6], RoboEarth [7]) make semi-automated derivation of models possible. Despite recent advances in this area [8; 9; 10], most techniques focus on very specific applications of the generated formal models. Thus, we pose the problem of generating, out of the documents used during the system design process, a common knowledge base as an intermediate representation with well-defined semantics. From this central repository, different algorithms can extract different formal models for particular needs. We believe that this work can increase the acceptance of model-based techniques and broaden their use.

The motivation for this work came during the development of a model-based diagnosis and repair (MBDR) system for an industrial application. The aim is to improve the dependability of a fleet of robots that automatically deliver goods in a warehouse. As stated in [11], even minor failures often prevent a robot from accomplishing its task, decreasing the overall performance of the system. Moreover, the frequent need for human intervention increases costs and customer dissatisfaction. Using MBDR techniques, many of these failures can be handled automatically, allowing the robot to remain in service, perhaps with gracefully degraded capabilities [12; 13]. In extreme cases, diagnosing a failure in time can prevent robot behaviors that are harmful to humans, to the robot itself or to other elements in the environment.

Confronted with the lack of any formal model of the system, we were forced to code the models we need manually. However, this is both a time-consuming and error-prone task, and it also imposes a maintenance burden as the system evolves. Accordingly, we believe that a mostly automated approach is not only convenient for the intended project but can also help extend the use of MBDR techniques to other projects and domains. Following this idea, we propose a framework that, in a first step, gathers the information from the project together with domain and common-sense knowledge in a machine-understandable knowledge base. Then, a suite of algorithms can extract formal models from this knowledge base for particular purposes. Though our aim is to automate the process as much as possible, human assistance will be requested whenever some pieces of information are missing or contradictory [14; 15].

The novelty of our proposal is two-fold: first, we emphasize the usability of the resulting models for MBD. Second, we aim to integrate all the sources of information typically available in an industrial development process, such as requirements, architecture, and failure modes. As a result, we expect to broaden the range and applicability of the automatically generated models. To better illustrate the proposed framework, we will use a small running example extracted from a real-world application. It concerns the robot's box loading operation, performed by the robot's load handling device (LHD).

The remainder of the paper is organized as follows: Related research on model generation is discussed in Section 2. Section 3 provides an overview of the proposed process. Section 4 describes the inputs used, while Section 5 describes the proposed NLP and KRR tool chain to interpret them. Section 6 provides an example of an output model and its use for MBD. Finally, Section 7 summarizes the presented framework and discusses future work.

2 Related research
We start the brief discussion of related research with work that uses NLP methods to derive models. The work of [9] uses NLP methods to derive a formal model out of requirements. This formal model can afterwards be transformed into different representations to test or synthesize the system. The method proposed in [10] uses NLP methods to derive design documents (class diagrams, etc.) out of requirements. These design documents can afterwards be used to implement the system. The authors of [8] propose a method to extract action recipes from websites. These action recipes comprise the behavior required to achieve a given goal. The method uses how-to instructions and NLP tools to derive an action recipe that can be executed by a robot. Missing parts are inferred with the help of common-sense knowledge about actions. In contrast to all these approaches, we propose a framework that incorporates different information sources to get a better understanding of the system. Furthermore, our framework generates different models out of an internal formal description, depending on the needs of the intended diagnosis and testing tasks.

Besides NLP methods, machine learning can also be used to generate a model of the system. The work in [4] presented a method to statistically learn the model of the system under nominal conditions. The model describes the static interaction of the system components. In contrast, the method proposed in [5] learns the behavior of a system. The method identifies similar and different states from observed events and merges similar ones. Furthermore, the system variables are estimated for each state. Both methods are only applicable if the system is already built. Instead, we create a model during the design phase, so the model can be used right from the first stages of the life cycle.

Missing or contradicting information must be detected and handled when generating models. The method in [15] tries to avoid faults in the requirements document. This is done by transforming the requirements into so-called boilerplates. Through this semi-structured text, ambiguities are removed and a consistent naming is enforced. A different approach was proposed in [14] to diagnose a knowledge base for consistency. If the knowledge base is inconsistent, the user is asked, as an oracle, to pinpoint the problem. Afterwards, the user needs to fix this issue. In our framework, we will use ideas from both methods to derive a consistent knowledge base of the system.

3 Framework overview
We propose the framework depicted in Figure 1 to transform informal documents and knowledge into models suitable for MBD. The informal inputs (white squares with solid lines) are processed into intermediate representations (light gray squares with dashed lines) using techniques from NLP and KRR, as well as ontologies (e.g. Cyc). We condense them into a knowledge base together with all our knowledge about the system and its domain. Finally, a variety of algorithms can produce formal models suitable for MBD (gray squares with dot-dash lines).

Figure 1: Abstract workflow of the proposed framework. Starting from the inputs in natural language on the left, we generate models that can be applied for diagnosis (right).

4 Sources of information
The proposed framework takes artifacts from the design phase as inputs. We propose the use of the following four inputs, though additional sources can be incorporated if available:

1. Requirements document: The technical requirements document describes the expected system behavior. Therefore, it is a mandatory input. The quality of the models, and thus of the resulting MBD, depends heavily on the quality of the requirements. Thus, we improve the requirements and models iteratively, as proposed in [15]. For our running example, we have taken four requirements that describe the box loading process of a robot:
   (a) When the robot is docked, it lowers the barrier.
   (b) When the robot is ready to load, the load handling device starts rotating backward.
   (c) The load handling device stops rotating backwards when the laser beam is triggered.
   (d) After stopping the load handling device the barrier is raised.
2. Domain knowledge: This is the fuzziest input, as it is available not as an artifact but as the knowledge and experience of the engineers involved. We distinguish three kinds of knowledge. Common-sense knowledge can be provided by existing ontologies such as Cyc [16]. Generic knowledge about the autonomous systems domain can be provided by dedicated ontologies such as KnowRob [17]. Particular knowledge about the targeted system itself can be partially inferred from the system architecture, though other parts must be provided by the project engineers. The use of ontologies ranges from providing meaning to natural language concepts to inferring missing pieces of information.
3. Architecture: The architecture of the system defines its composing elements plus the relations between them. It is typically described as a set of diagrams generated during the design phase of the system. For our running example, we use the architecture excerpt depicted in Figure 2. It states that a robot consists of an LHD and other unspecified elements. Furthermore, the LHD consists of a laser beam, rollers and a barrier.
4. Failure Modes and Effects Analysis: FMEA looks at all potential failure modes, their effects and causes, and determines a risk priority number. FMEA can be used to determine which potential errors are critical, how they can be pinpointed, and how their effects can be avoided [18]. We incorporate the failure modes into the resulting behavior models to diagnose these known failures. For our running example, we include the two failure modes that can occur during the load operation, listed in Table 1.

Figure 2: Robot architecture excerpt. The figure shows relations of type part-of for components of the Robot.

|           | Component                  | Failure          | Observations                            |
|-----------|----------------------------|------------------|-----------------------------------------|
| Failure 1 | Barrier                    | Barrier stuck up | Barrier stuck up regardless of commands |
| Failure 2 | Load Handling Device (LHD) | Rotation fail    | Laser beam not triggered                |

Table 1: FMEA from the running example.
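To make the inputs above concrete, the following Python sketch shows one possible way to hold the architecture excerpt of Figure 2 and the failure modes of Table 1 as facts and to query them, e.g. to list which known failure modes concern the LHD or any of its parts. It is a minimal illustration under our own assumptions: a plain in-memory encoding stands in for the ontology-backed knowledge base, the part-of relation is assumed to be acyclic, and all identifiers (PART_OF, FailureMode, components_of, failure_modes_affecting) are hypothetical.

```python
from dataclasses import dataclass

# Hypothetical encoding of the Figure 2 excerpt as (part, whole) facts.
# The framework itself would keep these in an ontology-backed knowledge base.
PART_OF = {
    ("lhd", "robot"),
    ("laser_beam", "lhd"),
    ("rollers", "lhd"),
    ("barrier", "lhd"),
}


def components_of(whole, facts=PART_OF):
    """Return every direct or indirect part of `whole` (assumes no cycles)."""
    direct = {part for (part, w) in facts if w == whole}
    result = set(direct)
    for part in direct:
        result |= components_of(part, facts)
    return result


@dataclass(frozen=True)
class FailureMode:
    """One FMEA row: the failing component, the failure and its symptom."""
    component: str
    failure: str
    observation: str


# The two failure modes of Table 1 (identifiers are illustrative).
FMEA = [
    FailureMode("barrier", "barrier_stuck_up", "barrier stuck up regardless of commands"),
    FailureMode("lhd", "rotation_fail", "laser beam not triggered"),
]


def failure_modes_affecting(whole, fmea=FMEA):
    """Known failure modes of `whole` itself or of any of its parts."""
    scope = components_of(whole) | {whole}
    return [fm for fm in fmea if fm.component in scope]


print(sorted(components_of("robot")))  # ['barrier', 'laser_beam', 'lhd', 'rollers']
print([fm.failure for fm in failure_modes_affecting("lhd")])  # ['barrier_stuck_up', 'rotation_fail']
```

Because the barrier is a part of the LHD, its failure mode is reported for the LHD as well; this is the kind of inference the knowledge base must support when failures are attached to model transitions.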
We incorporate the failure modes into consistent knowledge base of the system. the resulting behavior models to diagnose these known 154 Proceedings of the 26th International Workshop on Principles of Diagnosis Figure 1: Abstract work-flow for the proposed framework. Starting from left with inputs in natural language, we generate models that can be applied for diagnosis (right). Figure 2: Robot architecture excerpt. The figure shows re- lations of the type part of for components of the Robot. failures. For our running example, we include the two failure modes that can occur during the load operation, depicted in Table 1. Figure 3: Sample syntax tree of the first sentence (a) of the The biggest challenge for handling all these inputs is to running example. understand semi-structured information. So, we will depict a NLP/KRR tool-chain using state-of-the-art techniques in the following section. Note for example that the 3rd person “s” has been removed from the verbs. Furthermore complex terms such as “load 5 NLP/KRR tool chain handling device” have been replaced by lhd. Finally, the propositions order is rearranged in a consistent structure. The process generates three intermediate artifacts: semi- formal text (boilerplates), syntax trees and semantic cate- 5.2 Syntax trees gories. As a showcase, we will concentrate on the require- A syntax tree comprises the information of the type of each ments of our running example, though these techniques can word in the sentence, e.g. ”lower“ is a verb. Furthermore, be extended to other textual inputs, as we will see at the end the tree specifies how the sentence is constructed with these of this section. words. For example, the syntax tree of the first require- ment in our running example is depicted in Figure 3. In this 5.1 Boilerplates syntax tree we can identify that “robot” is a noun and “the This is a semi-formal representation where most of the robot” is a so called noun phrase. 
5.2 Syntax trees
A syntax tree contains the type of each word in the sentence, e.g. "lower" is a verb. Furthermore, the tree specifies how the sentence is constructed from these words. For example, the syntax tree of the first requirement in our running example is depicted in Figure 3. In this syntax tree we can identify that "robot" is a noun and "the robot" is a so-called noun phrase. An example of a tool to extract syntax trees is the probabilistic context-free grammar parser described in [20].

Figure 3: Sample syntax tree of the first sentence (a) of the running example.
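To show what information a syntax tree carries, the sketch below reads a tree in the bracketed Penn Treebank notation and recovers the POS tag of each word and the noun phrases. The bracketing of boilerplate (a) is hand-written for illustration and is only assumed to resemble what a parser such as [20] would produce; it is not a reproduction of Figure 3. The names Node, parse, pos_tags and phrases are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Node:
    label: str                                            # phrase label (S, NP, VP) or POS tag
    children: List["Node"] = field(default_factory=list)
    word: Optional[str] = None                            # set only on POS (pre-terminal) nodes


def parse(bracketed: str) -> Node:
    """Read a bracketed tree such as (NP (DT the) (NN robot))."""
    tokens = bracketed.replace("(", " ( ").replace(")", " ) ").split()
    pos = 0

    def read() -> Node:
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError(f"expected '(' at token {pos}")
        node = Node(tokens[pos + 1])
        pos += 2
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                node.children.append(read())
            else:
                node.word = tokens[pos]
                pos += 1
        pos += 1  # consume the closing ")"
        return node

    return read()


def pos_tags(node: Node) -> List[Tuple[str, str]]:
    """Leaves of the tree as (word, POS) pairs, from left to right."""
    if node.word is not None:
        return [(node.word, node.label)]
    return [pair for child in node.children for pair in pos_tags(child)]


def phrases(node: Node, label: str) -> List[str]:
    """Text of every constituent carrying `label`, in pre-order."""
    found = [" ".join(w for w, _ in pos_tags(node))] if node.label == label else []
    for child in node.children:
        found += phrases(child, label)
    return found


# Hand-written, illustrative bracketing of boilerplate (a); not parser output.
SENTENCE_A = (
    "(S (SBAR (WHADVP (WRB when)) (S (NP (DT the) (NN robot)) (VP (VBZ is) (VP (VBN docked)))))"
    " (, ,) (NP (PRP it)) (VP (VBP lower) (NP (DT the) (NN barrier))) (. .))"
)
tree = parse(SENTENCE_A)
print(phrases(tree, "NP"))  # ['the robot', 'it', 'the barrier']
```

The (word, POS) pairs returned by pos_tags are the leaf information from which the semantic categories of the next step are built.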
5.3 Semantic categories
The semantic categories conceptually describe our system, e.g. a transition describing the motion of an actuator. These semantic categories are hierarchical in nature, as more complex and abstract concepts are composed of simpler ones, e.g. a transition is composed of an action, pre- and postconditions, etc. We obtain the semantic categories by parsing the syntax trees and applying transformation rules in a bottom-up fashion, following [8]. We start at the leaves of the syntax tree, which contain single words. Each word is assigned a part-of-speech (POS) label describing its grammatical role in the sentence. Furthermore, each word has an additional label with its WordNet [21] synset, used to derive its semantics from the common-sense knowledge base. From the leaves, higher-level transformations can be applied to create more complex semantic categories. For example, in our running example we create a semantic category for each word in the sentence "lower the barrier". Then, we can derive that "lower" is an action acting on something. We can then use the semantic category of the word together with its position in the syntax tree to apply further transformation rules. This process is repeated until the root node is reached. Then, a new semantic category capturing the meaning of the sentence is assigned to it. For the running example, the semantic category for "lower the barrier" is a transition. A transition must contain a precondition, a postcondition, an action and, optionally, an object of the action. The semantic category specifies that the action "lower" is performed on the object "barrier". With the help of common-sense knowledge (Cyc ontology [16]) we can reason that this action moves the "barrier" from state "up" to state "down". Thus, we can infer the pre- and postconditions of "lower". Finally, the semantic category together with the reasoning results is stored as statements in our knowledge base, as depicted in Figure 4.

Figure 4: Concepts created from the syntax tree in Figure 3. The word in quotes is the word as it appears in the sentence. The word in parentheses is the Cyc concept it belongs to.

We can incorporate other documents into the knowledge base by using a similar NLP tool chain. However, how the information is treated depends heavily on the context inherent to each document type.
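The bottom-up construction of the transition category for "lower the barrier" can be sketched as two rule levels: leaves are categorized from their POS tags, and an action acting on an object is then combined into a transition whose pre- and postconditions come from effect knowledge. In this sketch a small hypothetical table (ACTION_EFFECTS) stands in for the Cyc reasoning, WordNet synsets are ignored, and, unlike the general definition above, an object is required; Transition, leaf_categories and transition_for are illustrative names.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical stand-in for the common-sense effects obtained from Cyc:
# action -> (affected attribute, value before, value after).
ACTION_EFFECTS = {
    "lower": ("position", "up", "down"),
    "raise": ("position", "down", "up"),
}


@dataclass(frozen=True)
class Transition:
    """Semantic category 'transition': action, object, pre- and postcondition."""
    action: str
    obj: Optional[str]
    pre: str
    post: str


def leaf_categories(tagged: List[Tuple[str, str]]) -> List[Tuple[str, str]]:
    """Leaf level: map POS tags to coarse categories (synsets are ignored here)."""
    categories = []
    for word, tag in tagged:
        if tag.startswith("VB"):
            categories.append(("Action", word))
        elif tag.startswith("NN"):
            categories.append(("Object", word))
    return categories  # determiners and punctuation get no category


def transition_for(tagged: List[Tuple[str, str]]) -> Transition:
    """Phrase level: an Action acting on an Object becomes a Transition."""
    leaves = leaf_categories(tagged)
    actions = [w for cat, w in leaves if cat == "Action"]
    objects = [w for cat, w in leaves if cat == "Object"]
    if len(actions) != 1 or len(objects) != 1:
        raise ValueError("this sketch only handles one action on one object")
    verb, obj = actions[0], objects[0]
    attribute, before, after = ACTION_EFFECTS[verb]  # KeyError if no effect knowledge
    return Transition(verb, obj,
                      pre=f"{obj}.{attribute} = {before}",
                      post=f"{obj}.{attribute} = {after}")


t = transition_for([("lower", "VBP"), ("the", "DT"), ("barrier", "NN")])
print(t.pre, "->", t.post)  # barrier.position = up -> barrier.position = down
```

Keeping the effect knowledge in a separate table mirrors the division of labor described above: the syntax provides action and object, while the pre- and postconditions come from background knowledge.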
6 Model generation for behavior diagnosis
To illustrate how the framework can be used to diagnose the behavior of the robot, we create an automaton as output model. To use techniques such as [22], the automaton must describe both nominal and faulty behaviors of the system. To generate this automaton from the knowledge base, we translate four different kinds of relations stated in it into transitions:

1. Relations representing a direct transition, as depicted in Figure 4. Such a relation can be mapped directly onto a transition of the automaton, as can be seen in Figure 5 in the transition from state 1 to 2.
2. Relations representing an action with a duration. Such a relation must be translated into several transitions: the start of the action, the termination event and a transition to a final state. Such a transformed relation is depicted in Figure 5 by the transitions from state 2 to 5.
3. Relations representing a failure of the system. The failure event is represented as a divergent path from a normal transition. Thus, the start state is the same as that of the normal transition. Afterwards, we need a state representing the failure. Finally, we need an observation transition that leads to a final state representing a general failure of the system. The observation transition is needed because we use a fault model derived from the FMEA, in which every fault causes an observable discrepancy in the real system. Additionally, it is important to notice that the state representing the general failure is a state in which the system can exhibit arbitrary behavior. Thus, we can model the lack of knowledge about the impact the fault has on the system. The transformed failure is depicted in Figure 5 by the transitions from state 2 to 9.
4. Relations representing a failure of a system component. The failure event is represented as a divergent path from a normal transition. To determine all the possibly affected transitions, we must infer the effects of each transition. This inference is based on common-sense and domain knowledge. In our running example, we can infer that lowering the barrier causes the barrier to be down in the end. A failure such as barrier_stuck_up can prevent this transition, and so both can share a common source state. Then, as before, we need an observation transition that leads to a final state representing a general failure of the system. Such a sequence is depicted in Figure 5 by the transitions from state 1 to 9 through states 7 and 8.

Figure 5: Automaton generated from the running example. Shaded states are reached through some fault. Double-circled states represent final states. State 9 is the general failure state; for readability, its self-loops with all possible labels are omitted.
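A minimal sketch of such an automaton for the running example follows, together with a simple observer that tracks the set of states consistent with a sequence of observed events. It assumes hypothetical state and event names (they do not follow the numbering of Figure 5), treats fault events as unobservable, and gives the general failure state an implicit self-loop on every label. The observer is only a set-of-states estimator; it is not the diagnoser construction of [22].

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, Set, Tuple

GENERAL_FAILURE = "general_failure"


@dataclass
class Automaton:
    """Nondeterministic automaton with unobservable fault events."""
    transitions: Dict[Tuple[str, str], Set[str]] = field(default_factory=dict)
    unobservable: Set[str] = field(default_factory=set)

    def add(self, src: str, event: str, dst: str, observable: bool = True) -> None:
        self.transitions.setdefault((src, event), set()).add(dst)
        if not observable:
            self.unobservable.add(event)

    def _closure(self, states: Set[str]) -> Set[str]:
        """Add every state reachable through unobservable (fault) events."""
        closed, stack = set(states), list(states)
        while stack:
            s = stack.pop()
            for (src, event), dsts in self.transitions.items():
                if src == s and event in self.unobservable:
                    for d in dsts - closed:
                        closed.add(d)
                        stack.append(d)
        return closed

    def estimate(self, initial: str, observations: Iterable[str]) -> Set[str]:
        """States consistent with an observed event sequence."""
        states = self._closure({initial})
        for event in observations:
            nxt: Set[str] = set()
            for s in states:
                if s == GENERAL_FAILURE:
                    nxt.add(s)  # implicit self-loop on every label
                nxt |= self.transitions.get((s, event), set())
            states = self._closure(nxt)
        return states


lhd = Automaton()
# 1. Direct transition (requirement (a)).
lhd.add("docked", "barrier_lowered", "barrier_down")
# 2. Action with a duration: start event and termination event of the rotation,
#    followed here by requirement (d) leading to the final state.
lhd.add("barrier_down", "rotation_started", "rotating")
lhd.add("rotating", "lb_triggered", "rotation_stopped")
lhd.add("rotation_stopped", "barrier_raised", "loaded")
# 3. System failure: diverges from the source state of the normal termination.
lhd.add("rotating", "f_rotation_fail", "rotation_failed", observable=False)
lhd.add("rotation_failed", "lb_not_triggered", GENERAL_FAILURE)
# 4. Component failure: barrier_stuck_up shares the source state of barrier_lowered.
lhd.add("docked", "f_barrier_stuck_up", "barrier_stuck", observable=False)
lhd.add("barrier_stuck", "barrier_up_regardless_of_commands", GENERAL_FAILURE)

print(sorted(lhd.estimate("docked", ["barrier_lowered", "rotation_started"])))
# ['rotating', 'rotation_failed']
print(lhd.estimate("docked", ["barrier_lowered", "rotation_started", "lb_not_triggered"]))
# {'general_failure'}
```

After observing barrier_lowered and rotation_started, the rotation fault cannot yet be ruled out; a subsequent lb_triggered clears it, whereas lb_not_triggered leaves only the general failure state.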
7 Conclusion and future work
In this paper we propose a framework to automatically generate formal models out of documents represented in semi-structured form and natural language (requirements, domain knowledge, architecture, failure modes, etc.). The parsed information is gathered together with domain knowledge in a knowledge base. Accessing this common repository, a variety of algorithms can generate different kinds of models for different purposes. Our main target is to derive models suitable for state-of-the-art MBD techniques applied to autonomous systems. We plan to implement this framework to assist us in creating the models required for MBD. In doing so, we expect to improve the dependability of the industrial application of a fleet of transport robots in a warehouse.

Besides this immediate result, we expect that the proposed framework will ease the creation of formal models for other applications. Thus, we hope to contribute to the widespread use of MBD techniques, with the consequent improvement of the dependability of autonomous systems.

Acknowledgments
The research presented in this paper has received funding from the Austrian Research Promotion Agency (FFG) under grant 843468 (Guaranteeing Service Robot Dependability During the Entire Life Cycle (GUARD)).

References
[1] Stuart Kent. Model driven engineering. In Michael Butler, Luigia Petre, and Kaisa Sere, editors, Integrated Formal Methods, volume 2335 of Lecture Notes in Computer Science, pages 286–298. Springer Berlin Heidelberg, 2002.
[2] Mark Utting and Bruno Legeard. Practical model-based testing: a tools approach. Morgan Kaufmann, 2010.
[3] Peter Struss, Raymond Sterling, Jesús Febres, Umbreen Sabir, and Marcus M. Keane. Combining engineering and qualitative models to fault diagnosis in air handling units. In European Conference on Artificial Intelligence (ECAI) - Prestigious Applications of Intelligent Systems (PAIS 2014), pages 1185–1190, 2014.
[4] Safdar Zaman and Gerald Steinbauer. Automated Generation of Diagnosis Models for ROS-based Robot Systems. In International Workshop on Principles of Diagnosis (DX), Jerusalem, Israel, 2013.
[5] Dennis Klar, Michaela Huhn, and J Gruhser. Symptom propagation and transformation analysis: A pragmatic model for system-level diagnosis of large automation systems. In Emerging Technologies & Factory Automation (ETFA), 2011 IEEE 16th Conference on, pages 1–9. IEEE, 2011.
[6] Cynthia Matuszek, John Cabral, Michael Witbrock, and John Deoliveira. An introduction to the syntax and content of Cyc. In Proceedings of the 2006 AAAI Spring Symposium on Formalizing and Compiling Background Knowledge and Its Applications to Knowledge Representation and Question Answering, pages 44–49, 2006.
[7] Markus Waibel, Michael Beetz, Raffaello D'Andrea, Rob Janssen, Moritz Tenorth, Javier Civera, Jos Elfring, Dorian Gálvez-López, Kai Häussermann, J.M.M. Montiel, Alexander Perzylo, Björn Schießle, Oliver Zweigle, and René van de Molengraft. RoboEarth - A World Wide Web for Robots. Robotics & Automation Magazine, 18(2):69–82, 2011.
[8] Moritz Tenorth, Daniel Nyga, and Michael Beetz. Understanding and executing instructions for everyday manipulation tasks from the world wide web. In Robotics and Automation (ICRA), 2010 IEEE International Conference on, pages 1486–1491. IEEE, 2010.
[9] Shalini Ghosh, Daniel Elenius, Wenchao Li, Patrick Lincoln, Natarajan Shankar, and Wilfried Steiner. Automatically extracting requirements specifications from natural language. arXiv preprint arXiv:1403.3142, 2014.
[10] Sven J Körner and Mathias Landhäußer. Semantic enriching of natural language texts with automatic thematic role annotation. In Natural Language Processing and Information Systems, pages 92–99. Springer, 2010.
[11] Gerald Steinbauer. A survey about faults of robots used in RoboCup. In Xiaoping Chen, Peter Stone, Luis Enrique Sucar, and Tijn van der Zant, editors, RoboCup 2012: Robot Soccer World Cup XVI, volume 7500 of Lecture Notes in Computer Science, pages 344–355. Springer Berlin Heidelberg, 2013.
[12] Gerald Steinbauer, Franz Wotawa, et al. Detecting and locating faults in the control software of autonomous mobile robots. In IJCAI, pages 1742–1743, 2005.
[13] Mathias Brandstötter, Michael Hofbaur, Gerald Steinbauer, and Franz Wotawa. Model-based fault diagnosis and reconfiguration of robot drives. In 2007 IEEE/RSJ International Conference on Intelligent Robots and Systems (IROS), San Diego, CA, USA, 2007.
[14] Kostyantyn Shchekotykhin, Gerhard Friedrich, Patrick Rodler, and Philipp Fleiss. A direct approach to sequential diagnosis of high cardinality faults in knowledge-bases. In International Workshop on Principles of Diagnosis (DX), Graz, Austria, 2014.
[15] Bernhard K Aichernig, Klaus Hormaier, Florian Lorber, Dejan Nickovic, Rupert Schlick, Didier Simoneau, and Stefan Tiran. Integration of Requirements Engineering and Test-Case Generation via OSLC. In Quality Software (QSIC), 2014 14th International Conference on, pages 117–126. IEEE, 2014.
[16] Stephen L Reed, Douglas B Lenat, et al. Mapping ontologies into Cyc. In AAAI 2002 Conference Workshop on Ontologies For The Semantic Web, pages 1–6, 2002.
[17] Moritz Tenorth, Alexander Clifford Perzylo, Reinhard Lafrenz, and Michael Beetz. The RoboEarth language: Representing and exchanging knowledge about actions, objects, and environments. In Robotics and Automation (ICRA), 2012 IEEE International Conference on, pages 1284–1289. IEEE, 2012.
[18] Hongkun Zhang, Wenjun Li, and Jun Qin. Model-based functional safety analysis method for automotive embedded system application. In International Conference on Intelligent Control and Information Processing, 2010.
[19] Stefan Farfeleder, Thomas Moser, Andreas Krall, Tor Stålhane, Herbert Zojer, and Christian Panis. Dodt: Increasing requirements formalism using domain ontologies for improved embedded systems development. In Design and Diagnostics of Electronic Circuits & Systems (DDECS), 2011 IEEE 14th International Symposium on, pages 271–274. IEEE, 2011.
[20] Dan Klein and Christopher D. Manning. Accurate unlexicalized parsing. In Proceedings of the 41st Annual Meeting on Association for Computational Linguistics-Volume 1, pages 423–430. Association for Computational Linguistics, 2003.
[21] George Miller and Christiane Fellbaum. WordNet: An electronic lexical database, 1998.
[22] Meera Sampath, Raja Sengupta, Stéphane Lafortune, Kasim Sinnamohideen, and Demosthenis Teneketzis. Diagnosability of discrete-event systems. Automatic Control, IEEE Transactions on, 40(9):1555–1575, 1995.