ThingML+ Augmenting Model-Driven Software Engineering for the Internet of Things with Machine Learning

Armin Moin, Technical University of Munich, Munich, Germany, moin@in.tum.de
Stephan Rössler, Software AG, Munich, Germany, stephan.roessler@softwareag.com
Stephan Günnemann, Technical University of Munich, Munich, Germany, guennemann@in.tum.de

MDE4IoT’18, October 2018, Copenhagen, Denmark

ABSTRACT

In this paper, we present the current position of the research project ML-Quadrat, which aims to extend the methodology, modeling language and tool support of ThingML, an open source modeling tool for IoT/CPS, to address the Machine Learning needs of IoT applications. Currently, ThingML offers a modeling language and tool support for modeling the components of the system, their communication interfaces and their behaviors. The latter is done through state machines. However, we argue that in many cases IoT/CPS services involve system components and physical processes whose behaviors are not understood well enough to be modeled using state machines. Hence, a data-driven approach that enables inference based on the observed data, e.g., using Machine Learning, is often preferred. To this aim, ML-Quadrat integrates the necessary Machine Learning concepts into ThingML both on the modeling level (syntax and semantics of the modeling language) and on the code generators level. We plan to support two target platforms for code generation regarding Stream Processing and Complex Event Processing, namely Apache SAMOA and Apama.

KEYWORDS

internet of things, thingml, model driven, big data, machine learning, artificial intelligence

1 INTRODUCTION

The Internet of Things and the Role of Artificial Intelligence. We are currently on the edge of the next industrial revolution, for which the Internet of Things (IoT) and the smart Cyber-Physical Systems (CPS) connected via the IoT provide the infrastructure. The IoT is an expansion of the Internet into new domains, where constrained embedded devices such as sensors and actuators play an important role [1]. Moreover, CPS are systems of systems that have both physical and virtual, i.e., digital, elements. Artificial Intelligence (AI) is the key factor that distinguishes the previous industrial revolution, which led to industrial automation world-wide, from the current one, which leads to Cognitive Systems that can possess cognitive capabilities such as ‘learning’.

Why Model-Driven Engineering for IoT/CPS? As mentioned above, smart Cyber-Physical Systems (CPS) are connected through the IoT. As these systems of systems are very large, highly distributed, very heterogeneous and cross-domain, there is a pressing need for abstraction and automation in order to specify, design, develop, analyze, verify and maintain them in a cost-effective manner. One of the promising approaches that provides both abstraction and automation is Model-Driven Engineering (MDE), also known as Model-Based Engineering, where models are the core elements in the entire life-cycle, from the specification and design phase to the implementation, deployment and maintenance phases.¹ MDE has already proven quite successful in some domains such as Embedded Systems, e.g., in the automobile industry. Hence, it appears to be the most natural and most suitable approach to address the said challenges in the domain of CPS/IoT [6].

¹ Although the border between these phases might not exist anymore in its classic sense for modern applications.

ThingML: an open source state-of-the-art MDE solution for IoT/CPS. When we talk about MDE, we actually mean Model-Driven Software Engineering (MDSE), particularly its Domain-Specific Modeling (DSM) approach. One such state-of-the-art solution for the CPS/IoT domain, which is available as free open source software, is ThingML.² ThingML is not only the name of the project, but also the name of the domain-specific modeling language, the methodology and the free open source tool supporting them. The (textual) model editor of ThingML lets the user (e.g., a CPS/IoT service developer) model the distributed system using the following elements: (i) components (i.e., Things) with asynchronous message passing interfaces (Ports), (ii) composite state machines aligned with the UML2 state charts for specifying the behavior of components, and (iii) an imperative action language for the event processing rules. This action language is platform-independent, but includes a template language for linking platform-specific models in an easy manner. Once the model is complete, code generators (also known as Model-to-Text or Model-to-Code transformations in the MDSE terminology) can be employed to automatically generate the full implementation for the specific target platforms and communication protocols that are supported by ThingML. The generated implementation includes the source code and configuration scripts, and may also include documentation. Last but not least, the ThingML tool is built on the free open source Eclipse Modeling Framework (EMF) and is thus highly extensible and interoperable [4, 5].

² We chose ThingML due to the prior work of our industrial partner.

Motivation & Contribution of this Position Paper. We argue that model-driven software engineering languages and tools for the IoT shall provide support for Machine Learning by design. Concretely, we propose a complementary view for specifying the behavior of components, i.e., things in ThingML, which is not based on state machines in their current form, but on inference from the observed data. That is, we enable a data-driven approach for specifying the behavior. Using this complementary view, one shall be able to model the Machine Learning algorithm at design-time and let the system partially or fully learn the behavior from the observed data at run-time. In Section 2, we illustrate our position. This comprises the comparison of models in Machine Learning and models in Software Engineering, our core idea on integrating them for IoT applications, the advantages that we see, and the challenges that we foresee. Finally, we conclude in Section 3.

2 OUR POSITION

We believe that the Model-Driven Engineering (MDE) methodologies and tools for the Internet of Things (IoT) and smart Cyber-Physical Systems (CPS) must support the Artificial Intelligence (AI) needs of these applications both on the modeling level and on the code generation level in an integrated and seamless manner. Hence, we propose an alternative view for modeling the behavior of components, i.e., things in ThingML. This new view enables users of the tool to delegate the definition of the behavior of the thing (i.e., system component or IoT device) to AI algorithms (specifically ML algorithms), which are able to conduct inference based on the observed data. In other words, for complex behaviors that are not easily understandable and specifiable via state machines, the ML algorithms can learn the respective behaviors on their own in an effective and efficient manner. In addition, we provide an extension to the existing view for specifying behaviors using state machines, so that advanced data analytics algorithms and methods can be employed for event detection and for triggering state transitions. This latter contribution is in line with the contributions of the research project HEADS funded by the European Commission (FP7), where Complex Event Processing (CEP) capabilities were introduced to the ThingML tool (also known as the HEADS IDE in the context of that project), e.g., using the CEP platform Apama. Figure 1 depicts the graphical representation of a state machine that is currently used (with the existing ThingML tool) for modeling the behavior of a smart Air Conditioner using the data that arrives through asynchronous message passing from a temperature sensor in the room.

Figure 1: A sample state machine modeling the behavior of a smart Air Conditioner based on the room temperature.

2.1 Models in Machine Learning vs. Software Engineering

ML Models. Machine Learning (ML) is currently used as a promising data-driven approach in industry to address many complex problems. ML is one of the several fields that are widely used in data analytics.³ In data-driven approaches, we observe the data instances generated by some process, then we build some model (e.g., a statistical one) and train that model via an ML algorithm using the data instances. Finally, we use the trained model in the future, e.g., for making predictions about the possible outputs of that process. Hence, a model in ML is an abstraction / artifact that helps us make inferences based on the observed data, e.g., for making predictions. A popular example of this use case is predictive maintenance, where ML models can predict possible failures of the system in advance, e.g., based on anomaly detection.

³ It is perhaps the most important one. For instance, Deep Neural Networks, a family of ML algorithms, are currently widely used in industry.

There exist various ways of categorizing ML models and algorithms. For instance, from one point of view, ML models fall into two categories: parametric and non-parametric. In parametric models, we assume a specific functional form for the model (i.e., the statistical distribution that is assumed to be the generator of the observed data), where a small number of parameters control the form of the model. The linear regression model and the Neural Networks family are examples of linear and nonlinear parametric models, respectively. In non-parametric models, however, the form of the model is defined by the size of the dataset. Although these models still contain parameters, their parameters do not affect the form of the model, but its complexity. One popular example of this category is the Support Vector Machines (SVM) family. Unlike parametric models, non-parametric ones usually keep some or all of the data instances for future use. For this reason, they are also called memory-based or instance-based. Confusingly, some sources refer to the non-memory-based models as model-based. We do not use that term, since in our view all of the mentioned approaches to ML modeling are model-based. Last but not least, a diagrammatic representation of the probability distributions used in ML, known as Probabilistic Graphical Models (PGM), has provided a very intuitive, sound and useful way of visualizing and analyzing ML models [2].

SE Models. From the above explanation, it is clear that ML models are very different from SE models. Popular examples of SE models can be found in the UML (Unified Modeling Language) standard. They can usually be categorized into structural models and behavioral models (including interactions). The models in ThingML are currently merely SE models. However, we plan to link them with ML models. In the following, we explain our position in more detail.

2.2 Bringing ML Models and SE models Together

It is NOT (only) about Model-based Machine Learning. In his scientific article, Model-based Machine Learning [3], Christopher M. Bishop has already called for following a model-driven approach in the field of ML, similar to the Model-Driven Software Engineering (MDSE) paradigm. Specifically, he has proposed the concept of probabilistic programming and presented a Domain-Specific Modeling Language (DSML) called Infer.NET for PGMs [3].

It is NOT (only) about abstraction nor (merely) code generation. There already exist various workflow designers and frameworks in the field of data analytics, such as KNIME, RapidMiner, CamSaS Musketeer and TensorBoard, that make conducting data analytics tasks more efficient. They provide a higher level of abstraction and support various target data analytics and data engineering platforms for partial or full code generation. Similarly, in the field of IoT, there exist several mashup tools that provide a higher level of abstraction for developing IoT services by combining data and services over the IoT. Note that many of these cloud-based tools go far beyond simple mashup and also offer other cloud services such as data analytics and AI capabilities. Examples include but are not limited to the startup waylay.io, Microsoft Azure and the IBM Watson IoT Platform. In contrast, we propose a holistic approach that provides a methodology and tool support for the systematic engineering of the entire software / smart services for IoT/CPS applications. This also includes the data analytics (specifically ML) and IoT mashup capabilities. Our approach is based on the MDE paradigm. Note that none of the said solutions offers such a systematic approach. For instance, using the existing tools, one cannot separate the business logic from the underlying technologies and at the same time cover both the Data Science and Engineering and the Software Engineering aspects of IoT/CPS applications.

It is NOT (necessarily) about Graphical / Visual Diagrams. Note that when we talk about DSML and modeling in general, we do not necessarily mean graphical / visual diagrams. Model instances and model editors can also be textual. It is a common misunderstanding that modeling necessarily has something to do with graphical representations. However, we plan to make our DSML and modeling tool graphical / visual, in order to make it more intuitive and user-friendly.

Advantages. Besides the typical advantages of MDE, and the specific advantages of MDE for the domain of IoT/CPS, which by nature involves much more heterogeneity and a much larger scale than classic application domains such as embedded systems (see, e.g., [4, 5]), we offer the advantage of facilitating the employment of data analytics algorithms and methods on the modeling level while still having the source code automatically generated out of the model instances. This way, software engineers who do not necessarily have deep knowledge and skills in the field of Data Science and Engineering can easily create smart services for the IoT/CPS without mastering the algorithms (e.g., ML algorithms) and data analytics methods or the various underlying platforms (e.g., Spark, Storm, Flink, Samza). This will make an important contribution to the current world-wide shortage of Data Scientists in industry. Note that the tool will provide hints and advice at modeling time to support the user in employing the ML algorithms and data analytics methods. Concretely, we plan to support Apache SAMOA as well as the Apama platform for code generation. However, our tool will be open source and fully extensible for further platforms and use cases.

Challenges. As with the advantages, we do not mention the typical challenges of MDE and of MDE for the IoT, but rather the ones specific to this work. Currently, we foresee the main challenge to be related to the very different natures of models in SE and ML. For instance, it appears quite challenging to generate code out of model instances that do not have state machines for behavior specification, but rather inference models. Moreover, ML models and algorithms themselves are also quite different from one another and require very different measures, e.g., for data preparation. Further, one often cannot simply use the ML algorithms and methods as black boxes. One can do that, but the results will usually not be good. For instance, Neural Networks, which are quite popular, are very sensitive to the initialization of their parameters and also to architectural decisions, including but not limited to the number of hidden layers and the number of units per layer. Since we do not yet have any formal specification for such practices, and they are instead carried out experimentally by practitioners, it will be quite challenging to abstract such problems and tasks away from the user and go for full automation. This also holds for data preparation and cleansing tasks. Such challenges are studied and addressed in a relatively new field of ML, known as AutoML.

3 CONCLUSIONS

In this paper, we presented the current position of the research project ML-Quadrat on the topic of MDE4IoT. We proposed supporting AI, particularly Machine Learning, in modeling tools by design, on the modeling level and on the code generators level. We believe that a holistic approach and a systematic methodology that covers both the SE and the ML aspects is needed. The project ML-Quadrat aims to realize this vision using the ThingML tool as the basis.

ACKNOWLEDGMENTS

This work is funded by the German Federal Ministry of Education and Research (BMBF) through the Software Campus initiative (project ML-Quadrat).

REFERENCES

[1] Luigi Atzori, Antonio Iera, and Giacomo Morabito. 2010. The Internet of Things: A Survey. Comput. Netw. 54, 15 (Oct. 2010), 2787–2805. https://doi.org/10.1016/j.comnet.2010.05.010
[2] Christopher M. Bishop. 2006. Pattern Recognition and Machine Learning. Springer.
[3] Christopher M. Bishop. 2013. Model-based Machine Learning. Philosophical Transactions of the Royal Society of London A: Mathematical, Physical and Engineering Sciences 371, 1984 (2013).
[4] Franck Fleurey and Brice Morin. 2017. ThingML: A Generative Approach to Engineer Heterogeneous and Distributed Systems. In 2017 IEEE International Conference on Software Architecture Workshops (ICSAW).
[5] Nicolas Harrand, Franck Fleurey, Brice Morin, and Knut Eilif Husa. 2016. ThingML: A Language and Code Generation Framework for Heterogeneous Targets. In Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems (MODELS ’16).
[6] Bernhard Schaetz. 2014. The Role of Models in Engineering of Cyber-Physical Systems – Challenges and Possibilities. In CPS20.
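
As a rough illustration of the two behavior views discussed in Section 2, the following Python sketch contrasts a hand-specified state machine for the smart Air Conditioner with a behavior parameter that is learned from observed data. It is not ThingML code and is not produced by the ThingML code generators. The state names, the temperature thresholds, the hysteresis margin, the sample observations and the helper learn_threshold are all hypothetical and chosen only for illustration; the paper does not specify them. The learned model is a one-parameter decision stump, i.e., a very simple parametric model.

```python
from dataclasses import dataclass
from enum import Enum


class ACState(Enum):
    OFF = "off"
    COOLING = "cooling"


@dataclass
class AirConditioner:
    """State-machine view: transitions are fixed at design time."""
    on_above: float = 26.0   # assumed threshold (deg C), not from the paper
    off_below: float = 23.0  # assumed threshold (deg C), not from the paper
    state: ACState = ACState.OFF

    def on_temperature(self, celsius: float) -> ACState:
        """Process one temperature message from the room sensor."""
        if self.state is ACState.OFF and celsius > self.on_above:
            self.state = ACState.COOLING
        elif self.state is ACState.COOLING and celsius < self.off_below:
            self.state = ACState.OFF
        return self.state


def learn_threshold(samples: list[tuple[float, bool]]) -> float:
    """Data-driven view: fit a one-parameter decision stump.

    Each sample is (temperature, cooling_wanted). The returned cut is the
    midpoint between two neighbouring temperatures that misclassifies the
    fewest samples when predicting 'cooling wanted' for temperature > cut.
    """
    temps = sorted({t for t, _ in samples})
    if len(temps) < 2:
        raise ValueError("need at least two distinct temperatures")
    candidates = [(a + b) / 2 for a, b in zip(temps, temps[1:])]

    def errors(cut: float) -> int:
        return sum((t > cut) != wanted for t, wanted in samples)

    return min(candidates, key=errors)


# Hypothetical observations: (room temperature, did the user want cooling?)
observed = [(20.0, False), (22.0, False), (24.0, False),
            (27.0, True), (29.0, True), (31.0, True)]

cut = learn_threshold(observed)
# 2.0 is an assumed hysteresis margin to avoid rapid on/off switching.
learned_ac = AirConditioner(on_above=cut, off_below=cut - 2.0)
states = [learned_ac.on_temperature(t) for t in (25.0, 27.0, 24.0, 22.0)]
print(cut, [s.value for s in states])
```

In this sketch the structure of the behavior (two states, two transitions) is still given at design time and only the transition guard is learned. This is closer to the proposed extension of the state machine view than to a fully learned behavior, and a realistic service would need far more data and a more careful choice of model.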