=Paper=
{{Paper
|id=Vol-2245/mde4iot_paper_2
|storemode=property
|title=Domain Model-Based Data Stream Validation for Internet of Things Applications
|pdfUrl=https://ceur-ws.org/Vol-2245/mde4iot_paper_2.pdf
|volume=Vol-2245
|authors=Simon Pizonka,Timo Kehrer,Matthias Weidlich
|dblpUrl=https://dblp.org/rec/conf/models/PizonkaKW18
}}
==Domain Model-Based Data Stream Validation for Internet of Things Applications==
Simon Pizonka (simon.pizonka@hu-berlin.de), Timo Kehrer (timo.kehrer@informatik.hu-berlin.de), Matthias Weidlich (matthias.weidlich@hu-berlin.de)

Humboldt-Universität zu Berlin, Berlin, Germany

MDE4IoT'18, October 2018, Copenhagen, Denmark

===Abstract===
The Internet of Things (IoT) has become ubiquitous, connecting an ever increasing number of devices, many of which are online 24/7 and send data continuously. The quality of these data plays a pivotal role for many IoT applications, which calls for continuous monitoring and validation of streaming data in order to spot and react to potential errors. Yet, implementing such validation facilities requires a deep understanding of the processed data. IoT developers are often burdened with technical details such as the data structure and format, which is not only tedious but also prone to errors. In this paper, we advocate a model-based approach to this problem, deriving validation facilities from models written in the Vorto modeling language, an emerging domain-specific modeling language for declaratively describing basic characteristics of IoT devices. We evaluate our approach and prototypical implementation using the so-called Intel Lab Data as experimental subject. While the experiment showcases the feasibility of our approach, we also identify limitations to be addressed in future work to fully realize our vision of domain model-based data validation for IoT applications.

===Keywords===
Model-driven Engineering, Internet of Things, sensor devices, stream processing, data validation

===1 Introduction===
The Internet of Things (IoT) has become reality and is constantly growing. There are several forecasts of how many IoT devices we will have in the future. One frequently cited source is the analyst company Gartner Inc., which expects around 20.4 billion IoT devices by 2020 (http://www.gartner.com/newsroom/id/3598917). All these devices will be connected to the internet, many of them will be online 24/7 and will send continuous streams of data which need to be processed and stored. The quality of these data plays a pivotal role for many IoT applications, which calls for continuous monitoring and validation of streaming data in order to spot and react to potential errors.

To date, data validation facilities are often implemented ad hoc and in a manual fashion. This is a tedious and error-prone task, which requires not only expert knowledge in the respective application domain, but also a deep understanding of various technical details, such as the format and structure of the processed data, how to plug the validation routines into a suitable stream processing framework, etc. Approaches towards more automated data validation solutions have started to be developed, yet are still in their infancy. One common idea is to detect anomalies in data streams by using a learning approach where a statistical reference model is created based on historical data representing normal behavior, e.g., as presented in [21].

In this paper, we propose a complementary approach to data stream validation for IoT applications, in which validation rules are derived from pre-defined domain models which are interpreted in a stream processing framework at run-time. Following basic principles of Model-Driven Engineering (MDE) [2], our goal is to specify device properties in a high-level and platform-independent fashion, while the validation itself is achieved in a fully automatic way without requiring manual development efforts on the technical level. We present a reference implementation of our approach, referred to as VortoFlow, in which IoT device information is modeled using the Vorto DSL (https://www.eclipse.org/vorto/), an emerging domain-specific modeling language for declaratively describing basic characteristics of IoT devices, and these device models are interpreted within Apache Beam (https://beam.apache.org/), serving as an abstraction layer over a set of widely used stream processing frameworks. We evaluate our approach and prototypical implementation using the so-called Intel Lab Data [13] as experimental subject.

While the experiment showcases the feasibility of our approach, we also identify limitations which need to be addressed in future work in order to fully realize our vision of domain model-based data validation for IoT applications. However, we believe that the automated derivation of data validation facilities from domain models is another logical step in leveraging MDE principles for the development of IoT applications [3, 4, 14].

The remainder of the paper is structured as follows. Section 2 introduces a running example which motivates our approach, an overview of which is presented in Section 3 and whose applicability is evaluated in Section 4. Related work is studied in Section 5, before we conclude and outline future work in Section 6.

===2 Motivating Example===
With the IoT, many objects of our daily life become connected and controllable via the internet. We use a kitchen blender as our running example. Traditionally, a kitchen blender has a physical interface, comprising a rotary knob to turn the device on and off and to control its speed, and additional buttons to enable advanced features, e.g., for crushing ice cubes or preparing smoothies. Now, imagine there is a mobile application to monitor the kitchen blender. This application can show, e.g., whether the blender is active, the rotation speed, and which of the advanced features are enabled.

The kitchen blender periodically sends messages comprising various meta-data (e.g., a timestamp) as well as information about its current state (activity, rotation speed etc.). Messages are transmitted in a device-specific message format. In our example, this is a CSV-like string encoding in which single values are separated by a semicolon. Each value represents a dedicated part of the message, depending on its position. A recurring problem is to convert such a native message format into some other data structure, e.g., a structured JSON object which is better suited for further processing in the cloud. Here, we use Apache Beam as a software abstraction layer over a concrete stream processing engine. Our exemplary data transformation of incoming messages may be plugged into Apache Beam by providing a so-called user-defined function, a Java implementation of which is shown in Listing 1. As we can see in lines 8 to 10, dedicated data values may be accessed via their fixed position within the delimited message string, while the type-specific parsing of values is delegated to built-in Java functions. If parsing of an input value fails, the parsing exception is caught and the message is marked as invalid (lines 19 to 21). Otherwise, a simple JSON object representing the message is constructed in lines 12 to 15.

<pre>
 1  @ProcessElement
 2  public void processElement(ProcessContext c) {
 3    String entry = "";
 4    try {
 5      entry = c.element();
 6      String[] elms = entry.split(";");
 7      // parse values
 8      int rotations = Integer.parseInt(elms[0]);
 9      long runTime = Long.parseLong(elms[1]);
10      Date dateTime = dateTimeFormat.parse(elms[2]);
11      // new data structure
12      ObjectNode root = mapper.createObjectNode();
13      root.put("rotations", rotations);
14      root.put("runtime", runTime);
15      root.put("datetime", dateTimeFmt.format(dateTime));
16      // output
17      c.output(dataOK, mapper.writeValueAsString(root));
18
19    } catch (Exception e) {
20      LOG.error("Processing failed for: " + entry, e);
21      c.output(dataError, entry);
22    }
23  }
</pre>
Listing 1: Apache Beam user-defined function implementing a message data conversion.

In addition to the purely syntactic validation of the message string, we would now like to progress towards a more semantic data validation by incorporating domain knowledge. For example, from the data sheet of the kitchen blender, we know that it has a maximum rotation speed of 12,000 rotations per minute, which means that the value range for the rotation speed is between 0 and 12,000. Listing 2 shows the code we need to add to validate the value range of the rotation property. The code snippet is to be inserted after parsing the input values and before creating the JSON object. Such additional code is required for each property whose value range is to be validated. Typically this code is handwritten. Similar checks may be added for other properties of the blender.

<pre>
if (rotations < 0) {
  throw new MinConstraintViolation("Rotations < 0");
}
else if (rotations > 12000) {
  throw new MaxConstraintViolation("Rotations > 12000");
}
</pre>
Listing 2: Implementation of additional check routines validating the value range of the rotation property.

As we can see, even for our small example, developers of IoT applications are typically confronted with multiple technical details such as message protocols, data formats, etc. Moreover, a lot of repetitive yet very schematic code needs to be produced in order to implement data validation facilities such as the rather simple checks used in our running example. Finally, the hand-crafted validation routines are highly technology-specific and cannot be easily transferred to other platforms and frameworks.

===3 Approach and Prototypical Implementation===
In this section, we present our approach to, and prototypical implementation of, combining a domain model with a stream processing system in order to validate data streams in IoT applications. Specifically, as illustrated in Section 3.1, we use the Vorto DSL to declaratively describe the capabilities of IoT devices, which includes the platform-independent specification of message structures and further data integrity constraints such as the measurement range of a sensor device. Such a model can be used in a stream processing system to validate incoming data streams. To easily adapt to multiple stream processing systems, we use Apache Beam as an abstraction layer over several standard stream processing engines. An overview of our integration with Vorto, referred to as VortoFlow, is presented in Section 3.2.

====3.1 Device Information Modeling in Vorto====
The Vorto project, which serves as a basis for our approach and prototypical implementation, aims at achieving interoperability among IoT device manufacturers, platform providers and application developers through the generation of platform adapters (aka. stubs) from domain models. To this end, Vorto provides a high-level domain-specific modeling language, the Vorto DSL, to describe the functionality and characteristics of IoT devices in terms of so-called Information Models. An information model contains one or multiple Function Blocks. These function blocks are structured into five Sections. The Configuration section defines readable and writable properties to configure a device, while the Status, Fault and Events sections define readable properties that describe the device's current status, fault states, and publishable messages, respectively. Properties are typed, and a type may be a primitive type or a complex type. The latter can contain further complex types, primitive types and enumerations. Finally, the Operations section defines operations that can be invoked on the device from, e.g., external applications.

Listing 3 shows a function block describing the kitchen blender of our running example. In the events section (lines 14 to 20), we declaratively describe the structure of a message called speed which is periodically published by the device. It contains the same properties (rotations, runTime and dateTime) as used in our manual implementation in Section 2. However, note that we can now use the MIN and MAX constraints of the Vorto DSL to define the value range of the rotations property.

<pre>
 1  namespace de.hu_berlin.blender
 2  version 1.0.0
 3  displayname "Blender Function Block"
 4  functionblock Blender {
 5    configuration {
 6      mandatory firmwareVersion as int
 7    }
 8    status {
 9      mandatory speed as int
10      mandatory powerOn as boolean
11      optional iceCrushActive as boolean
12      optional smoothieActive as boolean
13    }
14    events {
15      speed {
16        mandatory rotations as int
17        mandatory runTime as long
18        optional dateTime as dateTime
19      }
20    }
21    operations {
22      mandatory updateFirmware()
23    }
24  }
</pre>
Listing 3: Vorto information model describing the characteristics of a kitchen blender.

====3.2 Data Stream Validation through Model Interpretation in Apache Beam====
Figure 1 illustrates how a Vorto model can be used in an IoT scenario. Specific code generators, collectively referred to as Vorto Generator in Figure 1, enable the generation of platform adapters supporting communication and message exchange between components on different platforms. Receiving the measurements and readings from sensor devices, the platform adapter is capable of transforming the incoming data to a format which the IoT platform can process. In our prototypical implementation, device-specific messages are converted into a structured JSON object, which is passed to the IoT platform running in the cloud. The platform receives the incoming data stream and forwards it for data validation which, in our case, takes place in Apache Beam on some concrete stream processing engine. The validation itself is performed in a fully automated way by the Data Validation component contributed by VortoFlow. The validation rules which are to be executed on the data stream are obtained from the domain model. If the validation fails, the message is marked and equipped with details about the validation error.

[Figure 1: Using VortoFlow in an IoT pipeline. Diagram elements: Model (describes capabilities and restrictions, e.g. min, max), Vorto Generator (uses the model to generate code), Platform Adapter, IoT Device (Sensors), Cloud (IoT Platform, Data validation, which uses the model to validate data).]

Technically, the realization of VortoFlow is based on the following design decisions. First, instead of generating data validation components from domain models, we choose an interpretative approach in which a generic data validation component interprets the domain model at run-time. This enables a flexible deployment process when the domain model changes. Second, this generic data validation component is implemented as a Java library which can be included in an Apache Beam project. The idea is that, besides model validation, further processing steps can be included in the final project. This is resource-efficient because the messages are already loaded. Finally, the current processing function in VortoFlow is stateless, and thus can be included without much effort.

The implementation of the generic data validation component is rather straightforward. To date, VortoFlow supports syntactical conformance checking w.r.t. the message structure defined by the domain model, and the checking of value ranges constrained by lower and upper bounds, such as the one used in our running example. Furthermore, due to the stateless functioning of VortoFlow, only a single message is processed at a time. Please note that, as a positive side effect of this simplicity, VortoFlow can be operated in stream and batch mode. While the classical use case is to process a stream of incoming real-time data and to give instant feedback, VortoFlow also supports the validation of existing data. This can be helpful for multiple reasons. First, data that already exists can be validated and a Vorto model can be created afterwards. Second, it allows the user to re-evaluate data if the model has changed.

===4 Evaluation===
We evaluate the applicability of our approach and prototypical implementation with respect to two research questions:
* RQ.1 (Error Detection): Is it possible in principle to find errors in real-world IoT streaming data using our model-based validation approach?
* RQ.2 (Scalability): Does the validation by model interpretation scale up to realistic IoT applications, which process streaming data of high volume and veracity?

====4.1 Experimental Subject and Setup====
Intel Lab Data. In the Intel Berkeley Research Lab, 54 Mica2Dot Mote boards (https://www.eol.ucar.edu/isf/facilities/isa/internal/CrossBow/DataSheets/mica2dot.pdf) equipped with weather boards were deployed and operated from February 28 to April 5, 2004, measuring the temperature, humidity and light through environment sensors [13]. The collected dataset contains several obvious errors, which makes it an ideal experimental subject for our study. To validate the Intel dataset with VortoFlow, we developed a domain model of the weather board using the Vorto DSL, and a test program processing the dataset in Apache Beam.

Domain Model. The domain model of the weather board is shown in Listing 4; its properties are described in the Function Block's status section. Here, we used domain knowledge such as the provided sensor data sheets [15] to derive the respective boundaries. For example, the temperature and humidity sensors have a measurement range from -40°C to 123.8°C and 0% to 100%, respectively.

<pre>
functionblock MICA2DOTWeatherSensor {
  status {
    // yyyy-mm-dd
    mandatory date as string

    // hh:mm:ss.xxxxxx
    mandatory time as string

    mandatory epoch as int
    mandatory moteid as int
    mandatory temperature as float
    mandatory humidity as float
    mandatory light as float
    mandatory voltage as float
  }
}
</pre>
Listing 4: Information model: Mica2Dot weather board.

Test Program. The test program comprises the processing pipeline shown in Figure 2. First, the Intel Lab dataset is loaded as a ZIP file from a Google Cloud Storage Bucket (https://cloud.google.com/storage/docs/json_api/v1/buckets). The file is extracted into a CSV file which is processed line by line; each line represents a message which is to be validated. To this end, each line of the CSV input is transformed to a JSON object which is compatible with our Vorto domain model of the weather board. The JSON object is passed to the generic validation function of VortoFlow and validated w.r.t. the constraints defined by the domain model. All messages which contain an error are written to a text file on a Google Cloud Storage Bucket. The experiments are run on Google Cloud Dataflow (https://cloud.google.com/dataflow/) using the latest version of the Apache Beam SDK (2.4.0) for Java.

[Figure 2: Apache Beam pipeline for processing the Intel dataset used as experimental subject. Steps: Load → Transform → Validate → Write.]

====4.2 Results====
RQ.1 (Error Detection). On the one hand, when validating the dataset, multiple violations of the humidity constraints were detected. Figure 3 shows an example of such a violation. Here, starting on 26 March 2004 at 00:30:05, the humidity dropped below zero; the respective values are marked by the dotted line. These are values which violate the MIN constraint of the humidity property defined by our domain model.

[Figure 3: Errors in humidity readings (Mote with id 1). Plot of humidity in % over time, 2004-03-02 to 2004-03-30; series: ok, error.]

On the other hand, some errors passed the validation undetected. For instance, when considering the graph in Figure 4 depicting temperature values recorded by one of the temperature sensors, it is obvious that there is something wrong with the data. However, the exceptional increase in the temperature was not spotted as an error by VortoFlow since all values are still in the valid range of [−40, 123.8] as defined by the weather board model.

[Figure 4: Temperature readings (Mote with id 1). Plot of temperature in °C over time, 2004-03-02 to 2004-03-30; series: ok.]

Nonetheless, the first example shows that the general approach works and that errors can be detected in principle, which lets us formulate a positive answer for RQ.1.

RQ.2 (Scalability). Table 1 lists the execution times of three independent runs of the pipeline shown in Figure 2 for processing the Intel dataset. For each step, the wall-clock time is given along with the average over all runs. The wall-clock time represents the approximate time taken from initialization to termination. There are multiple reasons why the results vary from run to run. The read and write tasks require the system to access the network. Here, the available bandwidth may vary. Furthermore, the processing of the data requires memory and CPU time, which may be affected by the fact that the hardware is potentially shared with other users.

{| class="wikitable"
|+ Table 1: Execution times of three independent runs of processing the Intel Lab dataset with VortoFlow running on Google Cloud Dataflow.
! !! Read !! Transform !! Validate !! Write
|-
| Run 1 || 19 sec. || 26 sec. || 1 min. 31 sec. || 12 sec.
|-
| Run 2 || 17 sec. || 24 sec. || 1 min. 14 sec. || 10 sec.
|-
| Run 3 || 14 sec. || 26 sec. || 1 min. 22 sec. || 10 sec.
|-
| Avg. || ~17 sec. || ~25 sec. || ~1 min. 22 sec. || ~11 sec.
|}

Although VortoFlow is not optimized for performance at the moment, the experiment with the Intel dataset shows that the validation can be done in a reasonable time. As expected, the validation step takes most of the time, around 1 min 22 sec, about three times as long as reading and writing the dataset, which we consider to be acceptable. The dataset contains 2,313,682 elements, which means that, per run, around 28,216 messages were validated per second. Thus, RQ.2 can be answered positively as well.

====4.3 Discussion====
Using the Vorto DSL, it was possible to create a simple yet concise domain model for the considered domain of our experimental subject. This model, in turn, could be used in VortoFlow to detect elements that violate the constraints defined by the domain model. Using a model-driven approach saved us from writing plenty of repetitive code compared to a manual implementation of the same data validation facilities.

However, as indicated by the second example, checking the range of values can only be seen as a first indication of errors. Not very surprisingly, not all the errors contained in the Intel Lab dataset could be detected using VortoFlow. Therefore, the expressiveness of the Vorto DSL needs to be extended by further kinds of constraints which then need to be checked by the generic validation component. Classical data description languages are a starting point for inspiration. JSON-Schema, for instance, has many more features to validate a JSON document compared to the Vorto DSL [20]. Moreover, to address the detection of data errors, outliers and anomalies over time, like the exceptional increase of the temperature value shown in Figure 4, the current stateless processing of single messages is no longer appropriate.

From a technical point of view, there is much room for improvement w.r.t. optimizing the performance of our prototypical implementation. The internal structure is not optimized for quick access to all property values. For example, each time a validation of a REGEX constraint is executed, the regular expression is recompiled. A better approach would be to cache the compiled expressions.

===5 Related Work===
In this section, we review related work from two different perspectives. First, in Section 5.1, we look at approaches leveraging MDE for the development of IoT applications, before Section 5.2 gives an overview of the state of the art in the field of data stream validation.

====5.1 Model-Driven Engineering for the IoT====
Both industry and academia have recognized the need for research on a consolidated set of best practices that will guide developers through the manifold challenges of software engineering for the IoT [11]. Model-driven Engineering has been mentioned as one of the key paradigms that bear the potential to tackle these challenges.

Among the predominant challenges addressed by adopting MDE principles are distribution and heterogeneity in the IoT. An example of this is the ThingML (Internet of Things Modeling Language) approach [8, 14]. It supports the modeling of IoT applications from different viewpoints (from the architectural level to the behavior of individual devices) through a modeling language which combines well-established visual modeling constructs (such as state charts and component diagrams) and an imperative yet platform-independent action language. The generation of platform-specific code and adapters is supported through a set of readily available yet customizable code generators for popular programming languages and open IoT platforms (e.g., Arduino, Raspberry Pi, Intel Edison). As mentioned, the Vorto project follows a similar motivation and goal. The Vorto DSL has been used, e.g., to specify manufacturer-independent abstraction layers describing the functions and properties of vehicles on different levels of granularity [12, 19]. We selected Vorto as a technological basis for our work since it is actively developed, maintained and continuously evolved (cf. commit logs on GitHub, https://github.com/eclipse/vorto). Moreover, Vorto is supported as an integral part of the Bosch IoT Suite (https://www.bosch-iot-suite.com) and based on the widely used Eclipse Modeling (https://www.eclipse.org/modeling) technology stack.

Besides heterogeneity and distribution, other values supported by MDE principles such as separation of concerns for collaborative development, automation for enabling self-adaptation at run-time, or reusability of development artifacts have been addressed, e.g., in [4]. More recently, the same group of authors has put a specific focus on the engineering of mission-critical IoT systems [3]. These systems expose further challenges w.r.t. dependability requirements such as reliability, safety and security which may be tackled by exploiting models for the sake of verification.

A domain-specific MDE framework that targets IoT-based manufacturing systems in an Industry 4.0 context has been presented in [17]. Following other approaches to MDE in this domain (see, e.g., the research roadmap presented in [18]), the methodology exploits the UML profiling mechanism [9] to tailor a set of popular UML diagrams towards the specific needs of manufacturing engineers.

However, none of the existing approaches to leveraging MDE for the development of IoT applications exploits domain models for the automated derivation of data stream validation facilities.

====5.2 Data Stream Validation====
Aiming at scalability of stream validation, it has been suggested to rely on concepts of data stream processing [5]. In that case, languages for data stream processing enable the formalization of validation requirements using a well-defined set of streaming operators, including stateless ones such as filters and transformations, as well as stateful operators, e.g., to detect sequential patterns. Data stream management systems then enable the distributed execution of these operators in a compute cluster [6].

The application of these concepts has been illustrated in SVALI (Stream VALIdator) [21], a system that supports two data stream validation modes: In a model-and-validate mode, users directly formalize validation requirements as a function over streaming data, which is then continuously evaluated. In a learn-and-validate mode, a statistical reference model is learned from samples of normal behavior, which is then used as basis for validation. Either way, validation requirements are defined on the technical level, not connected to conceptual models of the application domain.

In a broader context, a plethora of techniques for the detection of anomalies in data streams has been presented in recent years. They have in common that they assess the characteristics of a stream to detect data that deviate significantly from expected values and, hence, can be thought of as a continuous variant of traditional outlier detection. Common techniques for anomaly detection in data streams are distance-based [1, 7, 10]. Here, a stream element is considered abnormal if it is far from a pre-defined number of neighboring stream elements according to some distance function. Moreover, anomaly detection may also exploit the ideas of density-based clustering to flag abnormal stream elements [16] or be based on the angles of data elements in a high-dimensional value space [22]. However, all such techniques characterize anomalies by means of a mathematical model over streaming data and are, therefore, completely disconnected from domain models that describe data sources and the context of a specific IoT application.

===6 Conclusion===
In this paper, we demonstrated how MDE principles can be employed in the development of IoT applications. Specifically, we focused on the question of how to validate data streams emitted by IoT sources through a model-driven approach. We proposed VortoFlow, which builds upon the Vorto DSL for the specification of IoT devices. It enables users to capture validity requirements in terms of value ranges as part of an information model. This model then serves as the basis for online validation of data streams: A generic data validation component, prototypically realized in Apache Beam, interprets the model at run-time and flags invalid data accordingly. We demonstrated the general feasibility and applicability of VortoFlow using the case of a weather board.

In order to fully exploit the potential of model-driven validation of data streams, we intend to extend VortoFlow to support the specification of more expressive validity requirements, along several dimensions. First, the temporal context of data stream elements may be worth considering, e.g., by validating a sliding average of data stream values over a 1 minute window. Second, information models are specified per device, whereas the Vorto DSL currently does not support the specification of relations between the models of different devices. Enabling the definition of such relations, however, would be useful to capture validity requirements in terms of causal relations of data produced by different devices (e.g., activation of an electric device should be correlated with load measurements at a smart meter). Third, constraint languages commonly adopted in MDE, such as OCL, provide another angle to increase the expressiveness of information models w.r.t. validity requirements.

===References===
* [1] Fabrizio Angiulli and Fabio Fassetti. 2010. Distance-based outlier queries in data streams: the novel task and algorithms. Data Min. Knowl. Discov. 20, 2 (2010), 290–324.
* [2] Marco Brambilla, Jordi Cabot, and Manuel Wimmer. 2012. Model-driven software engineering in practice. Synthesis Lectures on Software Engineering 1, 1 (2012), 1–182.
* [3] Federico Ciccozzi, Ivica Crnkovic, Davide Di Ruscio, Ivano Malavolta, Patrizio Pelliccione, and Romina Spalazzese. 2017. Model-driven engineering for mission-critical iot systems. IEEE Software 34, 1 (2017), 46–53.
* [4] Federico Ciccozzi and Romina Spalazzese. 2016. MDE4IoT: supporting the internet of things with model-driven engineering. In International Symposium on Intelligent and Distributed Computing. Springer, 67–76.
* [5] Gianpaolo Cugola and Alessandro Margara. 2012. Processing flows of information: From data stream to complex event processing. ACM Comput. Surv. 44, 3 (2012), 15:1–15:62.
* [6] Minos Garofalakis, Johannes Gehrke, and Rajeev Rastogi. 2016. Data Stream Management: Processing High-Speed Data Streams. Springer.
* [7] Dimitrios Georgiadis, Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, and Yannis Manolopoulos. 2013. Continuous outlier detection in data streams: an extensible framework and state-of-the-art algorithms. In Proceedings of the Intl. Conference on Management of Data. 1061–1064.
* [8] Nicolas Harrand, Franck Fleurey, Brice Morin, and Knut Eilif Husa. 2016. Thingml: a language and code generation framework for heterogeneous targets. In Proceedings of the ACM/IEEE 19th International Conference on Model Driven Engineering Languages and Systems. ACM, 125–135.
* [9] Timo Kehrer, Michaela Rindt, Pit Pietsch, and Udo Kelter. 2013. Generating Edit Operations for Profiled UML Models. In ME@MoDELS (CEUR Workshop Proceedings), Vol. 1090. CEUR-WS.org, 30–39.
* [10] Maria Kontaki, Anastasios Gounaris, Apostolos N. Papadopoulos, Kostas Tsichlas, and Yannis Manolopoulos. 2011. Continuous monitoring of distance-based outliers over data streams. In Proceedings of the 27th International Conference on Data Engineering. 135–146.
* [11] Xabier Larrucea, Annie Combelles, John Favaro, and Kunal Taneja. 2017. Software engineering for the internet of things. IEEE Software 34, 1 (2017), 24–28.
* [12] Jeroen Laverman, Dennis Grewe, Olaf Weinmann, Marco Wagner, and Sebastian Schildt. 2016. Integrating Vehicular Data into Smart Home IoT Systems Using Eclipse Vorto. In IEEE 84th Vehicular Technology Conference. 1–5.
* [13] Samuel Madden et al. 2004. Intel Lab Data. http://db.csail.mit.edu/labdata/labdata.html
* [14] Brice Morin, Nicolas Harrand, and Franck Fleurey. 2017. Model-based software engineering to tame the iot jungle. IEEE Software 34, 1 (2017), 30–36.
* [15] Sensirion Inc. 2011. Datasheet SHT1x (SHT10, SHT11, SHT15) Humidity and Temperature Sensor IC. https://www.sensirion.com/fileadmin/user_upload/customers/sensirion/Dokumente/0_Datasheets/Humidity/Sensirion_Humidity_Sensors_SHT1x_Datasheet.pdf
* [16] Sharmila Subramaniam, Themis Palpanas, Dimitris Papadopoulos, Vana Kalogeraki, and Dimitrios Gunopulos. 2006. Online Outlier Detection in Sensor Data Using Non-Parametric Models. In Proceedings of the 32nd International Conference on Very Large Data Bases. 187–198.
* [17] Kleanthis Thramboulidis and Foivos Christoulakis. 2016. UML4IoT: A UML-based approach to exploit IoT in cyber-physical manufacturing systems. Computers in Industry 82 (2016), 259–272.
* [18] Birgit Vogel-Heuser, Stefan Feldmann, Jens Folmer, Jan Ladiges, Alexander Fay, Sascha Lity, Matthias Tichy, Matthias Kowal, Ina Schaefer, Christopher Haubeck, et al. 2015. Selected challenges of software evolution for automated production systems. In 13th IEEE International Conference on Industrial Informatics (INDIN). IEEE, 314–321.
* [19] Marco Wagner, Jeroen Laverman, Dennis Grewe, and Sebastian Schildt. 2016. Introducing a harmonized and generic cross-platform interface between a Vehicle and the Cloud. In 17th IEEE International Symposium on A World of Wireless, Mobile and Multimedia Networks. 1–6.
* [20] Austin Wright, Henry Andrews, and Geraint Luff. 2018. JSON Schema Validation: A Vocabulary for Structural Validation of JSON. Working Draft. IETF Secretariat. https://tools.ietf.org/html/draft-handrews-json-schema-validation-01
* [21] Cheng Xu, Daniel Wedlund, Martin Helgoson, and Tore Risch. 2013. Model-based validation of streaming data: (industry article). In The 7th ACM International Conference on Distributed Event-Based Systems. 107–114.
* [22] Hao Ye, Hiroyuki Kitagawa, and Jun Xiao. 2015. Continuous Angle-based Outlier Detection on High-dimensional Data Streams. In Proceedings of the 19th International Database Engineering & Applications Symposium. 162–167.