A Pattern for Modeling Computational Observations

Cogan Shimizu¹, Pascal Hitzler² and Charles F. Vardeman II³
¹ Wright State University, USA
² Kansas State University, USA
³ University of Notre Dame, USA

Abstract
Knowledge graphs (KG) are an established method for heterogeneous data integration and have begun powering complex software agents. However, it is important to understand where the data in the knowledge graph originates, especially within the context of synthetic research agents and other trustworthy AI systems. In this paper, we propose an ontology design pattern for tracking the provenance and context of computational observations, as well as proposing a supporting, simplified conceptual framework for modeling abstract and concrete versions of the same underlying notion.

1. Introduction

Knowledge graphs (KG) are an established way of integrating data from multiple, heterogeneous sources [1]. Recently, they have begun powering complex software agents. However, as the complexity – and pervasiveness – of these agents grows, it is important to understand where the data in the knowledge graph originates. This is particularly important within the context of synthetic research agents and other trustworthy AI systems. When an AI agent operates on data of specious origin, it is important to propagate this fact through all downstream actions.

In particular – and the focus of this paper – we want to track the provenance and lineage of data produced from computational models. This requires that we understand both the models themselves and the results that they produce – which we call "computational observations" – as well as the context in which these models are executed. However, the space of computational models is quite large. Building an entire domain ontology would be very difficult, not to mention contentious. As such, we have opted to develop an ontology design pattern [2] that represents, in a generalized case, the interplay between computational models, executions, and observations. Such a pattern will enable downstream ontology or knowledge engineers to quickly incorporate modeling best practices into their own KG, and will enable them to easily align their KGs to other KGs that reuse the same pattern.

The primary contributions of this paper are:

1. a simplified framework for conceptually modeling abstract and concrete concepts representing the same underlying notion; and
2. a pattern for modeling computational observations.

WOP2022: 13th Workshop on Ontology Design and Patterns, October 23-24, 2022, Hangzhou, China
cogan.shimizu@wright.edu (C. Shimizu); hitzler@ksu.edu (P. Hitzler); cvardema@nd.edu (C. F. Vardeman II)

Figure 1: The schema diagram for the Data Transformation Ontology Design Pattern (ODP). Gold boxes represent concepts that are central to the pattern. Blue boxes with dashed borders are interfaces to other patterns or represent concepts with significant complexity, but outside the scope of this pattern. Purple boxes represent controlled vocabularies, which simply means they comprise a finite set of individuals. Black-filled arrows represent object or data properties, and open arrows represent subclass relationships.

The remainder of the paper is organized as follows.
In the next section, we discuss background and related work. Section 3 provides a brief discussion of a basic, conceptual pattern for modeling abstract and concrete concepts for the same notion. In Section 4 we present our pattern and provide its formalization. Finally, in Section 5, we conclude with future work.

2. Related Work

There is some existing literature that focuses on how to capture the results of modeling and simulation software. As far as the authors are aware, however, there is no directly corresponding work on modeling computational observations. Below, we describe some tangential work that could be used in a limited fashion, and how it relates to our pattern described in Section 4.

SOSA/SSN (The Sensors, Observations, Sampling, Actuators Ontology and Semantic Sensor Network Ontology [3]) are W3C Recommendations for modeling how observations are generated from sensors. While the notion of a sensor, within these ontologies, is left intentionally ambiguous and can, indeed, be used to represent a piece of software (e.g., for simulation or forecasting purposes), it is awkward to do so, due to requirements on the phenomenon and result times of observations – and due to the question of whether a piece of software producing a dataset is really an observation at all. Secondly, SOSA/SSN does not provide a mechanism for modeling aspects of the software or hardware, which this pattern provides.

ML-Schema [4] was proposed by the W3C Machine Learning Schema Community Group to capture the provenance of machine learning data sets, algorithms, models, software implementations, and model runs for machine learning experiments. While ML-Schema captures the computational workflow associated with an ML experiment, it does not conceptualize a "model" as a surrogate for a real-world phenomenon; instead, it focuses on capturing the type of machine learning methodology as a model. Also, because it was developed primarily as a schema for ML experiments, it lacks the generality of an ontology-pattern-based approach.

Figure 2: A visual representation of the simplified Descriptions and Situations framework. The graphical syntax is the same as in Figure 1.

Figure 3: A visual representation of our Algorithm example. The graphical syntax is the same as in Figure 1. Note the correspondences between it and the figure above.

The Data Transformation Pattern [5] describes how data is transformed via different numerical operations (a schema diagram for the pattern is shown in Figure 1). This pattern focuses on the dataset level, rather than the specific data within them, for tracking how the datasets were transformed, what – or who – performed those actions, and which datasets were used to generate or derive the new dataset. The Computational Observation pattern, instead, focuses on how computational models generate specific data (i.e., computational observations).

The Computational Environment pattern [6] is used to model the hardware and software configurations where a particular piece of code may be implemented and executed. Its focus is not on the metadata of the results of that execution, but instead on providing a human- and machine-interpretable way of exchanging (arbitrary) configuration information. Indeed, we reference this concept in our own pattern.

3. Modeling Abstract and Concrete Concepts

It is frequently desirable to model a particular concept and its abstraction: that is, the difference between an algorithm and the execution of that algorithm.
The Descriptions and Situations (DnS) [7] framework is one way to accomplish this. However, DnS can be quite challenging to approach, especially for those without significant ontology engineering experience or a foundation in logic. It furthermore leverages DOLCE [8], a foundational ontology, which may not be compatible with project needs. The following is an alternative, simplified framework by which one can conceptually reason about the description and situation dynamic.

Figure 2 shows a schema diagram for a simplified conceptualization of the Description and Situation dynamic. In summary, a Description exists outside of time – it is the abstraction of a notion; a Situation is an instantiation of that Description that is anchored in space and time.

For example, consider the notion of an algorithm. Loosely speaking, an algorithm is a set of steps that operate on an input to produce an output; for example, long division or, less trivially, normalizing a series of data. Such algorithms, however, exist in the abstract; their executions do not. They are run in particular computational environments, on some hardware, and according to some software implementation. Consider the graphical representation in Figure 3: the Algorithm corresponds to the Description; the Execution corresponds to the Situation; and we split the spatial and temporal aspects of a SpatiotemporalExtent to produce a TemporalExtent and a ComputationalEnvironment. In a more complex scenario, one might also consider an algorithm to have an input or parameter space, where the Execution would have specific inputs and parameters.

3.1. Formalization of the Simplified Framework

We provide a brief formalization of this framework.

Description is the abstract representation of a particular concept or notion. This could, for example, be an algorithm or a recipe. Intuitively, the Description is a template.

⊤ ⊑ ∀hasDescription.Description (1)

Situation is the concrete instantiation of a particular Description. This could, for example, be the execution of an algorithm or the act of following a recipe. Intuitively, the Situation is a template that has been "filled out."

Situation ⊑ ∀occursOver.SpatiotemporalExtent (2)
Situation ⊑ =1 occursOver.SpatiotemporalExtent (3)
Situation ⊑ =1 hasDescription.Description (4)

4. A Pattern for Computational Observations

The Computational Observation ODP is driven by the interplay between three core concepts: ComputationalModel, ComputationalModelExecution (CME), and ComputationalObservation. The first two correspond, respectively, to the Description and Situation from Figure 2. That is, a ComputationalModel is an abstract representation of the space of all computational models, and the CME is a concrete instance from that space (e.g., parameters and inputs have been selected). The CME then comprises a set of generated outputs: the ComputationalObservations. The rest of the pattern is relevant metadata important for describing the entire process or pipeline. ComputationalModels are implemented in a codebase (Implementation) and are compiled into executables (Executable). An execution of the ComputationalModel requires that we capture the temporal extent (i.e., when it was executed) and the computational environment (i.e., where it was executed).
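To make this interplay concrete, the following is a minimal sketch, written in Python with rdflib, of how a KG instantiating the three core concepts might look. The namespaces and instance names (cobs:, ex:coolingModel, ex:run42, and so on) are hypothetical illustrations of ours, not part of any published vocabulary.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import RDF

# Hypothetical namespaces; the pattern does not prescribe IRIs.
COBS = Namespace("https://example.org/computational-observation#")
EX = Namespace("https://example.org/data#")

g = Graph()
g.bind("cobs", COBS)
g.bind("ex", EX)

# The abstract model corresponds to the Description ...
g.add((EX.coolingModel, RDF.type, COBS.ComputationalModel))

# ... and one concrete execution of it corresponds to the Situation.
g.add((EX.run42, RDF.type, COBS.ComputationalModelExecution))
g.add((EX.run42, COBS.executesComputationalModel, EX.coolingModel))

# The execution generates a computational observation.
g.add((EX.obs1, RDF.type, COBS.ComputationalObservation))
g.add((EX.run42, COBS.generatesOutput, EX.obs1))
g.add((EX.obs1, COBS.informedBy, EX.coolingModel))

print(g.serialize(format="turtle"))
```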
Below, we provide further descriptions of each concept and its relevant axioms (listed alphabetically).

ComputationalEnvironment is the encapsulation of the hardware and software configurations of the computer used to execute the ComputationalModelExecution. This interface may be satisfied by the ODP of the same name, as in [6].

ComputationalModel (CM) is an abstract notion intended to connect the conceptual, mathematical, and algorithmic models that serve as surrogates for the real-world phenomena that are the intended target of an observation. That is, we use it to represent the space of CMs. Every CM is a Resource, which allows us to chain together CM inputs and outputs.

ComputationalModel ⊑ Resource (5)
ComputationalModel ⊑ ≥0 utilizes.Resource (6)

Figure 4: The schema diagram for the Computational Observation ODP. It uses the same graphical syntax as in Figure 1.

ComputationalModelExecution (CME) is the concrete notion of a ComputationalModel. That is, it corresponds to the Situation in Figure 2. We directly link the CME to the ComputationalModel via an exact cardinality restriction on the executesComputationalModel property. This differs from the notion of an Executable (below) in that the Executable is the artifact that can be executed many times, while the CME is the actual act of execution, and it thus corresponds to exactly one Executable. Furthermore, it will always have exactly one TemporalExtent and at least one ComputationalEnvironment.¹ ParameterInstantiations are sometimes necessary when complex CMEs generate their own parameters as part of the execution (more details are provided below). Finally, CMEs generate ComputationalObservations, and they will always generate at least one.

CME ⊑ ∀generatesOutput.ComputationalObservation (7)
CME ⊑ ≥1 generatesOutput.ComputationalObservation (8)
⊤ ⊑ ∀hasTemporalExtent.TemporalExtent (9)
CME ⊑ =1 hasTemporalExtent.TemporalExtent (10)
⊤ ⊑ ∀executedIn.ComputationalEnvironment (11)
CME ⊑ ∃executedIn.ComputationalEnvironment (12)
⊤ ⊑ ∀hasParameterInstantiation.ParameterInstantiation (13)
ParameterInstantiation ⊑ ∃hasParameterInstantiation⁻.CME (14)
CME ⊑ ≥0 hasParameterInstantiation.ParameterInstantiation (15)
⊤ ⊑ ∀isExecutionOf.Executable (16)
CME ⊑ =1 isExecutionOf.Executable (17)
CME ⊑ =1 executesComputationalModel.ComputationalModel (18)

¹ We leave this only as an existential restriction, as it is up to the user of this pattern – and the conceptualization of the computational environment – whether or not distributed computing scenarios count as multiple computational environments.

ComputationalObservation (CO) is the core concept of this pattern. Notably, we do not mandate that the CO is an EntityWithProvenance, as its provenance is already directly modeled via the generatesOutput property. Indeed, we specify an inverse existential (20) to state that any CO must have been generated by a CME. We also do not specify any particular way to formulate what the value of the CO is, as we want the pattern to be sufficiently generalized. Finally, we realize that informedBy is a very informal term – we anticipate that the exact label for the relationship will be adapted to a particular use case; we simply use it as a placeholder to indicate that a relationship exists here.

ComputationalObservation ⊑ =1 informedBy.ComputationalModel (19)
ComputationalObservation ⊑ ∃generatesOutput⁻.CME (20)

EntityWithProvenance is eponymous; it indicates that the entity in question has desirable metadata, such as who generated the entity, when it may have been generated, and what was used to derive the entity. We recommend using PROV-O [9] to satisfy this interface.
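As a brief, hedged illustration of the EntityWithProvenance interface, the sketch below (continuing the hypothetical namespaces from the earlier example) types an Implementation as a prov:Entity and attributes it to a contributor; only the PROV-O terms are drawn from the actual recommendation, and ex:someContributor is an invented placeholder.

```python
from rdflib import Graph, Namespace
from rdflib.namespace import PROV, RDF

COBS = Namespace("https://example.org/computational-observation#")
EX = Namespace("https://example.org/data#")

g = Graph()
g.bind("prov", PROV)

# An Implementation is an EntityWithProvenance; here the interface is
# satisfied with PROV-O, as recommended above.
g.add((EX.coolingCodebase, RDF.type, COBS.Implementation))
g.add((EX.coolingCodebase, RDF.type, PROV.Entity))
g.add((EX.coolingCodebase, PROV.wasAttributedTo, EX.someContributor))
```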
Executable is the artifact that can be executed, and is generally produced by a compiler. As such, we indicate that it is an EntityWithProvenance so that this information may be captured. We also state that an Executable can only be compiled from exactly one Implementation.

Executable ⊑ EntityWithProvenance (21)
∃isCompiledFrom.Implementation ⊑ Executable (22)
Executable ⊑ ∀isCompiledFrom.Implementation (23)
Executable ⊑ =1 isCompiledFrom.Implementation (24)

Implementation is the formulation of the computational model in some codebase (e.g., the hosting repository). In our axiomatization, we state that the Implementation is an EntityWithProvenance, allowing us to model, for example, contributors. Furthermore, we provide scoped domain and range restrictions for the implements property, as well as state that an Implementation implements ComputationalModels. This axiom can be changed depending on the requirements – it is foreseeable that one or many ComputationalModels can be implemented in the same repository (existential restriction), or only one (exact cardinality restriction of 1).

Implementation ⊑ EntityWithProvenance (25)
∃implements.ComputationalModel ⊑ Implementation (26)
Implementation ⊑ ∀implements.ComputationalModel (27)
Implementation ⊑ =1 implements.ComputationalModel (28)

ParameterInstantiation is the set of parameters that defines the (typically mathematical) model space for an execution of a computational model. For instance, in Newton's Law of Cooling, Q̇ = h · A(T(t) − T_env), the heat transfer coefficient h, the object's surface area A, and the temperature T_env of the object's surrounding environment would provide part of a ParameterInstantiation needed to understand the results of the execution of a computational model. Additional details, such as the time step used in solving the mathematical model, would also be part of the ParameterInstantiation.

Resource is an arbitrary piece of data that the computational model will require in order to run. These might be data sources or parameters. Note that a ComputationalModel is also a Resource. This allows us to model how ComputationalModels might feed into each other.

TemporalExtent is a straightforward concept, representing the length of time that the ComputationalModelExecution would have run. It is, however, left as an interface in the pattern, as we do not want to mandate a particular conceptualization of time. This interface could, for example, be satisfied using xs:duration or the Time ontology [10] for more complex modeling, depending on the needs of the use case.
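For instance, one minimal, hypothetical way of satisfying the TemporalExtent interface is a node carrying a plain xs:duration literal; the property ex:hasDuration below is an illustrative stand-in of ours, not part of the pattern.

```python
from rdflib import BNode, Graph, Literal, Namespace
from rdflib.namespace import RDF, XSD

COBS = Namespace("https://example.org/computational-observation#")
EX = Namespace("https://example.org/data#")

g = Graph()

# A hypothetical satisfaction of the TemporalExtent interface: a blank
# node typed as TemporalExtent, holding an xsd:duration value.
extent = BNode()
g.add((extent, RDF.type, COBS.TemporalExtent))
g.add((extent, EX.hasDuration, Literal("PT4H30M", datatype=XSD.duration)))
g.add((EX.run42, COBS.hasTemporalExtent, extent))
```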
5. Conclusion

Modeling the process or pipeline by which computational observations are generated (i.e., the results from executed numerical models or simulations) is an important aspect of trustworthy data. The adage "Garbage In, Garbage Out" comes to mind: how can we trust operations performed on data whose origin is unknown or untrustworthy? The Computational Observation ODP is an attempt to fill this space. By describing the entire generation pipeline, from the particular resources used to implement a computational model to the hardware, software, and parameter configurations used in the execution of said computational model, we can more readily understand how certain results were achieved in modeling and simulation scenarios.

We furthermore presented a simplified, conceptual framework for discussing abstract and concrete notions of the same underlying concept: that is, the difference between, for example, an algorithm and its execution, or a recipe and the act of cooking. In this way, a computational model and its execution can be similarly modeled.

We have identified the following items as next steps in our work.

1. We intend to instantiate the Computational Observation pattern and connect it to instantiations of both the Data Transformation and Computational Environment patterns, to create a Modeling & Simulation Ontology.
2. The Computational Observation pattern will be added to the next version of MODL (the modular ontology design library [11]).
3. We will create a cross-walk from the Computational Observation pattern to the CodeMeta Project [12].

Acknowledgement. Authors Shimizu and Hitzler wish to acknowledge funding for this work under the National Science Foundation Grant No. 2033521: "KnowWhereGraph: Enriching and Linking Cross-Domain Knowledge Graphs using Spatially-Explicit AI Technologies". Author Vardeman wishes to acknowledge funding for this work under the National Science Foundation Grant No. PHY-1247316: "DASPOS: Data and Software Preservation for Open Science" and Grant No. 2127548: "CI-Compass". Any opinions expressed in this material are those of the authors and do not necessarily reflect the views of the National Science Foundation. The authors would also like to acknowledge valuable discussion with David Carral, Gary Berg-Cross, and Michelle Cheatham in the early stages of the pattern's development.

References

[1] C. Shimizu, K. Hammar, P. Hitzler, Modular ontology modeling, Semantic Web (2022). In press.
[2] A. Gangemi, V. Presutti, Ontology design patterns, in: S. Staab, R. Studer (Eds.), Handbook on Ontologies, International Handbooks on Information Systems, Springer, 2009, pp. 221–243. URL: https://doi.org/10.1007/978-3-540-92673-3_10. doi:10.1007/978-3-540-92673-3_10.
[3] K. Janowicz, A. Haller, S. Cox, M. Lefrançois, D. L. Phuoc, K. Taylor, Semantic Sensor Network Ontology, W3C Recommendation, W3C, 2017. https://www.w3.org/TR/2017/REC-vocab-ssn-20171019/.
[4] G. Correa Publio, D. Esteves, A. Ławrynowicz, P. Panov, L. Soldatova, T. Soru, J. Vanschoren, H. Zafar, ML-Schema: Exposing the semantics of machine learning with schemas and ontologies, arXiv e-prints (2018) arXiv:1807.05351.
[5] C. Shimizu, R. M. McGranaghan, A. Eberhart, A. C. Kellerman, Towards a modular ontology for space weather research, in: E. Blomqvist, T. Hahmann, K. Hammar, P. Hitzler, R. Hoekstra, R. Mutharaju, M. Poveda-Villalón, C. Shimizu, M. G. Skjæveland, M. Solanki, V. Svátek, L. Zhou (Eds.), Advances in Pattern-Based Ontology Engineering, extended versions of the papers published at the Workshop on Ontology Design and Patterns (WOP), volume 51 of Studies on the Semantic Web, IOS Press, 2021, pp. 299–311. URL: https://doi.org/10.3233/SSW210021. doi:10.3233/SSW210021.
[6] D. Huo, J. Nabrzyski, C. F. Vardeman II, An ontology design pattern towards preservation of computational experiments, in: C. Keßler, J. Zhao, M. van Erp, T. Kauppinen, J. van Ossenbruggen, W. R. van Hage (Eds.), Proceedings of the 5th Workshop on Linked Science 2015 - Best Practices and the Road Ahead (LISC 2015) co-located with the 14th International Semantic Web Conference (ISWC 2015), Bethlehem, Pennsylvania, USA, October 12, 2015, volume 1572 of CEUR Workshop Proceedings, CEUR-WS.org, 2015, pp. 15–18. URL: http://ceur-ws.org/Vol-1572/paper3.pdf.
[7] A. Gangemi, P. Mika, Understanding the semantic web through descriptions and situations, in: R. Meersman, Z. Tari, D. C. Schmidt (Eds.), On The Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE - OTM Confederated International Conferences, CoopIS, DOA, and ODBASE 2003, Catania, Sicily, Italy, November 3-7, 2003, volume 2888 of Lecture Notes in Computer Science, Springer, 2003, pp. 689–706. URL: https://doi.org/10.1007/978-3-540-39964-3_44. doi:10.1007/978-3-540-39964-3_44.
[8] A. Gangemi, N. Guarino, C. Masolo, A. Oltramari, L. Schneider, Sweetening ontologies with DOLCE, in: A. Gómez-Pérez, V. R. Benjamins (Eds.), Knowledge Engineering and Knowledge Management. Ontologies and the Semantic Web, 13th International Conference, EKAW 2002, Siguenza, Spain, October 1-4, 2002, Proceedings, volume 2473 of Lecture Notes in Computer Science, Springer, 2002, pp. 166–181. URL: https://doi.org/10.1007/3-540-45810-7_18. doi:10.1007/3-540-45810-7_18.
[9] S. Sahoo, D. McGuinness, T. Lebo, PROV-O: The PROV Ontology, W3C Recommendation, W3C, 2013. http://www.w3.org/TR/2013/REC-prov-o-20130430/.
[10] C. Little, S. Cox, Time Ontology in OWL, W3C Recommendation, W3C, 2017. https://www.w3.org/TR/2017/REC-owl-time-20171019/.
[11] C. Shimizu, Q. Hirt, P. Hitzler, MODL: A modular ontology design library, in: K. Janowicz, A. A. Krisnadhi, M. Poveda-Villalón, K. Hammar, C. Shimizu (Eds.), Proceedings of the 10th Workshop on Ontology Design and Patterns (WOP 2019) co-located with the 18th International Semantic Web Conference (ISWC 2019), Auckland, New Zealand, October 27, 2019, volume 2459 of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 47–58. URL: http://ceur-ws.org/Vol-2459/paper4.pdf.
[12] The CodeMeta Project, https://codemeta.github.io/.