Datish: A universal conceptual modeling language to model anything by anyone

Roman Lukyanenko1, Binny M. Samuel2, Jeffrey Parsons3, Veda C. Storey4 and Oscar Pastor5

1 University of Virginia, 140 Hospital Dr, Charlottesville, 22903, United States
2 University of Cincinnati, 2920 Woodside Drive, Cincinnati, OH 45219, United States
3 Memorial University of Newfoundland, P.O. Box 4200, 230 Elizabeth Avenue, St. John's, NL A1C 5S7, Canada
4 Georgia State University, P.O. Box 3965, Atlanta, GA 30302, United States
5 Universidad Politécnica de Valencia, Camí de Vera, s/n, 46022 València, Spain

Abstract
This paper presents the architecture of a universal conceptual data modeling language, Datish, with the aim of enabling anyone to model anything. Although there are many conceptual modeling languages, there has been no language that can model a wide range of domains and at the same time be used by diverse audiences, including the general public. In this paper, we present the motivation, theoretical foundations, and the architecture of such a language, Datish. We then illustrate its use in a real-world scenario. We conclude by discussing promising avenues for future conceptual modeling research using Datish.

Keywords
Conceptual modeling, Datish, universal conceptual modeling, lightweight conceptual modeling

1. Introduction
Conceptual models are formal or semi-formal representations that support information technology (IT) development and use. The conceptual modeling community commonly assumes that “there is no universal [conceptual modeling] approach and no universal language” because of the wide variation of modeling needs [1, p. 2]. The broadening of modeling to wider audiences and the increased need to simultaneously model different systems and domains motivate us to rethink this assumption. Recently, universal conceptual modeling (the inclusive modeling of anything by anyone) has attracted increased attention.
Notably, principles of universal conceptual modeling have been proposed [2] and a vision for inclusive conceptual modeling has been formulated [3]. Consistent with these ideas, this paper (1) considers the opportunities and challenges of a universal conceptual modeling language, and (2) proposes the architecture of a new conceptual data modeling language Datish (as in English or Spanish). The explicit aim of Datish is to be usable in any situation and by anyone.

A conceptual data model (data model for short), as a type of conceptual model, is a representation of form and structure of a domain to facilitate data collection, storage, retrieval, and interpretation [4], [5]. Popular graphical data models, such as ER diagrams or UML class diagrams, are particularly valuable for database design and for understanding the relevant things in a domain [6]–[8]. Despite the many data modeling languages created since the 1970s, none are at the same time general-purpose and usable by anyone. All have limitations on the types of rules they can model and/or target more seasoned modelers. With Datish, we explicitly seek to attain both.

Datish addresses several challenges noted by researchers and practitioners [2], [6], [8]–[10]. There is a growing need for conceptual models of a variety of systems by an ever-expanding cohort of modelers (e.g., end users, novices, users with disabilities).

ER2023: Companion Proceedings of the 42nd International Conference on Conceptual Modeling: ER Forum, 7th SCME, Project Exhibitions, Posters and Demos, and Doctoral Consortium, November 06-09, 2023, Lisbon, Portugal
romanl@virginia.edu (R. Lukyanenko)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
The new Datish language can also offer additional support in more novel systems development scenarios (e.g., when building social media or AI systems or selecting a NoSQL database technology). Having a common language should promote communication and mutual understanding among different stakeholders and facilitate an integrated use of IT and data. The aim of Datish is to be accessible to as many users as possible, while permitting modeling of any domain (albeit at a medium-to-high level of abstraction). Datish can be used on its own or with existing (more specialized) conceptual modeling languages when more nuanced rules are needed. Datish can also be extended to permit more granular modeling for specialized uses.

The remainder of this paper provides our motivation for a universal conceptual modeling language generally, presents the architecture of Datish, and illustrates its use.

2. Background: Motivation and review of relevant work
There is a wide array of benefits of universal conceptual modeling, given the broad range of possible applications and users [2], [11]. We briefly summarize the key arguments that are relevant for our aims (for details, see [2], [3], [11]).

Modeling for everyone. An important societal trend is the involvement of ordinary people, such as non-IT employees or members of the public, in data modeling tasks. Research following this trend suggests that traditional modeling approaches struggle to support non-IT experts [9]. Existing languages are quite complex and often difficult to comprehend [12], [13]. It would be useful to filter out non-core elements in a complex language, as the presence of more advanced features (e.g., participation cardinality) has been shown to impede domain understanding, especially by modeling novices [9].

Modeling emerging systems and domains. Practitioners struggle to use conceptual modeling to support emerging applications such as artificial intelligence applications, social media, NoSQL, and data lakes [14]–[16].
Many popular approaches to development, such as Agile or DevOps, routinely forgo formal specification [17]. Informal and lightweight modeling can better support these practices [18].

Broader systems and data integration. Many modern technologies are used together and integrated (e.g., AI-based ad recommendations on a social media platform powered by a highly scalable distributed database storage and computational technology). A language capable of modeling different domains at the same time could be especially valuable for systems that integrate data across an organization.

These challenges can potentially be addressed with a more flexible and universally accessible conceptual modeling language. We believe such a universal modeling language is feasible for several reasons. First, similar precedents for a universal language exist in the field of computing, such as binary and assembly code. In databases, SQL has been a universal language for querying databases that has remained dominant even after the emergence of NoSQL. Second, a modeling language manifests principles of communication, language, and design [2]. There is considerable commonality in how humans represent reality, thus making it possible to model diverse, complex, and emerging systems and domains for various people.

A universal modeling language should enable anyone to model anything. Existing approaches to conceptual modeling struggle to satisfy these requirements at the same time. Several established conceptual data modeling languages are used for a wide range of applications, e.g., the ER model and UML class diagrams are de facto standards for relational database design and software engineering, respectively. Extensions of these languages make them more expressive (e.g., extended ER model, [19]) and their semantics more precise (e.g., OntoUML, [20]) and accessible (e.g., ConML, [21]).
Building on these, other wide-applicability efforts include enterprise modeling languages, such as Domain Modelling [22], ArchiMate [23] or SysML [24]. Despite their wide applicability, established languages do not meet our requirements for a universal language. First, they have known limitations for some modeling tasks. Notably, a common feature of these languages is requiring or encouraging objects to be members of predefined entity types or classes (i.e., inherent classification [25]). This makes it especially challenging to use these languages to support evolving requirements and heterogeneous data in artificial intelligence, analytics, social media, and NoSQL contexts [15], [26]–[28]. Second, these languages are geared toward a technical audience (cf. ConML, [21]), and not intended for complete novices and members of the public. Some even presuppose highly advanced technical skills (e.g., [22]). Similarly, most of these languages are quite complex, making their learning and use curve steep for non-expert modelers. Flexible language efforts include frameworks, such as RDF, Petri nets, graph models and their extensions (e.g., HERAKLIT [29]). While applicable to many applications, these approaches do not consider broad audiences and cater to seasoned developers. Also, it is questionable whether the extreme flexibility of, for example, RDF results in excessive construct overload (e.g., when the same element can be used for modeling individuals and categories). Efforts to make conceptual modeling more accessible to less technical users include ConML [21] and FlexiSketch [30], which enable modeling to have new uses and users. However, they have notable restrictions: the discussed inherent classification assumption of ConML and the visual forms focus of FlexiSketch. They also cater to modelers with some experience. 
For example, the “target users of FlexiSketch are: (i) software or systems engineers who create sketches during a system development project and (ii) requirements engineers (business analysts) who use sketches to create and communicate requirements” [30, p. 1513].

3. Datish grammar: Constructs and the rule
We develop Datish as an explicitly general-purpose language with the specific aim to be usable by anyone, including domain novices and experts. It is medium-agnostic and can thus be realized in many forms (e.g., visual, sound, text), hence better supporting users with sensory impairments (e.g., blind users). To achieve these properties of the language, we implement the Principles of Universal Conceptual Modeling (UCM) [2]. The five principles of flexibility, accessibility, minimalism, primitivism, and modularity provide the foundation for the development of Datish, and our strategy is to consider all principles together.

Based on the principles of modularity, minimalism, and primitivism, Datish has Formative and Structural Modules. The Formative Module shapes the form of the language and is composed of two constructs (object and description) and a single rule. This is the simplest possible architecture for a language, and a strong adherence to the minimalism principle: any grammar, being a conceptual system, must have at least two logically connected components [31]. While it is possible to have valid Datish models using only the constructs from the Formative Module, for practical purposes, constructs from the Structural Module will likely be necessary. These constructs include individuals, categories, and relationships.

Construct 1: Object. The core construct of Datish is object. This choice was the outcome of a broad search across diverse interdisciplinary literature and simultaneous consideration of different UCM principles. General ontology is a branch of philosophy that studies what exists in reality.
Ontological theories have generally pursued two major approaches: substance and process. In substance ontology, the principal unit of reality is some atomic unit or substance [32], [33]. In process ontologies, events are the fundamental elements of reality, either on par with substances or superseding substances [34]–[36]. By far the most common philosophical approach is the ontology of substance [34], which holds that the basic element of being is what is referred to as entity, object, thing, or substance. Some consider these terms synonymous. Halpin [37, p. 4], for example, defines object as “any individual thing of interest.” Others suggest there are nuanced differences; for example, Bunge [38, p. 294] defines thing as “an object other than construct [i.e., product of thought, such as idea or concept].” Conceptual modeling widely uses object as a key modeling construct. Object has been a core construct in conceptual modeling languages (e.g., UML, ORM and OASIS) [37], [39], [40]. The concept is important in systems development more broadly (as in object-oriented analysis and object-oriented programming). Object is a foundational notion in a number of ontologies, such as UFO [41], [42], BFO [43], Object-oriented ontology [44], systems ontology [45], [46]. We now synthesize the substance view of object with the process philosophy. Although Datish is a modeling language for structure and form (rather than dynamics and flow), support for process modeling is desirable, as processes are intrinsically intertwined with substances. This is guided by the flexibility, accessibility and ubiquity principles. Process is a common way to model reality. The more Datish can support this modeling, the more flexibility and universality is attained. An approach we take follows two philosophical positions. First, there are philosophies that claim that an absolute philosophical primitive is necessary to have a unified view of reality. 
This motivation is especially consistent with the requirements for Datish. An embodiment of this approach is a recent revision of Bunge’s ontology [46]. In his later writings, Bunge proposed an “absolute primitive” notion of “object,” namely, “[w]hatever can exist, be thought about, talked about, or acted upon” [38, p. 199]. Second, while Bunge does not address the relationship between process and substance in the unified notion of object, several promising attempts have been made to show how, by modeling objects, one can also capture domain dynamics in a more process-faithful manner. The Unified Foundational Ontology (UFO), for example, proposes to consider events to be entities, such as “act of music composition” or “marriage event” [47], [48]. This view is also consistent with Object-oriented ontology [44], which considers events such as “fight” or “leaves turning green” as objects. Hence, object-events can have properties [44]. This position is compatible with the modeling practice of reified relationships [48], [49] and some graph databases (e.g., Neo4J). Based on the principles of flexibility and commonality, Datish objects can be used to represent events.

Common among both process and substance ontologies is the belief that things, objects, or entities can be uniquely identified; that is, recognized as unique and different from other objects. No two objects are the same [33]. By synthesizing the above perspectives on the nature of objects, we provide the following definition:

Definition 1 - Object: An object is anything that can exist, be thought about, talked about, or acted upon, and be distinguished from other objects.
This general approach permits an object to be anything: a concrete entity, such as a chair, an idea of a perfect capitalist market, a process of creating a work of art, an association between two people, as well as a group of celestial bodies under the label “planet.” Furthermore, in different domains, especially in science, the ambiguity of the object notion is desirable. For example, in biology, it is not clear whether some objects (e.g., species) are individuals or categories. Hence, object is a suitable foundational construct for a language such as Datish.

Construct 2: Description. To be useful, an object in a model needs to carry some information. This is needed for the user of the model to understand what object is being represented and from what perspective. An object, therefore, needs to be expressed with some additional information. We call this a description.

Definition 2 – Description: Description is a statement that communicates some relevant aspect(s) of an object by any linguistic or extra-linguistic means.

A description does not require any specific form (since Datish is agnostic of the medium). Instead, we suggest that Datish have patterns of object descriptions. These patterns are templates that optionally “specialize” the description construct of Datish based on modeling needs. Below, we suggest several such patterns.

Identifier. As Definition 1 suggests, identity is an essential element of the object construct. Identity can be realized as unique identifiers, which can be local or global in scope [50]. These can be names, numbers, or textual descriptions. They can also take more advanced forms, such as hashes, QR-codes, URLs, RFIDs, or digital signatures. Note that, by itself, an identifier only permits distinguishing among objects. Ideally, an identifier should point to additional description (e.g., a URL to the contents of a webpage).

Attributes. Objects are commonly understood as bundles or collections of properties or attributes.
For example, attributes are often understood as characteristics or features of an object used to identify or categorize it (e.g., color, shape, and size) [51]. Commonly, objects can be described using a list of their applicable attributes. Text. Objects can be described using more complex linguistic structures. These permit the depiction of complex relationships among the attributes. For example, “the underdog election winner (the object) challenged all the norms of political behavior” is a complex association between the attributes of the object and the beliefs and norms set in the broader community. To understand an object, it is sometimes necessary to position it within a broader context, so even richer descriptions are necessary. For example, sikuaq is a particular type of thin ice in the Inuktitut language. To understand its nature, a richer textual description of the culture of the Inuit in Canada is necessary. Importantly, it would be challenging to convey the richness, subtlety, and nuances of some objects by using a list of attributes alone. Multimedia. Objects can be described by unwritten means. These include images, videos, sounds, smells, and other sources of sensory experience. Indeed, research in conceptual modeling has already begun considering multimedia forms, such as symbols, pictures and even virtual reality [52], [53]. Additional patterns of the description may exist. For example, an extension to Datish could use attribute templates (e.g., type-date, number, text, possible default value, nulls allowance, changeable aspect, constant or variable, derived nature) for an application where the types and properties of attributes are important. In sum, to describe objects flexibly and inclusively, the conventional approach of using a list of attributes is not suitable for all possible situations. Hence, we adopt a novel use of description in conceptual modeling to include alternative forms a user may wish to use. 
The objective of the description is to convey something important about the particular object, especially if it enables distinguishing between one object and another. Finally, as an object by itself does not convey meaning, we introduce the only rule of Datish:

Datish Rule. Object description: All objects must have a description.

Taken together, the two constructs and the rule described above constitute the minimal elements of Datish grammar. It is conceivable to create Datish models consisting only of objects along with their descriptions. Such minimalism may be useful in cases where very little is known about the nature of these objects, or when showing their relationships may not be necessary.

A given object may have multiple descriptions (which is especially valuable if these descriptions are coming from multiple users or are taken from multiple perspectives). A description may not be an accurate, complete, or true way of communicating information about an object. For this reason, objects cannot be reduced to their descriptions. Descriptions are mental constructs (in themselves, mental objects) created by modelers to communicate something of value about other objects of interest. Finally, a description of an object in one model can be modeled as an object in another model (which also implies it will need its own description). This can be particularly useful for capturing the provenance or metadata about the original description of the object. We assume every representation in Datish represents either an object or a description of an object.

As data modeling deals with the form and structure of the domain, Datish provides additional constructs for structuring objects. Specifically, all additional constructs in Datish are types of objects (which also implies they must abide by the Rule; they need to have a description).

Construct 3: Individual. A fundamental approach for capturing the structure of a domain is in terms of individuals or groups of individuals.
Most existing conceptual modeling languages draw a distinction between particulars and universals, also known as individuals and categories, or instances and classes, respectively. At the same time, conceptual modeling traditionally focused on representing universals or categories, under the “assumption of inherent classification” [25], [54]. The traditional approach to modeling underrepresents the essential role of individuals in reality and for modeling reality [15], [27]. In many philosophies, the world is made of unique individuals, or things (atomic or complex). For example, according to Bunge things (e.g., specific planets, birds, trees, atoms) are the primary constituents of reality [33]. In contrast, some categories are secondary, in that humans use the categories to group existing things with common attributes (i.e., category of planet, bird, tree, atom). Individuals matter even in cases where categories may come before an instance. Hence when dealing with social reality where categories are typically created before individual objects, once instances of the social categories are created they can take on their own existence [55]. For example, in the late 1970s the conceptual modeling community decided to create a new category: “Conference on Conceptual Modeling” [56]. Initially, it was merely a mental idea that lacked specific members. Yet, since 1979, the conferences, which became known as ER, have been held annually [56]. Each such conference is itself an object, with unique properties (descriptions), somewhat different from those of other conferences. Reasoning with individual objects is important for human cognition. While people experience continuous sensory input (e.g., light falling on retina, sound waves), they invariably transform their sense data into distinct mental objects [57]. Instances are important for everyday naïve thinking about reality. 
In day-to-day life, the level that is naturally accessible to humans is that of “middle-sized” objects, those that can be “picked out using unaided human sensory capacities” [58, p. 1], such as trees, animals, or rocks [10], [15]. Leveraging these ontological and cognitive benefits, representation of individuals is widespread in mathematics, logic, and computing. In logic, including predicate calculus and extensional logic, individuals can be directly modeled [59], [60]. Datish has direct support for representing individuals, irrespective of whether they are members of predefined categories.

Definition 3 - Individual: Any specific, singular object, abstract or concrete.

Construct 4: Category. Categories (or concepts) organize individual objects into groups based on similarity of their features or common patterns of use. In the ontological view that the world is made of substantial (concrete, material, physical) individuals, a category is a non-essential and observer-dependent construct [33]. However, conceptual modeling is a social activity, performed mainly by human beings [61]. Humans create categories to group concrete objects in useful ways. In many scenarios, humans think more in terms of categories than individual objects: “humans are compulsive classifiers” [62], [63]. Classification is central to human perception, memory, reasoning, problem-solving and communication [64], [65]. Furthermore, in social domains, categories are typically created before any individual members of the categories exist [50], [55], [61], [66].

Categorization is a vital cognitive mechanism to manage the infinite diversity of stimuli in the real world. Categories capture the similarity among the objects, thereby allowing these objects to be conceptualized and treated in a similar manner. For example, it may be more efficient to refer to all objects having certain common attributes as “birds”, thereby eliminating redundant descriptions of their shared attributes.
The use of categories also permits “completely” representing domains [15], in that they capture generalizations and abstractions over the “infinitely” diverse individuals. Having a “complete” specification is an important outcome of conceptual modeling [67], [68]. Similarly, categories can group not only individual objects, but other categories, forming hierarchies (e.g., robins, birds, animals). It is impossible to convey domain boundaries and attain the desired level of domain completeness and structuring with individual objects alone. We, thus, propose category as a Datish construct: Definition 4 - Category: A collection of individual objects or other categories having common characteristics. Construct 5: Relationship. The principle of minimalism dictates having only essential constructs in Datish. However, a modeler might need to create relationships between multiple individual objects and categories. Furthermore, there can be more than one object in a Datish model, so it is useful to show how these objects are related. Finally, elaborating object relationships is essential for meaningful domain comprehension and learning since it organizes knowledge into coherent structures and integrates new information with prior knowledge [69]. Relationships are ontological primitives. Both in process and substance ontology, reality is based on interactions. This suggests that interaction is a fundamental way to describe the connections among objects. Not all objects interact or interact directly. Our inclusive definition of objects permits mental thoughts, concepts, and events and processes to be modeled as objects. None of these objects interact with each other, or other objects with mass, in the same manner as two physical objects (e.g., billiard balls) do. We assume concepts, and ideas (e.g., number “3” or “Anna Karenina”) do not themselves change since they, themselves, do not possess energy. 
They change when humans (and other sentient beings) think and communicate about these objects [55], [70], and affect physical objects, via, for example, linguistic declarations, such as commands and requests, or speech acts [55], [71], [72]. Similarly, categories and other objects are related to one another in some way. For example, one concept can be a subtype of another (e.g., bird is a subtype of animal). These concepts are conceptually (or logically) linked via a “type of” relationship. In other words, categories can be related to categories, as well as to individuals. Hence, a notion applicable to all objects is a broad concept of relationship, which includes both physical interaction and conceptual linkages among concepts. We thus define a relationship as follows. Definition 5 – Relationship: A representation of a physical interaction or conceptual connection among one or more objects. We can now clarify some important implementation choices when using Datish. First, all constructs in a Datish model are objects and must have a description. However, this rule is agnostic of the exact nature of the description. In the simplest case, giving a category name is sufficient to satisfy this rule. The length, form and presentation of the description may vary depending on the purpose of the modeling. Datish views category differently than most existing conceptual modeling languages. Nearly all conceptual modeling languages manifest the principle of “inherent classification,” whereby individual objects are represented as members of categories (or, classes, entity types) [7], [73], [74]. In its strong form, the principle states that “specific things in the domain of interest (entities, objects, etc.) can be referred to only as instances of classes (variously referred to as entity types, categories, kinds)” [25, p. 229]. The best-known example is an entity-relationship model [75]. 
A weaker form of this principle is that an object may not be constrained to have only the attributes of its category (i.e., it may have additional, unique attributes), but the category remains the most common way of modeling domains (e.g., ArchiMate).

In Datish, category serves two objectives, consistent with the cognitive benefits of classification and its role in the creation of social reality. First, a category is a cognitive tool, a convenient shortcut that eliminates the need for lengthy and repetitive descriptions. Second, a category is an object in a Datish model that can be used when no members of the category yet exist. Hence, categories in Datish are much more flexible, but also inclusive of their use in traditional conceptual modeling.

To enable this flexibility, in Datish it is possible to model categories first, and insist on objects always being members of categories. Furthermore, objects do not have to be members of any given category or can be members of multiple categories. For example, the same object may be simultaneously a member of the category student as well as employee. These variations become modeling choices based on the semantics of a domain. We additionally recommend documenting this choice (e.g., via a simple explanatory note, leveraging the flexibility of the description construct).

Each approach (objects being dependent upon or independent of categories) can be beneficial in different scenarios. When domain rules are well-established, and objects share strong similarity with one another, it may be advantageous to treat them as members of predefined categories. In contrast, in domains characterized by high heterogeneity, it is prudent to model objects independently or somewhat independently of the categories to which they may belong.

Furthermore, categories need not be homogeneous. This is consistent with both the flexibility and ubiquity principles. Natural categories are often heterogeneous [76].
For example, the category “bird” has very diverse members, including birds that do not fly. Similarly, some categories are best described by their exemplars. The category “game” is famous for lacking specific necessary and sufficient conditions (as famously argued by the philosopher Ludwig Wittgenstein). Having a flexible and rich apparatus for describing categories is therefore essential to ensure the nature of categories is well communicated when necessary. Additionally, we provide several suggested approaches for visually depicting homogeneous versus heterogeneous categories below.

Emphasizing the way Datish views categories, we introduce two implementation notes:

Datish Note 1 - Object-category: An object may be a member of one or more categories or exist independently of any categories.

Datish Note 2 - Category-object: A category may or may not have defined members (objects); the member objects of a category may or may not have the same description (e.g., share all attributes).

Having only five constructs in the modeling toolkit, along with a single rule, offers a great deal of flexibility. At the same time, this grammar enables the modeling of a wide variety of scenarios, domains, and systems. The descriptions attached to all objects can be simple or complex, depending on the purpose of the modeling. The description need not be next to the object and can be in a separate section of a diagram (as is common in architectural blueprints) or another representation [77], to permit longer descriptions.

The specific ways to arrange or express these constructs may vary. It is feasible that some organizations’ analysts may customize their own styles or introduce situational or systematic constraints upon the usage of these constructs [78]. For example, some projects may insist on always modeling objects independent of categories; other projects may stipulate that every object must have a unique identifier attached.
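The grammar summarized above (five constructs, one rule, and the two implementation notes) can be sketched in a few lines of Python. This is only an illustrative encoding under stated assumptions, not a notation proposed by the paper: the class names, the reduction of a description to plain text, and the `uncategorized` helper (standing in for a local project constraint) are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # eq=False: objects are distinguished by identity, not by content
class DatishObject:
    """Construct 1 (object) carrying Construct 2 (description), here plain text."""
    descriptions: list

    def __post_init__(self) -> None:
        # Datish Rule: all objects must have a description.
        if not self.descriptions or not all(str(d).strip() for d in self.descriptions):
            raise ValueError("Datish Rule violated: every object needs a description")


@dataclass(eq=False)
class Individual(DatishObject):
    """Construct 3: a specific, singular object."""


@dataclass(eq=False)
class Category(DatishObject):
    """Construct 4: a collection of individuals or other categories."""
    members: list = field(default_factory=list)  # Note 2: members are optional


@dataclass(eq=False)
class Relationship(DatishObject):
    """Construct 5: an interaction or connection among one or more objects."""
    participants: list = field(default_factory=list)

    def __post_init__(self) -> None:
        super().__post_init__()
        if not self.participants:
            raise ValueError("a relationship needs at least one participant")


def uncategorized(individuals, categories):
    """Hypothetical local constraint check: individuals that belong to no category."""
    return [i for i in individuals
            if not any(any(m is i for m in c.members) for c in categories)]


# Note 1: one object in two categories, another object in none.
anna = Individual(["Anna, who studies and works part-time"])
loner = Individual(["an object not assigned to any category"])
student = Category(["student"], members=[anna])
employee = Category(["employee"], members=[anna])
future = Category(["a category defined before any members exist"])
knows = Relationship(["knows"], participants=[anna, loner])

print([o.descriptions[0] for o in uncategorized([anna, loner], [student, employee, future])])
```

A project that insists on every object belonging to a category could treat a non-empty result from `uncategorized` as a modeling error, while a project that favors category-independent objects could simply ignore it.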
The flexibility of Datish explicitly enables and encourages these local choices. Extensions upon the core of Datish are also encouraged, especially when the language is used by more technical teams. In such cases additional rules and abstractions familiar to these users (e.g., cardinality) may be introduced.

4. Illustrative application

Datish does not insist on a particular way to graphically depict its constructs. Indeed, we encourage the exploration of different approaches, as well as different presentation mediums (not only paper, but multimedia and interactive virtual environments). To enable exploration of Datish, we suggest a visual notation for Datish based on a two-dimensional format (further discussed below). Figure 1 shows the symbols we deem consistent with the principles of universal conceptual modeling, with some rationale for why the symbols were chosen.

Description and graphical representation (labels as recovered from the figure):
Object: shown as cloud (universal symbol) with a simple description. Label in figure: customer.
Individual Object: shown as filled circle (universal symbol) with name and attributes. Labels in figure: ID: X56Z5, Barack Obama; Bird, Blue beak, Crane-like.
Individual Object: shown using multimedia (e.g., video, image) either in the diagram or in the object description section. Labels in figure: Gulf of Mexico Oil Spill; ID: X56Z5; Object: X56Z5.
Relationship: shown as lines or arrows (both universal shapes). Labels in figure: Yair Wand, Ron Weber.
Homogeneous and heterogeneous categories: shown as square (a universal shape); heterogeneous categories can be shown as squares with distinct members (or can be described in a textual narrative). Labels in figure: Court, US Supreme Court, Supreme Court of Virginia, Small Claims.
Figure 1: Illustrative visual representations of object and description

Depending on the modeling objectives, description of objects, categories and relationships can be represented with greater or lesser formality and greater or lesser consistency.
For example, if Datish is used to model relational databases, descriptions can be lists of attributes, and the unique identifier is unnecessary. If such lists are short, attributes are best shown together with the objects. In contrast, when modeling heterogeneous data to be ingested by a data lake, descriptions may be shown in a separate description section as lists of attributes, representative multimedia, or narratives. We illustrate the use of Datish in a real-world restaurant modeling case, whose requirements we model with Datish (Figure 2). The restaurant domain is suitable for this illustration because it is simple enough to understand, yet, as we show, capturing its real-world complexity requires a flexible language such as Datish. Dante Lake is the owner of a restaurant chain called Deats (all names in the case are pseudonyms). Deats serves menu items following recipes with specially sourced local ingredients. Among the employees, Dante employs cooks in restaurants, and utilizes a third-party delivery service, YourDoor, to bring items to customers’ homes. Dante is a data-driven decision-maker and relies on data to make employee performance, food quality, and menu selection decisions. Dante uses several databases to manage the restaurants, including a data lake. These databases store data about employees, items, recipes, ingredients, and orders. Dante uses IoT sensors and their data streams (e.g., log files) to ensure health code compliance on ingredient storage and preparation, including tracking temperature and humidity. Dante also uses data from YourDoor, which provides data via an API in JSON format to its clients. Last, Dante monitors social media opinions of restaurants and uses data from the popular review site Yum, which also utilizes JSON to make data available. There are several locations of Deats, captured by the homogeneous category Restaurant that has a standardized description.
Dante also hires employees to work at the restaurants but wishes to record idiosyncratic information about them. The Employee heterogeneous category reflects this. Notice how Stefano is an individual object and one of a kind. Stefano is the head chef for all restaurants, but also an employee who supervises cooks and invents items. Stefano is one of the key reasons for the success of the restaurant, as he, together with his wife, Gina, scouts local organic farms and develops the award-winning unique recipes that the customers love so much. Gina is not an employee of a restaurant but has an influence over its operations and occasionally leaves digital traces in her opinions about the meals and modification suggestions. At the restaurants, cooks prepare items for orders using ingredients that should comply with safety requirements. Safety is a heterogeneous category, as different ingredients have varying safety requirements that are based on local and regional public health policies for each Deats location; furthermore, special one-off safety rules also exist. Safety includes parameters such as Temperature and Humidity but might also specify other, idiosyncratic requirements. Customers can place Delivery or Dine-in orders. Delivery orders are handled by a delivery service (currently YourDoor) that delivers the orders to customers. Customers’ descriptions might be tied to Reviews provided by Yum or they may remain independent. We also note the social network of the customers (made of social media connections, friends, family, online influences). It is not modeled in detail, as its complexity exceeds the scope of modeling, but showing it indicates an important source of influence on customers’ behavior and perceptions of meals and service. This illustration in a very simple domain shows that modeling the real world is complex, nuanced, and messy. At times, individual people are the linchpins of operation, and their oversized role must be captured in the model.
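As one possible non-visual rendering of the scenario, the relationships described above can be written down as plain (subject, relationship, target) triples. The Python sketch below is only an illustration under stated assumptions: the triples are our reading of the case narrative rather than an exact transcription of the diagram, and the helper names `related` and `subtypes` are hypothetical.

```python
# (subject, relationship, target) triples read from the case narrative.
# Assumption: these approximate, but may not exactly match, the edges of the diagram.
TRIPLES = [
    ("Stefano", "is a", "Head Chef"),
    ("Stefano", "is a", "Employee"),
    ("Cook", "type of", "Employee"),
    ("Stefano", "supervises", "Cook"),
    ("Stefano", "invents", "Item"),
    ("Gina", "wife of", "Stefano"),
    ("Cook", "prepares", "Item"),
    ("Customer", "places", "Order"),
    ("Delivery", "type of", "Order"),
    ("Dine-in", "type of", "Order"),
    ("YourDoor", "is a", "Delivery Service"),
    ("Delivery Service", "handles", "Delivery"),
    ("Ingredient", "complies with", "Safety"),
    ("Yum", "hosts", "Review"),
    ("Customer", "influenced by", "Social network"),
]


def related(triples, subject, relationship):
    """Targets reachable from `subject` through `relationship`, in listed order."""
    return [t for s, r, t in triples if s == subject and r == relationship]


def subtypes(triples, category):
    """Categories declared as a 'type of' the given category."""
    return [s for s, r, t in triples if r == "type of" and t == category]


print(related(TRIPLES, "Stefano", "is a"))  # Stefano belongs to two categories
print(subtypes(TRIPLES, "Order"))           # the two kinds of order in the case
print(related(TRIPLES, "Gina", "is a"))     # Gina has no category in this sketch
```

Stefano and Gina appear here as individual objects alongside categories such as Cook and Order, and Gina is related to the model without belonging to any category, which is the kind of mixture the case is meant to illustrate.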
In addition, some suspected influences (such as social networks) are based on crude assumptions, to be evaluated with more data later. Some categories in the domain are uniform, whereas others are heterogeneous, and focal individual instances of these objects are important to represent. Datish is capable of capturing and representing these nuances.

[Figure 2 diagram. Labels recovered from the figure: Yum, Social network, Temperature, Humidity, Standard XTW23, Review, Customer, Delivery Service, Safety, Order, Delivery, Dine In, Ingredient, Meal, Restaurant, Stefano Boiardi (ID: E189478), Head Chef, Cook, Gina Boiardi (“An advocate of unique and local cuisine and suggests new ideas”), Employee. Relationship labels: part of, is part of, hosts, influenced by, makes, is a, delivers, places, handles deliveries, complies with, mentions, type of, has, includes, samples, prepares, invents, supervises, wife of.]
Figure 2: Datish script of the restaurant scenario

5. Discussion and outlook for the future

We argue for the need of a universal language for conceptual modeling and propose Datish to be such a language. Datish is based on a set of principles derived from multiple disciplines and is intended to be useful for modeling “anything, anytime by anybody,” reflecting the need for new types of users to be engaged in conceptual modeling and flexible language use. A sample visual representation was presented to illustrate how Datish can be used. By design, Datish is intended to be lightweight and accessible, yet also expressive. With Datish, users can model many rules, supported by the flexible and modular language design. It is even possible to model more advanced rules (e.g., cardinality as descriptions of relationships). However, in its basic form Datish only insists on the constructs and the rule that general audiences can understand. Although we presented a possible visual notation for Datish, it is not intended to be final or definitive.
Just as Datish constructs are explicitly grounded in the theoretical principles of universal conceptual modeling, similar attention is required to develop Datish visual notations. These need to be grounded in relevant foundations of visual notation development (e.g., Moody [79]), and be rigorously empirically evaluated [80], including by considering different design alternatives [81]. Furthermore, visual is only one of the modalities. Datish is medium-agnostic and future efforts are needed to develop additional ways of representing Datish (e.g., sound, text). This is especially important for supporting users with impairments. As with visual notations, additional modalities require rigorous development and evaluation. Future research could help to refine and develop the visual, auditory, and other modality symbols of Datish as well as to apply it to a large set of modeling applications in various domains. Future work can also consider using artificial intelligence to generate and parse Datish scripts. For example, advanced natural language and computer vision techniques could be applied to descriptions to parse the semantics of Datish automatically. Another important area for future research is the development of methods for using Datish. Datish is inherently flexible, so future research can provide more explicit rules for how to use the language in specific modeling scenarios. One opportunity is to use Datish for lightweight and informal modeling in Agile and DevOps settings. Alternatively, in cases where requirements are stable, well understood and agreed upon, Datish may be used in a traditional way (e.g., to support relational database design by focusing on category, relationship and description using attributes). Finally, Datish might become helpful in the recognition of the importance of conceptual modeling in any application domain or content. 
The ultimate success of the Datish project may not be in the language itself, but in fostering the dialog within the conceptual modeling community on universal and inclusive modeling. References [1] B. Thalheim, “Towards a theory of conceptual modelling.,” J. Univers. Comput. Sci., vol. 16, no. 20, Art. no. 20, 2010. [2] R. Lukyanenko, J. Parsons, V. C. Storey, B. M. Samuel, and O. Pastor, “Principles of universal conceptual modeling,” in EMMSAD 2023, Saragosa, Spain: Springer, 2023, pp. 1–15. [3] R. Lukyanenko, D. Bork, V. C. Storey, J. Parsons, and O. Pastor, “Inclusive Conceptual Modeling: Diversity, Equity, Involvement, and Belonging in Conceptual Modeling,” in ER Forum 2023, Lisbon, Portugal: Springer, 2023, pp. 1–4. [4] C. E. H. Chua, M. Indulska, R. Lukyanenko, W. Maass, and V. C. Storey, “Data Management,” MISQ Quarterly Online, pp. 1–10, 2022. [5] V. C. Storey, R. Lukyanenko, and A. Castellanos, “Conceptual Modeling: Topics, Themes, and Technology Trends,” ACM Computing Surveys, vol. 55, no. 14s, pp. 1–38, 2023. [6] H. C. Mayr and B. Thalheim, “The triptych of conceptual modeling,” Software and Systems Modeling, pp. 1–18, 2020. [7] J. Mylopoulos, “Information modeling in the time of the revolution,” Information Systems, vol. 23, no. 3–4, pp. 127–155, 1998. [8] J. Recker, R. Lukyanenko, M. A. Sabegh, B. M. Samuel, and A. Castellanos, “From Representation to Mediation: A New Agenda for Conceptual Modeling Research in A Digital World,” MIS Quarterly, vol. 45, no. 1, pp. 269–300, 2021. [9] M. Hvalshagen, R. Lukyanenko, and B. M. Samuel, “Empowering Users with Narratives: Examining The Efficacy Of Narratives For Understanding Data-Oriented Conceptual Models,” Information Systems Research, pp. 1–38, 2023. [10] A. Castellanos, M. Tremblay, R. Lukyanenko, and B. M. Samuel, “Basic Classes in Conceptual Modeling: Theory and Practical Guidelines,” Journal of the Association for Information Systems, vol. 21, no. 4, pp. 1001–1044, 2020. [11] S. 
Al-Fedaghi, “In Pursuit of Unification of Conceptual Models: Sets as Machines,” arXiv preprint arXiv:2306.13833, 2023. [12] I. Compagnucci, F. Corradini, F. Fornari, and B. Re, “Trends on the Usage of BPMN 2.0 from Publicly Available Repositories,” presented at the International Conference on Business Informatics Research, Springer, 2021, pp. 84–99. [13] M. zur Muehlen and J. Recker, “How much language is enough? Theoretical and practical use of the business process modeling notation,” in Seminal Contributions to Information Systems Engineering, Springer, 2013, pp. 429–443. [14] D. Bork, “Conceptual Modeling and Artificial Intelligence: Challenges and Opportunities for Enterprise Engineering,” in Enterprise Engineering Working Conference, Springer, 2022, pp. 3–9. [15] R. Lukyanenko, J. Parsons, and B. M. Samuel, “Representing Instances: The Case for Reengineering Conceptual Modeling Grammars,” European Journal of Information Systems, vol. 28, no. 1, pp. 68–90, 2019. [16] S. Nalchigar and E. Yu, “Conceptual modeling for business analytics: a framework and potential benefits,” in 2017 IEEE 19th Conference on Business Informatics (CBI), IEEE, 2017, pp. 369–378. [17] M. Schwartz, War and peace and IT: business leadership, technology, and success in the digital age. New York NY: IT Revolution, 2019. [18] L. Leite, C. Rocha, F. Kon, D. Milojicic, and P. Meirelles, “A survey of DevOps concepts and challenges,” ACM Computing Surveys (CSUR), vol. 52, no. 6, pp. 1–35, 2019. [19] T. J. Teorey, D. Yang, and J. P. Fry, “A logical design methodology for relational databases using the extended entity-relationship model,” ACM Computing Surveys, vol. 18, no. 2, pp. 197–222, 1986. [20] G. Guizzardi, C. M. Fonseca, A. B. Benevides, J. P. A. Almeida, D. Porello, and T. P. Sales, “Endurant types in ontology-driven conceptual modeling: Towards OntoUML 2.0,” presented at the International conference on conceptual modeling, Springer, 2018, pp. 136– 150. [21] C. 
Gonzalez-Perez, “How Ontologies Can Help in Software Engineering,” in International Summer School on Generative and Transformational Techniques in Software Engineering, Springer, 2015, pp. 26–44. [22] D. Bjørner, Domain Science and Engineering: A Foundation for Software Development. Springer Nature, 2021. [23] C. L. Azevedo, M.-E. Iacob, J. P. A. Almeida, M. van Sinderen, L. F. Pires, and G. Guizzardi, “Modeling resources and capabilities in enterprise architecture: A well-founded ontology- based proposal for ArchiMate,” Information systems, vol. 54, pp. 235–262, 2015. [24] L. Lima et al., “An integrated semantics for reasoning about SysML design models using refinement,” Software & Systems Modeling, vol. 16, no. 3, pp. 875–902, 2017. [25] J. Parsons and Y. Wand, “Emancipating Instances from the Tyranny of Classes in Information Modeling,” ACM Transactions on Database Systems, vol. 25, no. 2, pp. 228–268, 2000. [26] P. Atzeni, C. S. Jensen, G. Orsi, S. Ram, L. Tanca, and R. Torlone, “The relational model is dead, SQL is dead, and I don’t feel so good myself,” ACM SIGMOD Record, vol. 42, no. 1, pp. 64–68, 2013. [27] O. Eriksson, P. Johannesson, and M. Bergholtz, “The case for classes and instances-a response to representing instances: the case for reengineering conceptual modelling grammars,” European Journal of Information Systems, vol. 28, no. 6, pp. 681–693, 2019. [28] R. Lukyanenko and J. Parsons, “Beyond Micro-Tasks: Research Opportunities in Observational Crowdsourcing,” Journal of Database Management (JDM), vol. 29, no. 1, pp. 1– 22, 2018. [29] P. Fettke and W. Reisig, “Systems Mining with Heraklit: The Next Step,” in BPM 2022 Forum, Münster, Germany: Springer, 2022, pp. 89–104. [30] D. Wüest, N. Seyff, and M. Glinz, “FlexiSketch: a lightweight sketching and metamodeling approach for end-users,” Software & Systems Modeling, vol. 18, pp. 1513–1541, 2019. [31] M. A. 
Bunge, “Systems everywhere,” in Cybernetics and applied systems, London England: CRC Press, 2018, pp. 23–41. [32] J. Benovsky, “The bundle theory and the substratum theory: deadly enemies or twin brothers?,” Philosophical Studies, vol. 141, no. 2, pp. 175–190, 2008. [33] M. A. Bunge, Treatise on basic philosophy: Ontology I: the furniture of the world. Boston, MA: Reidel, 1977. [34] J. Dupré, “A process ontology for biology,” The Philosophers’ Magazine, no. 67, pp. 81–88, 2014. [35] H. Herre, “General Formal Ontology (GFO): A foundational ontology for conceptual modelling,” in Theory and Applications of Ontology: Computer Applications, Springer, 2010, pp. 297–345. [36] A. N. Whitehead, Process and Reality. London England: Free Press, 2010. [37] T. Halpin, Object-role modeling fundamentals: a practical guide to data modeling with ORM. Technics Publications, 2015. [38] M. A. Bunge, Philosophical dictionary. Amherst, NY: Prometheus Books, 2003. [39] I. Jacobson, G. Booch, and J. Rumbaugh, The unified software development process, vol. 1. Reading MA: Addison-Wesley, 1999. [40] O. P. Lopez, F. Hayes, and S. Bear, “Oasis: An object-oriented specification language,” in International Conference on Advanced Information Systems Engineering, Springer, 1992, pp. 348–363. [41] G. Guizzardi, Ontological foundations for structural conceptual models. Enschede, The Netherlands: Telematics Instituut Fundamental Research Series, 2005. [42] G. Guizzardi, “Ontological meta-properties of derived object types,” presented at the International Conference on Advanced Information Systems Engineering, Springer, 2012, pp. 318–333. [43] R. Arp, B. Smith, and A. D. Spear, Building Ontologies with Basic Formal Ontology. in The MIT Press. Cambridge, MA: MIT Press, 2015. [Online]. Available: https://books.google.com/books?id=AUxQCgAAQBAJ [44] G. Harman, Object-oriented ontology: A new theory of everything. London England: Penguin UK, 2018. [45] R. Lukyanenko, V. C. Storey, and O. 
Pastor, “System: A Core Conceptual Modeling Construct for Capturing Complexity,” Data & Knowledge Engineering, vol. 141, pp. 1–29, 2022. [46] R. Lukyanenko, V. C. Storey, and O. Pastor, “Foundations of information technology based on Bunge’s systemist philosophy of reality,” Software and Systems Modeling, vol. 20, no. 1, pp. 921–938, 2021. [47] J. P. A. Almeida, R. A. Falbo, and G. Guizzardi, “Events as entities in ontology-driven conceptual modeling,” in International Conference on Conceptual Modeling, Springer, 2019, pp. 469–483. [48] G. Guizzardi, G. Wagner, J. P. A. Almeida, and R. S. Guizzardi, “Towards ontological foundations for conceptual modeling: the unified foundational ontology (UFO) story,” Applied ontology, vol. 10, no. 3–4, pp. 259–271, 2015. [49] S. Hitchman, “An interpretive study of how practitioners use entity-relationship modelling in a ternary relationship situation,” Communications of the Association for Information Systems, vol. 11, no. 1, p. 26, 2003. [50] O. Eriksson and P. J. Agerfalk, “Rethinking the Meaning of Identifiers in Information Infrastructures,” Journal of the Association for Information Systems, vol. 11, no. 8, pp. 433– 454, 2010. [51] E. Rosch, C. B. Mervis, W. D. Gray, D. M. Johnson, and P. Boyesbraem, “Basic Objects in Natural Categories,” Cognitive Psychology, vol. 8, no. 3, pp. 382–439, 1976. [52] K. Masri, D. Parker, and A. Gemino, “Using iconic graphics in entity-relationship diagrams: the impact on understanding,” Journal of Database Management (JDM), vol. 19, no. 3, pp. 22– 41, 2008. [53] F. Muff and H.-G. Fill, “Initial Concepts for Augmented and Virtual Reality-based Enterprise Modeling⋆,” in ER Demos and Posters 2021 co-located with 40th International Conference on Conceptual Modeling (ER 2021), 2021. [54] V. Ramesh, J. Parsons, and G. Browne, “What is the role of cognition in conceptual modeling? A report on the First Workshop on Cognition and Conceptual Modeling,” Conceptual Modeling, pp. 272–280, 1999. [55] J. R. 
Searle, The construction of social reality. Simon and Schuster, 1995. [56] P. Chen, “Entity-relationship modeling: historical events, future trends, and lessons learned,” Software pioneers: contributions to software engineering, pp. 296–310, 2002. [57] S. R. Harnad, Categorical Perception: The Groundwork of Cognition. Cambridge, MA: Cambridge University Press, 1990. [58] J. Foster, “Ontologies without Metaphysics: Latour, Harman and the Philosophy of Things,” Analecta Hermeneutica, no. 3, 2011. [59] A. Pap, “Disposition concepts and extensional logic,” in Dispositions, 1978, pp. 27–54. [60] W. V. Quine, “Intensions revisited,” Midwest Studies in Philosophy, vol. 2, no. 1, pp. 5–11, 1977. [61] S. March and G. Allen, “Toward a social ontology for conceptual modeling,” in 11th Symposium on Research in Systems Analysis and Design, Vancouver, Canada, 2012, pp. 57–62. [62] B. Berlin, D. E. Breedlove, and P. H. Raven, “General Principles of Classification and Nomenclature in Folk Biology,” American Anthropologist, vol. 75, no. 1, pp. 214–242, 1973. [63] J. H. Langdon, “Background: Evolutionary Classification and Fossil Dating,” in Human Evolution: Bones, Cultures, and Genes, Springer, 2023, pp. 31–49. [64] D. D. Kahneman, “The reviewing of object files: Object-specific integration of information,” Cognitive psychology, vol. 24, no. 2, pp. 175–219, 1992. [65] G. Murphy, The big book of concepts. Cambridge, MA: MIT Press, 2004. [66] O. Eriksson and P. J. Agerfalk, “Speaking things into existence: ontological foundations of identity representation and management,” ISJ, vol. 35, no. 1, pp. 1–30, 2021. [67] R. Clarke, A. Burton-Jones, and R. Weber, “On the Ontological Quality and Logical Quality of Conceptual-Modeling Grammars: The Need for a Dual Perspective,” Information Systems Research, vol. 27, no. 2, pp. 365–382, 2016. [68] D. L. Parnas, “A technique for software module specification with examples,” Communications of the ACM, vol. 15, no. 5, pp. 330–336, May 1972. 
[69] S. Kalyuga, “Knowledge elaboration: A cognitive load perspective,” Learning and Instruction, vol. 19, no. 5, pp. 402–410, 2009. [70] M. A. Bunge, Chasing reality: strife over realism. University of Toronto Press, 2006. [71] J. L. Austin, J. O. Urmson, and M. Sbisà, How to Do Things with Words. in William James lectures. Clarendon Press, 1975. [72] Y. Wand, D. E. Monarchi, J. Parsons, and C. C. Woo, “Theoretical foundations for conceptual modelling in information systems development,” Decision Support Systems, vol. 15, no. 4, pp. 285–304, 1995. [73] J. Peckham and F. Maryanski, “Semantic data models,” ACM Computing Surveys, vol. 20, no. 3, pp. 153–189, 1988. [74] J. M. Smith and D. C. P. Smith, “Database abstractions: aggregation and generalization,” ACM Transactions on Database Systems, vol. 2, no. 2, pp. 105–133, 1977. [75] P. Chen, “The entity-relationship model - toward a unified view of data,” ACM Transactions on Database Systems, vol. 1, no. 1, pp. 9–36, 1976. [76] R. Lukyanenko and B. M. Samuel, “Are all Classes Created Equal? Increasing Precision of Conceptual Modeling Grammars,” ACM Transactions on Management Information Systems (TMIS), vol. 40, no. 2, pp. 1–25, Forthcoming 2017. [77] M. Jabbari, J. Recker, P. Green, and K. Werder, “How do Individuals Understand Multiple Conceptual Modeling Scripts?,”JAIS, vol. 23, no. 4, pp. 1037–1070, 2022. [78] B. M. Samuel, L. Watkins, A. Ehle, and V. Khatri, “Customizing the Representation Capabilities of Process Models: Understanding the Effects of Perceived Modeling Impediments,” Software Engineering, IEEE Transactions on, vol. 41, no. 1, pp. 19–39, 2015. [79] D. L. Moody, “The ‘physics’ of notations: toward a scientific basis for constructing visual notations in software engineering,” Software Engineering, IEEE Transactions on, vol. 35, no. 6, pp. 756–779, 2009. [80] A. Burton-Jones, Y. Wand, and R. 
Weber, “Guidelines for Empirical Evaluations of Conceptual Modeling Grammars,” Journal of the Association for Information Systems, vol. 10, no. 6, pp. 495–532, 2009. [81] R. Lukyanenko, J. Parsons, and B. M. Samuel, “Artifact Sampling: Using Multiple Information Technology Artifacts to Increase Research Rigor,” in HICSS 2018, Big Island, Hawaii, 2018, pp. 1–12.