=Paper=
{{Paper
|id=Vol-2046/beleczki-molnar
|storemode=property
|title=Modeling framework for designing and analyzing document-centric information systems based on HypergraphDB
|pdfUrl=https://ceur-ws.org/Vol-2046/beleczki-molnar.pdf
|volume=Vol-2046
|authors=András Béleczki,Bálint Molnár
}}
==Modeling framework for designing and analyzing document-centric information systems based on HypergraphDB==
András Béleczki (bearaai@inf.elte.hu), Bálint Molnár (molnarba@inf.elte.hu)

ELTE Eötvös Loránd University, Budapest, Hungary

===Abstract===
Document-centric Information Systems (IS) are very common in enterprises nowadays: they serve as the basis for storing data and elements of business processes, and they also provide a flexible communication protocol between Web services (XML-like documents). Workflows, object hierarchies and many other business-related entities can be designed with various types of models (UML, BPMN, Petri nets). In our case, the models and their relationships are represented mostly through a combination of the Zachman and TOGAF frameworks [1, 2]. To assist the design, analysis, validation and optimization steps during model creation, we suggest a generic modeling approach based on generalized hypergraphs. This helps avoid inconsistencies between the models without any extra transformations or cross-checks [3]. In our proposed designer tool we use HypergraphDB, a graph database that provides every advantage of generic hypergraphs [4, 5]. To extend the object-definition and rule formalism, we use description logics alongside the hypergraph formalism.

===1 Introduction===
Since Information Systems (ISs) are becoming more and more complex, designing, developing or even analyzing them has become much harder than before. This complexity originates from a paradigm shift in which structured data entities appear in the form of documents. The use of various electronic document types is common in organizations. Basic XML provides a semi-structured formalism; however, XML documents may also embed unstructured parts besides the meta-data, which can be considered semi-structured.
Moreover, the core of an IS may contain both structured and semi-structured data collections that are extended by unstructured elements. The documents in both the outer and inner environment of an IS reflect the intended and realized data flows that embody the life cycles of data collections in relation to the overarching organizational structure and roles, and to the related business processes and workflows. In this complex situation, we need a theoretically sound approach that supports the modeling and design steps, followed by cross-checking and verification of the consistency of the models placed in the architecture. Model refinement and extension are carried out according to systematic design principles, under the supervision of constraints deduced from the assumptions of consistency and integrity. The set of models, ordered into an architecture framework, also supports operational functions during the production time of an IS.

In Section 2 we introduce these particular models through previous research in the literature, in Section 3 we define some notions for a better understanding of hypergraphs, in Section 4 we explain the database system chosen for our model, and Section 5 provides conclusions and possible future work.

Copyright © by the paper's authors. Copying permitted for private and academic purposes. In: E. Vatai (ed.): Proceedings of the 11th Joint Conference on Mathematics and Computer Science, Eger, Hungary, 20th – 22nd of May, 2016, published at http://ceur-ws.org

===2 Literature and Technical Review===
Web-based and Web Information Systems are typical examples that make extensive use of various document formats. The emphasis on Web technologies slowly diminished as their application, particularly at user interfaces, became commonplace. A systematic design approach to constructing Web-based applications has also been discussed in the literature.
The method explained in [6] makes use of semi-structured and interactive documents represented by XML. Another paper presents an approach for a well-founded, concept-based modeling process for a Web site. For the design of Web Information Systems, Rossi presents a design procedure [7]. There are many frameworks that help to grasp the complexity of Information Systems, namely Blokdijk's perception of Information Systems, the Zachman ontology and TOGAF; all of them were created for information systems [1, 2, 8].

An Information System supports business processes (Business Process Modeling, BPM) within an enterprise and is usually tightly coupled to other ISs. A fairly standard way to model business processes is either the application of BPM methods or the use of Petri nets. Information Systems can also be perceived as a structure with underlying databases for structured, semi-structured (XML-based, eXtensible Markup Language) and unstructured documents. Documents play an important role at the interface, at the interaction level and in the core activities of data processing. The level of integration and the degree of reconciliation between business processes and the organization can be analyzed on the basis of ontologies and semantic approaches [9]; this provides an approach for validating and safeguarding the relationships between organization and processes within the architecture.

Several previous papers tried to put the above-mentioned approaches into a unified framework in an essentially semi-formal way [10, 11, 12, 13]. The Enterprise Architecture framework is provided by a mapping between the Zachman ontology and the TOGAF framework [1, 2]. Blokdijk's collection of Information System models yields a structuring guideline [8]. Since our proposed approach to unified modeling is based on generalized hypergraph theory, it requires a storage system with matching capabilities.
From a technological point of view, there are many graph-based database systems. The suggested HypergraphDB is an open-source project based on the knowledge-management formalism known as directed hypergraphs.

===3 Mathematical Background===
Hypergraphs. Several conceptual formalizations are mentioned in other papers [10, 13]; they can be described by a set of relationships between individual models (such as UML-based class diagrams, workflows, etc.). These models represent different facets of the perception of an IS, and they represent a complex system through a set of complex, heterogeneous relationships. This set of relationships can be described by directed hypergraphs; the directed hypergraph applies the same basic notions as the generalized hypergraph, extended with direction. The elements of this set can be separated into two subsets:

• hierarchical relationships;
• network-like relationships.

Hypergraphs as a mathematical structure seem suitable for representing the interrelationships among the models, views, viewpoints, perspectives, and the overarching documents and business processes [1, 5]. To gain insight into hypergraphs, we start with the basic definitions and then apply them to depict the above-mentioned complex relationships.

Definition 1. A hypergraph $H$ is a pair $(V, E)$ of a finite set $V = \{v_1, v_2, \ldots, v_n\}$ and a set $E$ of nonempty subsets of $V$. The elements of $V$ are called vertices or nodes, and the elements of $E$ are called edges [4].

Definition 2. Generalized or extended hypergraphs. The notion of hypergraph may be extended so that hyperedges can be represented, in certain cases, as vertices, i.e. a hyperedge $e$ may consist of both vertices and hyperedges. The hyperedges contained within the hyperedge $e$ must be different from $e$ [4].

Considering a document model, a proper document type hierarchy can be interpreted as an ordered subset of the hyperedges.
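As a minimal sketch of Definitions 1 and 2, the following Python fragment models a generalized hyperedge as an immutable, nonempty set whose members may be vertices or other hyperedges. The class names (`Vertex`, `HyperEdge`), the helper `vertices_of` and the document types used in the example are illustrative assumptions; they are not taken from the authors' tool or from HypergraphDB.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vertex:
    """A vertex (node) of the hypergraph; the name is a hypothetical label."""
    name: str


@dataclass(frozen=True)
class HyperEdge:
    """A generalized hyperedge: a nonempty set of vertices and/or hyperedges."""
    name: str
    members: frozenset  # elements are Vertex or HyperEdge instances

    def __post_init__(self):
        # Definition 1 requires edges to be nonempty subsets.
        if not self.members:
            raise ValueError("a hyperedge must be nonempty")


def vertices_of(edge: HyperEdge) -> set:
    """Collect every vertex reachable through nested hyperedges."""
    found = set()
    for member in edge.members:
        if isinstance(member, HyperEdge):
            found |= vertices_of(member)
        else:
            found.add(member)
    return found


# Hypothetical document type hierarchy: "Document" contains the hyperedge
# "CommercialDocument" (Definition 2) as well as a plain vertex.
invoice, order, memo = Vertex("Invoice"), Vertex("Order"), Vertex("Memo")
commercial = HyperEdge("CommercialDocument", frozenset({invoice, order}))
document = HyperEdge("Document", frozenset({commercial, memo}))
```

Because the hyperedges in this sketch are immutable and built bottom-up, a hyperedge cannot contain itself in normal use, which mirrors the restriction stated in Definition 2.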
In a document subpart hierarchy, a specific subpart of a document may be denoted by a vertex within the particular hyperedge that describes the document containing the subpart; that subpart, as a vertex, may in turn include a document type hierarchy that can be depicted by a hyperedge.

Definition 3. A directed hypergraph is an ordered pair

$$\vec{H} = \left(V;\ \vec{E} = \{\vec{e}_i : i \in I\}\right), \tag{1}$$

where $V$ is a finite set of vertices and $\vec{E}$ is a set of hyperarcs with a finite index set $I$. Every hyperarc $\vec{e}_i$ can be interpreted as an ordered pair

$$\vec{e}_i = \left(\vec{e}_i^{\,+} = (e_i^{+}, i);\ \vec{e}_i^{\,-} = (i, e_i^{-})\right), \tag{2}$$

where $e_i^{+} \subseteq V$ is the set of vertices of $\vec{e}_i^{\,+}$ and $e_i^{-} \subseteq V$ is the set of vertices of $\vec{e}_i^{\,-}$. The elements of $\vec{e}_i^{\,+}$ are called the tail of $\vec{e}_i$, while the elements of $\vec{e}_i^{\,-}$ are called the head [4].

Potential implementations of hypergraphs in a hypergraph database allow attributes to be linked to vertices and even to hyperedges. The target domain, namely documents and models of Information Systems within organizations, contains complex n-ary relationships. The hypergraph provides the opportunity to represent recursive constructions, to describe logical relations, to store compound structures along with their values, and to follow variable lifetimes across various processes [5, 14, 15].

Figure 1: Example for Directed Hypergraph Representing a Sample of Essential Relationships

As an illustration of the basic concepts of directed hypergraphs, Figure 1 shows an example of how the domain can be represented by a hypergraph. The essential characteristic is that vertices contain composite constituents that may themselves be graphs; a generalized hyperedge may contain nodes and other hyperedges, but not itself. A detailed description of the Architecture Describing Hypergraph can be found in [16].

===4 Using HypergraphDB===
HypergraphDB is an extensible, portable, distributed open-source data-storage mechanism.
It is a graph database designed specifically for artificial intelligence and semantic web projects; however, because of its general design, it is also well suited to representing heterogeneous relationships between different types. The following key facts motivate the use of HypergraphDB as the tool to store our model [17]:

• The mathematical definition of a hypergraph is an extension of the standard graph concept that allows an edge to point to more than two nodes. HyperGraphDB extends this even further by allowing edges to point to other edges as well, and by making every node or edge carry an arbitrary value as payload.
• The basic unit of storage in HyperGraphDB is called an atom. Each atom is typed, has an arbitrary value and can point to zero or more other atoms.
• Data types are managed by a general, extensible type system that is itself embedded as a hypergraph structure. Types are atoms like everything else, but with a particular role.
• The storage scheme is platform independent and can thus be accessed by any programming language from any platform. Low-level storage is currently based on BerkeleyDB from Sleepycat Software.
• Size limitations are virtually non-existent. There is no software limit on the size of the graph managed by a HyperGraphDB instance. The size of each individual value is limited by the underlying storage, i.e. by BerkeleyDB's 2GB limit. However, the architecture allows bypassing BerkeleyDB for particular types of atoms if one so desires.
• The implementation is solely Java based. It offers an automatic mapping of idiomatic Java types to a HyperGraphDB data schema, which makes HyperGraphDB an object-oriented database suitable for regular business applications.

Since there is no first-party user interface for HypergraphDB, the first step in using it was to design and develop middleware that can create complete hypergraphs by creating the appropriate nodes and edges from various inputs.
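The atom model summarized in the key facts above, and the middleware's task of creating nodes and edges on top of it, can be illustrated with a small in-memory sketch. This is not the HyperGraphDB API: the `Atom` and `AtomStore` classes, the integer handles, the method `incidence_set` and the Petri-net-style type names are assumptions made only for illustration.

```python
import itertools
from dataclasses import dataclass


@dataclass(frozen=True)
class Atom:
    handle: int          # identifier assigned by the store (assumed to be an int)
    type_name: str       # every atom is typed
    value: object        # arbitrary payload
    targets: tuple = ()  # handles of the atoms this atom points to; empty for a node


class AtomStore:
    """In-memory stand-in for a hypergraph store; not the HyperGraphDB API."""

    def __init__(self):
        self._atoms = {}
        self._next_handle = itertools.count(1)

    def add(self, type_name, value, targets=()):
        """Store a node (no targets) or a link (one or more targets)."""
        targets = tuple(targets)
        missing = [t for t in targets if t not in self._atoms]
        if missing:
            raise KeyError(f"unknown target handles: {missing}")
        handle = next(self._next_handle)
        self._atoms[handle] = Atom(handle, type_name, value, targets)
        return handle

    def get(self, handle):
        return self._atoms[handle]

    def incidence_set(self, handle):
        """Handles of all links that point to the given atom."""
        return sorted(h for h, a in self._atoms.items() if handle in a.targets)


# Hypothetical fragment of a workflow model stored as atoms.
store = AtomStore()
p1 = store.add("Place", "order received")
t1 = store.add("Transition", "check stock")
arc = store.add("Arc", "p1 -> t1", [p1, t1])          # link between two nodes
flow = store.add("FlowRelation", "main flow", [arc])  # link pointing to a link
```

In this sketch a link can only target atoms that already exist, so a link never points to itself; `store.incidence_set(p1)` then returns the handles of the links in which the place participates.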
These inputs are mostly XML-based descriptors, such as OWL, but can also follow a custom, user-defined XML schema. In our case, we had a tool, written in C++ using the Qt framework [18, 19], that is capable of designing workflow models based on Petri nets. This tool generates a custom XML file consisting of the places, transitions, arcs, flow relations, presets and postsets of transitions, and all other required data.

To test the capabilities of the database, we created a hypergraph based on a business process. The process was first designed in the above-mentioned tool, and the XML output was then passed to the middleware. After processing the XML file, the middleware created the necessary nodes and edges. In addition, HypergraphDB allows nodes and hyperedges to be labeled with custom Java classes, so the potential of an object-oriented class hierarchy can be exploited.

===5 Conclusion and Future Work===
There are many aspects from which the relations among different model types can be analyzed. For every design step mentioned before [1, 2], there are several models that depict fairly similar aspects of the given system. This means that there has to be a transformation or mapping function that selects and creates the relevant relationships between the elements of these models. These mappings can also be stored as a sub-hypergraph in HypergraphDB, so the information about an IS can be handled in a uniform way. The proposed approach to the uniform representation of an IS from an architectural viewpoint offers the opportunity for unified handling of models and for exploring graph-theoretical tool sets for further analysis.

Figure 2: The Framework's Components and their Interactions

===References===
[1] Zachman, J.A., 1987. A Framework for Information Systems Architecture, IBM Systems Journal, Vol. 26, No. 3, pp. 276-292

[2] Open Group, 2010.
TOGAF: The Open Group Architecture Framework, TOGAF® Version 9, http://www.opengroup.org/togaf/

[3] Suh, N.P. 2001. Axiomatic Design: Advantages and Applications. Oxford University Press, New York

[4] Bretto, A. 2013. Hypergraph Theory: An Introduction. Springer.

[5] Gallo, G., Longo, G., Pallottino, S., Nguyen, S. 1993. Directed hypergraphs and applications. Discrete Applied Mathematics, 42(2), 177-201.

[6] Köppen, E., Neumann, G. 1999. "Active hypertext for distributed web applications", in: Proceedings of The Eighth IEEE International Workshops on Enabling Technologies: Infrastructure for Collaborative Enterprises (WET-ICE'99), pp. 297-302

[7] Rossi, G., Schwabe, D., Lyardet, F. 1999. "Web application models are more than conceptual models", in: P. Chen et al. (Ed.), Advances in Conceptual Modeling, LNCS, vol. 1727, pp. 239-252, Springer-Verlag, Berlin

[8] Blokdijk, A., Blokdijk, P. 1987. Planning and Design of Information Systems, Academic Press, London

[9] Gábor, A., Kő, A., Szabó, I., Ternai, K., Varga, K. 2013. Compliance Check in Semantic Business Process Management, in: On the Move to Meaningful Internet Systems (OTM) 2013 Workshops, 353-362, Springer Berlin Heidelberg

[10] Molnár, B. 2014. Applications of hypergraphs in informatics: a survey and opportunities for research. Ann. Univ. Sci. Budapest. Sect. Comput. 42, 261-282.

[11] Molnár, B., Tarcsi, A. 2011. Architecture and System Design Issues of Contemporary Web-based Information Systems, in: Proceedings of the 5th International Conference on Software, Knowledge Information, Industrial Management and Applications (SKIMA 2011), September 8-11, 2011, Benevento, Italy.

[12] Molnár, B., Benczúr, A. 2013. Facet of Modeling Web Information Systems from a Document-Centric View, in: International Journal of Web Portals (IJWP), 5(4), 57-70, IGI Global

[13] Molnár, B., Benczúr, A., Béleczki, A. 2016.
A Model for Analysis and Design of Information Systems based on a Document Centric Approach, in: Intelligent Information and Database Systems (IIDS), 290-299, Springer-Verlag, Berlin

[14] Ausiello, G., Franciosa, P. G., Frigioni, D. 2001. Directed hypergraphs: Problems, algorithmic results, and a novel decremental approach, in: Theoretical Computer Science, pp. 312-328, Springer Berlin Heidelberg

[15] Iordanov, B. 2010. Hypergraphdb: a generalized graph database, in: Web-Age Information Management, pp. 25-36, Springer Berlin Heidelberg

[16] Molnár, B., Benczúr, A., Béleczki, A. 2016. Formal Approach to Modelling of Modern Information Systems, International Journal of Information Systems and Project Management (to be published)

[17] Kobrix Software. 2010. HypergraphDB - A Graph Database. [ONLINE] Available at: http://hypergraphdb.org. [Accessed 27 May 2016].

[18] The Qt Company. 2012. Qt - Home. [ONLINE] Available at: https://www.qt.io. [Accessed 27 May 2016].

[19] Stroustrup, B. 1995. The C++ programming language. Pearson Education India