Integrated Data and Process Management: Finally?

Marlon Dumas
University of Tartu, Estonia
marlon.dumas@ut.ee

Abstract. Contemporary information systems are generally built on the principle of segregation of data and processes. Data are modeled in terms of entities and relationships, while processes are modeled as chains of events and activities. This situation engenders an impedance mismatch between the process layer, the business logic layer and the data layer. We discuss some of the issues that this impedance mismatch raises, and analyze how and to what extent these issues are addressed by emerging artifact-centric process management paradigms.

1 The Data Versus Process Divide

Data management and process management are both well-trodden fields – but each in its own way. Well-established data analysis and design methods allow data analysts to identify and capture domain entities, and to refine these entities down to the level of database schemas in a seamless and largely standardized manner. Concomitantly, database systems and associated middleware enable the development of robust and scalable data-driven applications, while contemporary packaged enterprise systems support hundreds of business activities on top of shared databases.

In a similar vein, well-documented and proven process analysis and design methods allow process analysts to identify and capture process models at different levels of abstraction, ranging from high-level models suitable for qualitative analysis and organizational redesign, down to executable processes that can be deployed in Business Process Management Systems (BPMS).

But while data management and process management are each well supported by their own body of mature methods and tools, these methods and tools are at best loosely integrated. For example, when it comes to accessing data, BPMS typically rely on request-response interactions with database applications or packaged enterprise systems. Data fetched from these systems are copied into the “working memory” of the BPMS. The data in this working memory are then used to evaluate business rules relevant to the execution of the process, and to orchestrate both manual and automated work. But the burden of synchronizing the working data maintained by the BPMS with the data maintained by the underlying systems is generally left to the developers.

More generally, the “data vs. process” divide leads to an impedance mismatch between the data layer, the business logic layer and the process layer, which in the long run hinders the coherence and maintainability of information systems. In particular, the data vs. process divide has the following effects:

– Process-related and function-related data redundancy. The BPMS maintains data about the state of the process, since these data are needed to enable the system to schedule tasks, react to events and evaluate predicates attached to decision points in the process. On the other hand, the data entities manipulated by the process are stored in the database(s) underpinning the applications with which the BPMS interacts. Hence, the state of the entities is stored both by the BPMS and by the underlying applications. In other words, data are managed redundantly at the database layer and at the process layer, thereby adding development and maintenance complexity (the sketch after this list illustrates the point).

– Business rules fragmentation and redundancy. Some business rules are encoded at the level of the business process, others in the business logic layer (e.g. using a business rules engine) and others in the database (in the form of triggers or integrity constraints). Worse, some rules are encoded at different levels depending on the type of rule and the data involved. This fragmentation and redundancy hamper maintainability and potentially lead to inconsistencies.
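To make the first of these effects concrete, here is a minimal Java sketch of the redundancy and synchronization burden just described. All names (Invoice, InvoiceDao, ProcessInstance) are invented for illustration and do not correspond to any particular BPMS or persistence API.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;

// Data layer: the entity as stored in the system of record.
class Invoice {
    long id;
    String status;     // e.g. "RECEIVED", "APPROVED", "PAID"
    BigDecimal amount;
}

// Data-access interface over the underlying database.
interface InvoiceDao {
    Invoice load(long id);
    void save(Invoice invoice);
}

// Process layer: the BPMS keeps its own copy in working memory.
class ProcessInstance {
    final Map<String, Object> workingMemory = new HashMap<>();

    void fetchInvoice(InvoiceDao dao, long invoiceId) {
        // The entity is copied into working memory so that branching
        // conditions can be evaluated without touching the database.
        workingMemory.put("invoice", dao.load(invoiceId));
    }

    boolean autoApprovable() {
        // The business rule is evaluated against the copy, not the entity.
        Invoice inv = (Invoice) workingMemory.get("invoice");
        return inv.amount.compareTo(new BigDecimal("10000")) < 0;
    }

    void completeApprovalTask(InvoiceDao dao) {
        Invoice inv = (Invoice) workingMemory.get("invoice");
        inv.status = "APPROVED";
        // The synchronization burden: nothing in the architecture forces
        // this write-back. If it is forgotten, or if another application
        // updates the invoice in the meantime, the two states diverge.
        dao.save(inv);
    }
}
```

Because autoApprovable reads the working-memory copy, any update applied directly to the database between the fetch and the rule evaluation goes unnoticed – precisely the synchronization problem described above.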
The effects of this mismatch are perhaps less apparent when a one-to-one mapping exists between the instances of a given process and the entities of a given entity type. This is the case, for example, in a typical invoice handling process, where one process instance (also called a “case”) corresponds exactly to one invoice. In this context, the state of a process instance maps neatly to the state of an entity. Moreover, the data required by the process, for example when evaluating branching conditions, are restricted to the data contained in the associated entity (i.e. the invoice in this example) and possibly to the state of other entities within the logical horizon [5] of that entity – e.g. the Purchase Order (PO) associated with the invoice. Accordingly, collecting the data required for evaluating the business rules of this process is relatively simple, while synchronizing the state of the process instance with the state of its associated entity (at the business logic and data layers) does not pose a major burden.

The impedance mismatch becomes much more evident, however, when this one-to-one correspondence between processes and entities does not hold. Consider for example a shipment process where a single shipment may contain products for multiple customers, ordered by means of multiple purchase orders (POs) and invoiced by means of multiple invoices – perhaps even multiple POs and multiple invoices per customer involved. Furthermore, consider the case where the products requested in a given PO are not necessarily all sent in a single shipment, but instead may be spread across multiple shipments. In this setting, the effects of a customer canceling a PO are not circumscribed to a single instance of the shipment process. Similarly, the effects of a delayed shipment are not restricted to a single PO. Consequently, business rules related, for example, to cancellation penalties, compensation for delayed deliveries or prioritization of shipments become considerably more difficult to capture, maintain and reason about, as exemplified in numerous case studies [1, 3, 8, 9].

Traditional process management approaches quickly hit their limits when dealing with such processes. The outcome of this limitation is that a significant chunk of the “process logic” has to be pushed down to the business logic layer (e.g. in the form of business rules) – which essentially voids the benefits of adopting a structured process management approach supported by a BPMS.
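The following sketch, whose class names and state values are invented for illustration, shows why such N-to-M relations strain case-oriented process management: a single PO cancellation event must be propagated to every shipment instance that carries goods for that order.

```java
import java.util.ArrayList;
import java.util.List;

// Purchase orders and shipments stand in an N-to-M relation.
class PurchaseOrder {
    long id;
    boolean cancelled;
    final List<Shipment> shipments = new ArrayList<>();
}

class Shipment {
    long id;
    String state = "PLANNED"; // e.g. "PLANNED", "REPLAN", "CANCELLED"
    final List<PurchaseOrder> orders = new ArrayList<>();
}

class CancellationHandler {
    // One business event cuts across what an activity-centric BPMS
    // would treat as several independent process instances.
    void onPurchaseOrderCancelled(PurchaseOrder po) {
        po.cancelled = true;
        for (Shipment s : po.shipments) {
            boolean allCancelled =
                    s.orders.stream().allMatch(o -> o.cancelled);
            if (allCancelled) {
                s.state = "CANCELLED"; // nothing left to ship
            } else if (s.state.equals("PLANNED")) {
                s.state = "REPLAN";    // repack the remaining orders
            }
        }
    }
}
```

In a BPMS where each shipment runs as its own case, such a rule has no natural home: it must either live in the business logic layer or be emulated through inter-case signaling – exactly the fragmentation discussed above.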
Service-oriented architectures (SOAs) facilitate the inter-connection of applications and application components, and their emergence has greatly eased the integration of data-driven and process-driven applications. SOAs have also enabled packaged enterprise software vendors to “open the box” by providing standardized programmatic access to the vast functionality of their systems. Per se, however, SOAs do not address the problem of data and process integration, since data-centric services and process-centric services are still developed separately using different methods. A case in point is Thomas Erl’s service-oriented design method [4], which advocates that process-centric services should be strictly layered on top of data-centric (a.k.a. entity-centric) services. Erl’s approach consists of two distinct methods for designing process-centric services and entity-centric services. The same principle permeates many other service-oriented design methods [7]. Such approaches do not address the issues listed above. Instead, they merely reproduce the data versus process divide by segregating data-centric services from process-centric services.

2 The Artifact-Centric Process Management Paradigm

This talk discusses emerging approaches that aim at addressing the shortcomings of the traditional data versus process divide. In particular, the keynote discusses the emerging artifact-centric process management paradigm [1, 2] and how this paradigm, in conjunction with service-oriented architectures and associated platforms, enables higher levels of integration and higher responsiveness to process change.

Mainstream process modeling notations such as BPMN can be thought of as activity-centric, in the sense that process models are structured in terms of flows of events and activities. Modularity is achieved by decomposing activities into subprocesses. Data manipulation is captured either by means of global variables defined within the scope of a process or subprocess, or by means of conceptually passive data objects that are created, read and/or updated by the events and activities in the process. In contrast, the database applications and/or enterprise systems on top of which these processes execute are usually structured in terms of objects that encapsulate data and/or behavior. This duality engenders the above-mentioned impedance mismatch between the process layer and the business logic and data layers.

Artifact-centric process modeling paradigms, by contrast, aim at conceptually integrating the process layer, the business logic layer and the data layer. Their key tenet is that business processes should be conceived in terms of collections of artifacts that encapsulate data and have an associated lifecycle. Transitions between the states of this lifecycle are triggered by events coming from human actors, from modules of an enterprise system (possibly exposed as services) and possibly from other artifacts, implying that artifacts are inter-linked. In this way, the state of the process and the state of the entities are naturally maintained “in sync”, and business processes are conceived as networks of inter-linked artifacts that may be related according to N-to-M relations, thus allowing one to seamlessly capture rules spanning what would traditionally be perceived as multiple process instances.
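As a rough illustration of this tenet, the following sketch shows an artifact that encapsulates its data together with a guarded lifecycle. The states, events and guards are invented for illustration, and are deliberately much simpler than the GSM meta-model discussed below.

```java
import java.util.ArrayList;
import java.util.List;

// An artifact couples its data with an explicit lifecycle: there is no
// separate process state to keep in sync with the entity state.
class ShipmentArtifact {
    enum State { CREATED, PACKED, IN_TRANSIT, DELIVERED }

    private State state = State.CREATED;
    private final List<String> orderIds = new ArrayList<>(); // linked POs

    // Data changes are guarded by the lifecycle...
    void addOrder(String poId) {
        require(state == State.CREATED);
        orderIds.add(poId);
    }

    // ...and lifecycle transitions are triggered by events coming from
    // human actors, services or other artifacts, with guards evaluated
    // over the artifact's own data.
    void onPacked() {
        require(state == State.CREATED && !orderIds.isEmpty());
        state = State.PACKED;
    }

    void onDispatched() {
        require(state == State.PACKED);
        state = State.IN_TRANSIT;
    }

    void onDeliveryConfirmed() {
        require(state == State.IN_TRANSIT);
        state = State.DELIVERED;
    }

    State state() { return state; }

    private static void require(boolean guard) {
        if (!guard) throw new IllegalStateException("lifecycle guard violated");
    }
}
```

Because the guards read the same object that stores the data, process state and entity state cannot drift apart – they are one and the same artifact.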
The talk also discusses ongoing efforts within the Artifact-Centric Service Interoperation (ACSI) project (http://www.acsi-project.eu/). This project aims at combining the artifact-centric process management paradigm with SOAs in order to achieve higher levels of abstraction during business process integration across organizational boundaries. The key principle of the ACSI project is that processes should be conceived as systems of artifacts that are bound to services. The binding between artifacts and services specifies where the data of an artifact should be pushed to or pulled from, and when. In the ACSI approach, process developers do not reason in terms of tasks that are mapped to request-response interactions between a process and the underlying systems. Instead, they reason in terms of artifacts, their lifecycles, operations and associated data. Artifact lifecycles are captured by means of a meta-model – namely Guard-Stage-Milestone (GSM) – that allows one to capture behavior, data querying and data manipulation in a unified framework [6].

Upon this foundation, the ACSI project is building a proof-of-concept platform that supports the definition and execution of artifact-centric business processes. Challenges addressed by ACSI include the reverse-engineering of artifact systems from enterprise system logs – for the purpose of legacy system migration – and the verification of artifact-centric processes, which by nature are infinite-state systems due to the tight integration of processes and data.

Acknowledgments. This paper is the result of collective discussions within the ACSI project team. Thanks especially to Rick Hull for numerous discussions on this topic. The ACSI project is funded by the European Commission’s FP7 ICT Program.

References

1. Kamal Bhattacharya, Nathan S. Caswell, Santhosh Kumaran, Anil Nigam, and Frederick Y. Wu. Artifact-centered operational modeling: Lessons from customer engagements. IBM Systems Journal, 46(4):703–721, 2007.
2. David Cohn and Richard Hull. Business artifacts: A data-centric approach to modeling business operations and processes. IEEE Data Eng. Bull., 32(3):3–9, 2009.
3. Marlon Dumas. On the convergence of data and process engineering. In Proc. of the 15th International Conference on Advances in Databases and Information Systems (ADBIS), Vienna, Austria, pages 19–26. Springer, September 2011.
4. Thomas Erl. Service-Oriented Architecture (SOA): Concepts, Technology, and Design. Prentice Hall, 2005.
5. P. Feldman and D. Miller. Entity model clustering: Structuring a data model by abstraction. The Computer Journal, 29(4):348–360, 1986.
6. Richard Hull, Elio Damaggio, Riccardo De Masellis, Fabiana Fournier, Manmohan Gupta, Fenno Terry Heath, Stacy Hobson, Mark H. Linehan, Sridhar Maradugu, Anil Nigam, Piyawadee Noi Sukaviriya, and Roman Vaculín. Business artifacts with guard-stage-milestone lifecycles: Managing artifact interactions with conditions and events. In Proceedings of the Fifth ACM International Conference on Distributed Event-Based Systems (DEBS), New York, NY, USA, pages 51–62. ACM, July 2011.
7. Thomas Kohlborn, Axel Korthaus, Taizan Chan, and Michael Rosemann. Identification and analysis of business and software services – a consolidated approach. IEEE Transactions on Services Computing, 2(1):50–64, 2009.
8. Vera Künzle and Manfred Reichert. PhilharmonicFlows: Towards a framework for object-aware process management. Journal of Software Maintenance, 23(4):205–244, 2011.
9. Guy Redding, Marlon Dumas, Arthur H. M. ter Hofstede, and Adrian Iordachescu. A flexible, object-centric approach for business process modelling. Service Oriented Computing and Applications, 4(3):191–201, 2010.