A Framework for Interactive Mining and Retrieval from Process Traces

Doctoral Consortium ICCBR 2016

L. Canensi
Department of Computer Science, Università di Torino, Italy
canensi@di.unito.it

Copyright © 2016 for this paper by its authors. Copying permitted for private and academic purposes. In Proceedings of the ICCBR 2016 Workshops. Atlanta, Georgia, United States of America.

1 Research Summary

Processes are everywhere: in hospitals, companies, universities, institutions and so on. It is therefore no surprise that the use of information systems and enterprise resource planning tools has been growing rapidly in organizations and companies worldwide, of all kinds and sizes. The sequences (traces) of actions that have been completed at an organization are usually stored in the so-called event log. Event logs constitute a very rich source of experiential knowledge, fundamental to supporting different tasks, such as mining a process model, or retrieving similar traces in order to make predictions about the currently running process instance.

In the Business Process Management (BPM) field, it is widely recognized that these activities are very important, and can be used to improve process execution and performance. However, all the works in the literature treat these activities separately, applying different theoretical and methodological approaches to each of them. Instead, in my PhD thesis I propose a comprehensive approach, aiming at integrating the construction of the process model and its analysis.

In particular, my thesis is composed of three steps:
– initial process model construction
– trace retrieval
– interactive process model abstraction and refinement.

The approach allows the user to take advantage of the integration of the three activities above. The initial process model we build is used as an indexing structure to speed up trace retrieval; moreover, we exploit a unified methodological solution that allows retrieving both traces and specific paths in the model. We then exploit retrieval results as a basis to build a more abstract process model, in an interactive fashion.

In this way, our approach is able to "reconcile" apparently heterogeneous needs of business process management, supporting a user-friendly interaction with domain experts and taking advantage of all the available knowledge sources in a comprehensive way. The following subsections describe the various steps of the thesis in more detail.

1.1 Initial process model construction

The first part of the work is related to the Process Mining (PM) field. PM is a research discipline that discovers, monitors, and improves real processes by extracting knowledge from traces in the event logs readily available from today's systems [1]. Each trace consists of an ordered sequence of activities. The mined process model can be used to understand, adapt and modify the real process, in order to increase its performance and quality.

There are three main classes of process mining techniques [2]:
– Discovery: the discovery of new process models based only on the event log
– Conformance: conformance verification of the recorded behavior with respect to a provided model
– Enhancement: extension of an existing process model using the information from the event log

Most of the phases of a process life cycle [3] can benefit from PM techniques: they can be adopted to analyse an existing model, to diagnose problems, and possibly to adapt/redesign/tune the process model itself. All these considerations make PM a very important instrument for modern organizations that need to manage non-trivial operational processes.

My research deals with process discovery, the most relevant and widely used PM activity.
There are many different approaches to process discovery. Different algorithms have their own specificities: some focus on local relations between activities in the logs (Heuristic Miner [4]), while others focus on the whole log (Fuzzy Miner [5], Genetic Miner [6]). However, all of these algorithms operate at a unique, system-defined level of abstraction. In many domains, instead, it would be very important to be able to build and refine models working at different levels of abstraction.

The contribution of my thesis to the process discovery area consists of a novel tool that builds a data structure (called log-tree), which can be used both as an initial model of the process (possibly to be abstracted in a further interactive session with the user) and as an index to speed up trace retrieval. The log-tree is a new representation formalism which has well-defined (unambiguous) semantics and maintains a direct connection between traces in the log and elements in the process model. Therefore, it is usable both as an index of the traces and as a standard process model. So that it can serve as an index, the algorithm guarantees that the log-tree only includes paths actually recorded as traces in the event log. To achieve this objective, the algorithm: (1) makes intensive use of all the available frequency information about the activities recorded in the event log; (2) properly forks the model into various branches, on the basis of the different execution contexts, implicitly represented by subsets of the traces in the event log.

1.2 Trace Retrieval

The second part of the thesis deals with trace retrieval.
When the input trace is a currently running process instance, the retrieval of similar, already completed instances recorded as traces in the log can enable the user to make predictions about the completion of the current instance, or can recommend suitable actions, resources or routing decisions to be adopted next; these goals are treated in the literature within the operational support research area [2].

Trace retrieval has recently been considered in the Case-Based Reasoning (CBR) [7] literature. All approaches use traces as sources for retrieving and reusing the user's experience. For instance, the work in [8] proposes trace-based reasoning, a CBR approach where cases are not explicitly stored in a library, but are implicitly recorded as "episodes" within traces. The paper in [9] extends that work, and defines a similarity measure to compare episodes extracted from traces. These works, however, do not aim at providing support in business process management, such as prediction for operational support, or pattern identification for abstraction. The goal of these tools is therefore usually very different from ours.

Moreover, current trace retrieval approaches typically take a fully specified trace as input. This is a severe limitation, because sometimes the goal is to find traces that fulfil partially specified patterns. We deal with this issue by means of a powerful query language and query answering approach.

The log-tree is used as an index, allowing fast retrieval from the available event log. Thanks to its characteristics and methodological solutions, the tool implements operational support tasks in a flexible, efficient and user-friendly way. It is worth noting that our approach, besides retrieving traces from the log-tree, also allows retrieving paths from a generic (more abstract) process model (a graph).
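To make the idea of a tree-shaped index over the event log more concrete, the following is a minimal sketch, not the log-tree algorithm of the thesis. It assumes a plain prefix tree, which shares only two of the stated properties (every path was actually recorded as a trace, and every node keeps the identifiers of the traces that pass through it); it does not reproduce the frequency-based construction or the context-dependent forking. The pattern syntax ("?" for exactly one activity, "*" for zero or more activities) is a hypothetical stand-in for the query language, and all names (Node, build_tree, retrieve, next_activity_counts, the example log) are illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One activity on a path of the (assumed) prefix tree."""
    activity: str
    children: dict = field(default_factory=dict)   # activity name -> Node
    trace_ids: list = field(default_factory=list)  # traces passing through this node
    end_ids: list = field(default_factory=list)    # traces ending at this node


def build_tree(log):
    """Build a prefix tree from a list of traces (lists of activity names)."""
    root = Node("<start>")
    for tid, trace in enumerate(log):
        node = root
        node.trace_ids.append(tid)
        for act in trace:
            node = node.children.setdefault(act, Node(act))
            node.trace_ids.append(tid)
        node.end_ids.append(tid)
    return root


def retrieve(root, pattern):
    """Return the sorted ids of the traces matching the whole pattern.

    Hypothetical pattern items: an activity name, "?" (exactly one
    activity) or "*" (zero or more activities).
    """
    hits = set()

    def walk(node, i):
        if i == len(pattern):
            hits.update(node.end_ids)
            return
        item = pattern[i]
        if item == "*":
            walk(node, i + 1)                 # "*" matches no activity
            for child in node.children.values():
                walk(child, i)                # "*" absorbs one more activity
        elif item == "?":
            for child in node.children.values():
                walk(child, i + 1)
        elif item in node.children:
            walk(node.children[item], i + 1)

    walk(root, 0)
    return sorted(hits)


def next_activity_counts(root, prefix):
    """Count how often each activity followed the given prefix in the log."""
    node = root
    for act in prefix:
        node = node.children.get(act)
        if node is None:
            return {}
    return {name: len(child.trace_ids) for name, child in node.children.items()}


# Invented example log: four completed traces.
log = [
    ["register", "triage", "xray", "discharge"],
    ["register", "triage", "blood_test", "discharge"],
    ["register", "triage", "xray", "surgery", "discharge"],
    ["register", "discharge"],
]
tree = build_tree(log)

print(retrieve(tree, ["register", "triage", "*"]))               # [0, 1, 2]
print(retrieve(tree, ["register", "triage", "?", "discharge"]))  # [0, 1]
print(retrieve(tree, ["*", "surgery", "*"]))                     # [2]
print(next_activity_counts(tree, ["register", "triage"]))        # {'xray': 2, 'blood_test': 1}
```

In this sketch a running instance is treated as a prefix followed by "*", so the completed traces sharing that prefix are found by descending a single branch rather than scanning the whole log, and the per-node trace counts give a simple frequency-based hint about the next activity.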
1.3 Interactive process model abstraction

The log-tree is already a process model, which guarantees maximal precision (i.e., it does not represent any behavior that was not recorded in the traces). However, in some cases it may be useful to have a more general process model that abstracts from negligible details, with a consequent loss of precision. The log-tree can therefore be seen as a starting point for generating a more abstract process model in an interactive work session, where the user is always allowed to inspect the current output, and possibly backtrack to the previous step. The ability to retrieve specific traces/paths in the model, corresponding to properties/situations of interest, is very useful in supporting the abstraction process, by suggesting portions of the model that could be merged.

To the best of our knowledge, path retrieval and merging have never been described in the business process management literature, possibly with the exception of the Fuzzy Miner [5], which provides a functionality to cluster (merge) actions into macro actions. Moreover, no literature contribution provides a unifying framework where a suite of different facilities is properly integrated, as in our work, to support process mining at different levels of abstraction and efficient trace/path retrieval (also responding to abstract query patterns).

References

1. "http://www.win.tue.nl/ieeetfpm." IEEE Taskforce on Process Mining: Process Mining Manifesto (last accessed on 4/11/2013).
2. W. M. P. van der Aalst, Process Mining: Discovery, Conformance and Enhancement of Business Processes. Springer Publishing Company, Incorporated, 1st ed., 2011.
3. W. Scacchi and P. Mi, "Process life cycle engineering: A knowledge-based approach and environment," Int. Syst. in Accounting, Finance and Management, vol. 6, no. 2, pp. 83–107, 1997.
4. A. Weijters, W. V. der Aalst, and A. A. de Medeiros, Process Mining with the Heuristic Miner Algorithm, WP 166. Eindhoven University of Technology, Eindhoven, 2006.
5. C. W. Günther and W. M. P. Van Der Aalst, "Fuzzy Mining: Adaptive Process Simplification Based on Multi-perspective Metrics," in Proceedings of the 5th International Conference on Business Process Management, BPM'07, (Brisbane, Australia), pp. 328–343, Springer-Verlag, 2007.
6. A. K. A. D. Medeiros and A. J. M. M. Weijters, "Genetic process mining," in Applications and Theory of Petri Nets 2005, volume 3536 of Lecture Notes in Computer Science, pp. 48–69, Springer-Verlag, 2005.
7. A. Aamodt and E. Plaza, "Case-based reasoning: foundational issues, methodological variations and systems approaches," AI Communications, vol. 7, pp. 39–59, 1994.
8. A. Cordier, M. Lefevre, P.-A. Champin, O. Georgeon, and A. Mille, "Trace-Based Reasoning - Modeling interaction traces for reasoning on experiences," in The 26th International FLAIRS Conference, May 2013.
9. R. Zarka, A. Cordier, E. Egyed-Zsigmond, L. Lamontagne, and A. Mille, "Similarity Measures to Compare Episodes in Modeled Traces," in International Case-Based Reasoning Conference (ICCBR 2013) (Springer, ed.), Lecture Notes in Computer Science, pp. 358–372, Springer Berlin Heidelberg, July 2013.