Automated Process Model Annotation Support: Building Blocks and Parameters

Michael Fellmann1, Felix Oehmgen1

1 Institute of Computer Science, University of Rostock, Rostock, Germany
{michael.fellmann, felix.oehmgen}@uni-rostock.de

Abstract. In business process modeling, semi-formal models typically rely on natural language to express the labels of model elements. This can easily lead to ambiguities and misinterpretations. To mitigate this issue, the combination of process models with formal ontologies or predefined vocabularies has often been suggested. A cornerstone of such suggestions is to annotate elements of process models with concepts from ontologies or predefined vocabularies. Although annotation is suggested in such works, past and current approaches rarely discuss building blocks, parameters and strategies for automating this tedious and error-prone manual task. In this paper, we therefore first describe the nature of the annotation task. We then identify building blocks and parameters for automated systems and describe an implementation of an annotation system that we used to conduct first empirical studies on the effect of parameters. In sum, the paper presents design options and parameters for (semi-)automatically linking semi-formal process models with more formal knowledge representations. It may hence be a source of inspiration for further explorations and experiments on that topic.

Keywords: Business Process, Semantic Annotation, Automatic Matching.

1 Introduction

In business process modeling, semi-formal modeling languages such as BPMN are used to specify which activities occur in which order within business processes. Whereas the order of the activities is specified using constructs of the respective modeling language, the individual semantics of a model element such as “Check order” is bound to natural language. However, if models have to be interpreted by machines, e.g.
for offering modeling support, querying on a semantic level [20] or content analysis, a more formal, machine-processable semantics of modeling elements is required [1]. Further use cases that automated annotation would enable are described in more detail in [18]. In the past, several approaches tried to formalize the semantics of individual model elements by annotating them with elements of ontologies or other predefined vocabularies that to some degree formally specify the semantics of a model element.

However, such approaches suffer from a major limitation: annotation is a highly manual and tedious task. The user has to select suitable elements of an ontology by browsing the ontology or by doing a keyword-based search in the labels of the ontology. Even if the system is capable of presenting some annotation suggestions, e.g. based on lexical similarity of labels, the user has to make sure that annotations match the appropriate context in the process model by inspecting the structure of the ontology, which is typically organized in a hierarchy. For example, if the ontology contains two activities labelled “Accept invitation”, it matters whether this activity is part of the hiring process (where the applicant accepts e.g. a job interview) or the planning process for business trips (where the employee accepts an invitation of a business partner). In other words, the semantic context of an element that is to be annotated must be considered. Since only a very limited number of highly automated context-sensitive approaches for process model annotation is available so far (see [18] for an overview of current and past annotation approaches, [19] for an implementation using Markov Logic), this contribution is meant to facilitate developing, comparing and optimizing such approaches. To bootstrap systematic research in this direction, we describe building blocks and parameters (in short: design options) for automated annotation.
With this, we hope to raise interest in a very promising research topic, both in regard to scientific outcome and practical usefulness (for use cases, see e.g. [18]).

The remainder is structured as follows. In Section 2, the annotation task is described and three major building blocks for semantic annotation are identified. In Section 3, these building blocks along with their parameters are described in more detail. In Section 4, first considerations and results of an empirical analysis are given. In Section 5, related work is discussed and in Section 6 the article is concluded.

2 Description of the Annotation Task

2.1 Fundamental Characteristics of the Annotation Task

Semantic annotation as investigated in this paper means linking process model tasks (e.g. a task such as “Check order”) with elements of an ontology or vocabulary (such as “Order checking”). We denote these elements as “concepts”. In regard to the characteristics of the ontology or vocabulary used for annotation, we assume that it is structured in a hierarchical way, that the semantics of the hierarchy is “part-of” and that there is a partial ordering between siblings in the hierarchy. This assumption seems to be justified when considering major examples of vocabularies or ontologies such as the PCF (Process Classification Framework), a publicly available collection of approximately one thousand enterprise activities which is also available in industry-specific versions [2]. Another example is the MIT Process Handbook [3], a large collection of enterprise knowledge integrated into an ontology where activities are also ordered in a part-of hierarchy.

2.2 Deriving Building Blocks for IT-Support by Observing Human Annotators

In order to understand which building blocks are required for an automated annotation approach, it is helpful to observe and interview human annotators about their strategy.
We did so by observing and interviewing students who manually annotated business process models as part of a tutorial. Process models were specified in the BPMN language and annotated with elements of the PCF taxonomy. 50 undergraduate students with good knowledge of process modelling participated in small groups in the exercise in the years 2012-2014 and annotated 23 models in a group effort. Since this empirical work is not the focus of the article at hand, we only roughly report the insights we gained. A recurring pattern, observed both directly and by interviewing the students, was that annotation roughly followed a 3-step procedure:

First, a keyword search was performed to find relevant elements of the PCF taxonomy.

Second, in case multiple relevant elements of the taxonomy were found, the context of these elements was considered and items of the taxonomy were preferred that better correspond to the overall topic of the process. For example, if the topic of the process was Human Resources (HR), participants preferred activities belonging to the category “6. Human Resources” of the PCF taxonomy.

Third, in a last step, the selection of an item for annotation was reviewed considering the annotation of preceding and following model elements to verify that it is meaningful and fits the process context. In this step, the partial ordering of the activity taxonomy was taken into consideration, meaning that if activities in the taxonomy appeared to occur in a meaningful order (e.g. check order, approve order, execute order), participants strove not to violate that order in the annotations. In this step, activities on a hierarchy level similar to that of the concepts selected for the surrounding model elements (i.e. neither more specific nor more detailed) were also preferred, if possible.
In sum, roughly three steps were executed: (1) retrieve annotation candidates by lexical matching, (2) put annotation candidates into context and select the most meaningful and (3) optimize the annotation in regard to the annotations of surrounding model elements in terms of order and hierarchy level. These three steps inspire corresponding building blocks of an automated annotation approach which we refer to as element annotation, context detection and annotation fitting. They are described in the following along with adjustment parameters.

3 Building Blocks and Parameters

3.1 Element Annotation

For annotating process model activities, relevant activity concepts in the taxonomy have to be found. It is thus necessary to match model element labels against activity concepts from the vocabulary as illustrated in Fig. 1. To match process labels against vocabulary concepts, we basically need a similarity function $sim_{ac}()$ that returns the similarity between a process activity $a \in A$ and an activity concept $c \in C$ as a value between 0 and 1.

$sim_{ac}(a, c) \in [0,1]$ (1)

Using this function, a set of annotation candidates $M$ (for “metadata”) can be computed. It contains process elements $a$ that match vocabulary concepts $c$ with a matching value $s \in [0,1]$ that is at or above a similarity threshold $thr_{sim}$, where the concept's hierarchical position in the taxonomy lies between a minimum level $lh_{min}$ (to exclude the root node) and a maximum level $lh_{max}$ (to prevent too fine-grained annotations). The hierarchical position of a concept $c \in C$ is given by the function $h(c)$.

$M = \{ (a, c, s) \mid a \in A \wedge c \in C \wedge s \geq thr_{sim} \wedge lh_{min} \leq h(c) \leq lh_{max} \}$ (2)

Fig. 1. Element annotation (labels shown in the figure: “Confirm order to customer”, “Send notification about order acceptance”)

3.2 Identification of Context

If the business topic of a process model, such as “Human Resources”, is known, then this knowledge can be leveraged to improve the annotation result.
To do so, it could be used to discriminate between activity concepts that are candidates for annotation and have a comparable lexical matching value. Hence it is required to detect the general topic of a model, which we call category in the following. A category $d \in D$ (for “domain”) can be interpreted as an activity concept that is a direct sub-concept of the taxonomy root, i.e. $D = \{ d \mid d \in C \wedge (d, root) \in H \}$ with $H$ being the set of hierarchy relations of the concepts in the taxonomy. In the simplest form, a category may be specified for the whole model by the user. If that is not possible, a category for the whole model may be derived in an automated way. However, there may be models with multiple categories (i.e. multiple topics in one model such as HR and financial planning), in which case it is not clear which category dominates the model. Such an example is illustrated by Fig. 2. In order to cope with the possibility of multi-category models, the model needs to be partitioned into fragments $f \in F$ containing subsets of activities $f \subseteq A$ referring to the same category (with a default subset $f_d$ for parts of the model that cannot be assigned to a category):

$A = \bigcup_{i=1}^{N} f_i \cup f_d$ where $f_i \cap f_j = \emptyset$, $i \neq j$ (3)

The function $d(f)$ returns the category $d \in D$ for a given fragment $f$ and the function $f(a)$ returns the corresponding fragment $f \in F$ for a given activity $a \in A$. Each fragment is associated with exactly one category, i.e. $\forall f \in F : \exists d \in D \wedge |d(f)| = 1$. Likewise, each activity should be contained in exactly one fragment, i.e. $\forall a \in A : \exists f \in F \wedge |f(a)| = 1$.

Fig. 2. Context in a process model (fragments labeled in the figure: About "Human Resources", About "Financial Planning")

The task of partitioning the model into fragments that are associated with a category depends on two parameters.
First, a minimum size $s_{min}$ of a fragment has to be specified in order to prevent partitioning the model into fragments containing single activities, which would lose the usefulness of categories for discriminating between candidate activity concepts. Second, a lower threshold $ls_{min}$ has to be set, specifying the minimum average lexical similarity value between all activities contained in a fragment and the sub-concepts of the category $d \in D$ that may be assigned to the fragment. Hence a splitting function $split()$ takes the process model $P$ and these two parameters as input and generates a set $F$ of process fragments as output.

$split(P, s_{min}, ls_{min}) = F$ (4)

In order to use categories associated with fragments to augment element annotation, the implementation of the similarity function $sim_{ac}(a, c)$ has to be extended. Such an extended similarity function gives a higher similarity value if the category given by $d(f(a))$, i.e. the category associated with the fragment an activity belongs to, matches the category of $c$. In order to detect the latter, a function $d'(c)$ is defined which returns the category $d \in D$ for a given activity concept $c \in C$. In order to control the influence of category matches, i.e. cases where $d(f(a)) = d'(c)$, a weight $w_{cat}$ is added to the refined similarity function $sim'_{ac}()$.

$sim'_{ac}(a, c, w_{cat}) \in [0,1]$ (5)

3.3 Annotation Fitting

In order to increase the semantic quality of the annotation, annotations may be “fitted”. In essence, this means choosing between activity concepts that receive comparable matching values in the element annotation step according to the notion of betweenness and to differences in the hierarchy level. Fig. 3 illustrates this with a small example of three activities that are symbolized on the left side. The discrimination problem is considered for the middle activity. Comparably well matching activity concepts are depicted as grey shaded small circles on the right side.
According to the notion of betweenness, the bottommost activity concept c3 can be neglected since it is not in the area of preferred annotation candidates (surrounded by a dotted line). This is because it does not lie between the already selected best matching activity concepts for the previous and the following process activity, which are illustrated as solid black filled circles. Further, according to the principle of preferring a similar hierarchical level, c2 can also be skipped. Hence amongst similar lexical matches, c1 is superior to c2 and c3 and is selected for annotation.

Fig. 3. Selection of possible activity concepts (candidate concepts c1, c2 and c3 and the area of preferred annotation candidates)

The procedure introduced so far to select activity concepts for annotation roughly resembles interpolation routines in common image manipulation software where the color of a pixel is calculated according to its neighbors (e.g. Gaussian filter). We stick to that analogy and call this fitting procedure semantic interpolation. In image processing, the radius of interpolation may be more than just one pixel; in our case, a radius of one corresponds to one preceding and one following activity. The radius may be extended to all preceding and following elements, which in the literature is also referred to as the corona of a process model element. In this way, the size of the corona may range from 1 to n, i.e. from all preceding and following activities reachable via one step up to those reachable via n steps in the process graph.

Beyond the radius parameter for semantic interpolation, it is important that a fitting function is able to adjust the influence of the lexical matching in relation to the influence of the hierarchy. Taking this into account, we define a function $fit()$ as follows:

$fit(M, r, w_{lex}, w_{tax}) = M_{FIT}$ (6)

where $M$ are the annotation candidates (cf. formula 2), $r$ is the radius used in semantic interpolation, $w_{lex}$ is the weight of the lexical matching result and $w_{tax}$ is the weight of the hierarchy match.
The latter refers to how the difference of the hierarchy levels of two activity concepts $c \in C$, given by $|h(c_i) - h(c_j)|$ with $i \neq j$, affects the semantic interpolation. The function produces a fitted annotation set $M_{FIT} \subseteq M$ with just one annotation per process activity, i.e. $|A| = |M_{FIT}|$.

3.4 Overview of Possible Configurations

The building blocks specified in the previous sections may be combined, leading to different configurations of the overall automatic annotation system. Table 1 describes these configurations. The first column provides a number for each configuration. The next three columns indicate which of the building blocks (B1-B3) specified in the previous sections are used. The next column, Configuration Description, provides a short name and a description of the configuration variant. The last column, Configuration Parameters, provides a list of accumulated configuration parameters resulting from the different building blocks described in the previous sections.

Table 1. Building blocks and parameters

# | B1 | B2 | B3 | Configuration Description | Configuration Parameters
1 | ✓ | | | Element matching. Element matching using lexical matching strategies provides a simple approach useful e.g. to provide a human user with suggestions for annotation. | $thr_{sim}$: similarity threshold; $lh_{min}$: min. hierarchy level of activity concepts; $lh_{max}$: max. hierarchy level of activity concepts
2 | ✓ | ✓ | | Element matching with category information. Element matching is augmented with category information so that annotation better reflects the business context of the process model. It hence reduces off-topic annotations. | Parameters of configuration variant 1 plus the following additional parameters: $s_{min}$: minimum size of a fragment; $ls_{min}$: minimum average lexical similarity value; $w_{cat}$: weight of category matches
3 | ✓ | | ✓ | Element matching with semantic interpolation. Element matching is augmented with semantic interpolation so that the annotation better reflects the order and granularity of activities represented in the vocabulary. It hence provides a more “smooth” and standard-oriented annotation. | Parameters of configuration variant 1 plus the following additional parameters: $r$: radius of semantic interpolation; $w_{lex}$: weight of lexical matching in interpolation; $w_{tax}$: weight of taxonomy matching in interpolation
4 | ✓ | ✓ | ✓ | Element matching with category information and semantic interpolation. This configuration combines #2 and #3 and hence provides the most comprehensive annotation approach that has the highest potential to imitate human annotation behavior. | Parameters of configuration variants 1-3, leading to an overall set of 9 parameters.

The configurations and parameters described in Table 1 may be used in the development, comparison and optimization of different implementation strategies and hence support a systematic evaluation of automated annotation approaches.

4 Preliminary Analysis and Insights

To gain first insights regarding the implementation of the building blocks introduced so far, a project was set up to create a simple algorithm. The goal of this algorithm is to match labels of process model activities to a similar or equal counterpart in a standardized framework. The standardized annotations are provided by the Process Classification Framework (PCF). It consists of twelve main hierarchies which are structured into sub-hierarchies of four levels. In order of increasing detail, the levels are: process category, process group, process and activity. The first five process categories contain operating processes while the others describe management and support processes. The models were created by students in an unrelated task. The labels were manually matched to the best corresponding concept in the PCF, thus creating a gold standard.
This standard is the basis for testing the accuracy of the algorithm, and analyzing the characteristics of correct matches offers valuable information.

4.2 Simple Algorithm Structure

The algorithm is designed to apply a sequence of techniques that each individually affect the outcome. The workflow in Fig. 4 shows each step the algorithm takes in order to find the best match for a label. The round-edged rectangles represent techniques. The sharp-edged rectangles indicate the resource data as well as the resulting outcome.

Fig. 4. Implemented procedure of annotation

In the following, we describe how the implementation reflects the three building blocks introduced in the previous sections.

Element annotation. The process starts with the resources holding the information schemas, in this case of a model and of the PCF. For convenience, the labels of the individual model elements are called activity labels and the annotations in the PCF are called PCF elements. Each step aims at deriving information about the activity label on different levels. The first method solely compares the characters of two strings. The result is a score expressing lexical similarity. In this algorithm, the Sørensen-Dice coefficient is used. The results of this comparison between activity labels and PCF elements are computed for the cross product of both sets and stored in a similarity matrix. This matrix is the core of the algorithm: it holds a similarity value for all possible pairs and thus enables an analysis for the best match. As stated in Section 3.1, the scores range between 0 and 1 with 1 being a 100% match. The following steps aim at modifying this value to single out the best match. We did not restrict the similarity threshold $thr_{sim}$; the min. hierarchy level $lh_{min}$ was set to 1 and the max. hierarchy level $lh_{max}$ was left unbounded.

Context detection. Identifying the context of a model helps to narrow down the list of possible matches according to their domain.
For this analysis, the theory in Section 3.2 is applied to the test case, although at the moment we identify the context of a process model only globally, i.e. without partitioning the model into fragments (parameters $s_{min}$ and $ls_{min}$ hence are irrelevant). The PCF is already classified into 12 domain-specific hierarchies with a total of 4 layers. The hierarchy level process group (layer 2) serves as the reference hierarchy. Matching an activity label to the correct process group means in this case that the results are narrowed down to a list of at most 60 PCF elements. The method uses information extracted by an external tool to derive two kinds of words that indicate domain affiliations: synonyms, which are derived from the activity labels, and keywords, which are derived from the process groups. Keywords are words that are representative of a process group in the PCF list. A word qualifies as a keyword if it occurs more than 3 times in a sub-hierarchy (Fig. 9). This application roughly corresponds to building block 2 from Section 3.4. The matching process compares words by lexical matching. Semantic matching does not take place yet.

The same procedure is used to create the synonyms list. The synonyms are generated by analyzing the activity labels for representative words. Since a gold standard is provided, a list of words on the activity label side that are representative of process groups can be extracted (Fig. 10). The further calculation is the same as with the keywords. The synonyms list, however, needs constant updating for application to other models. The list created in this case is solely based on the gold standard and still requires verification for other models. Both lists are a preliminary result and are stored as a reference table. Matching activity labels to these lists enables a context-related comparison and highlights matches on a more abstract level. Controlling the influence of these matches corresponds to parameter $w_{cat}$.
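The element annotation and context detection steps described above can be illustrated with a minimal Python sketch. It is not the implementation used in the study: the bigram-based variant of the Sørensen-Dice coefficient, the linear blending with $w_{cat}$, the function names and the tiny keyword table are all assumptions made for illustration only.

```python
from collections import Counter


def bigrams(label):
    """Multiset of character bigrams of a lower-cased, whitespace-normalized label."""
    text = " ".join(label.lower().split())
    return Counter(text[i:i + 2] for i in range(len(text) - 1))


def dice(a, b):
    """Sørensen-Dice coefficient on character bigrams; the result lies in [0, 1]."""
    x, y = bigrams(a), bigrams(b)
    total = sum(x.values()) + sum(y.values())
    if total == 0:  # both labels are shorter than two characters
        return 1.0 if a.strip().lower() == b.strip().lower() else 0.0
    return 2.0 * sum((x & y).values()) / total


def detect_category(labels, keywords):
    """Global context detection: pick the category whose keywords occur most often."""
    hits = {cat: 0 for cat in keywords}
    for label in labels:
        words = set(label.lower().split())
        for cat, kws in keywords.items():
            hits[cat] += len(words & kws)
    if not hits:
        return None
    best = max(hits, key=hits.get)
    return best if hits[best] > 0 else None


def sim_with_category(base, concept_category, model_category, w_cat):
    """Assumed refinement: blend the lexical score with a 0/1 category match."""
    match = 1.0 if model_category is not None and concept_category == model_category else 0.0
    return (1.0 - w_cat) * base + w_cat * match


# Hypothetical data: two identically labelled concepts in different categories.
labels = ["Screen applicant", "Schedule interview", "Accept invitation"]
concepts = [("hr-1", "Accept invitation", "HR"), ("trip-1", "Accept invitation", "Travel")]
keywords = {"HR": {"applicant", "interview"}, "Travel": {"trip", "flight"}}

category = detect_category(labels, keywords)
scores = {
    cid: sim_with_category(dice("Accept invitation", text), cat, category, w_cat=0.2)
    for cid, text, cat in concepts
}
```

In this toy setting both concepts receive the same lexical score, and only the category weight separates them, which mirrors the “Accept invitation” example from the introduction. Blending keeps the refined score within [0, 1]; other ways of applying $w_{cat}$ are equally conceivable.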
Annotation fitting. On a theoretical level (cf. Section 3.3), the last step implies analyzing the predecessors and successors (i.e. the semantic context) of an activity label in the respective model. The content of models can be affiliated to certain process groups. The aim of this step is to penalize and reward similarity scores by manipulating the overall score of a model in a process group. This encourages the scores of certain process groups to increase more strongly in order to concentrate the highest similarity scores on few process groups. For instance, this step should increase the scores in process groups 2.1 and 2.2 for model “1” above all other process groups to reward similarity scores in this sub-matrix (Fig. 8). This pattern of models showing an affinity to certain process groups was discovered by analyzing the gold standard, but the algorithm further assumes that the results of the previous steps create the same pattern. The algorithm therefore manipulates the calculated scores. In terms of parameters, radius $r$ is unbounded and weights $w_{tax}$ and $w_{lex}$ are implicitly set to 0.

After all methods are processed, an analysis is run to extract from the similarity matrix the best matches, i.e. those with the highest similarity score. Thus two scores are calculated showing the percentage of correct activity label to PCF element matches and the percentage of correct activity label to process group matches (for the sake of brevity, process group is also simply called hierarchy in Fig. 5-10).

4.3 Test and Results

The basic function of the algorithm is to create a matrix containing measures of similarity ranging between 0 and 1 across all possible matches and to manipulate these measures in each step. The amount of influence of each step on the similarity measure can be adjusted. The testing phase was conducted in order to determine the effectiveness of each step at varying influence. The test set includes 430 annotations derived from 33 independent models.
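The two scores mentioned above (direct matches and process group matches) could be computed along the following lines. This is a hedged sketch rather than the evaluation code used for the study: it assumes that concepts are identified by dotted PCF-style numbers whose first two components denote the process group, and the matrix layout, function names and example values are hypothetical.

```python
def process_group(pcf_id):
    """Process group (level 2) of a dotted identifier, e.g. '2.1.3' -> '2.1'."""
    return ".".join(pcf_id.split(".")[:2])


def best_matches(matrix):
    """Select the highest-scoring concept for every activity label.

    matrix maps each activity label to a dict {concept id: similarity score}.
    """
    return {label: max(row, key=row.get) for label, row in matrix.items() if row}


def evaluate(matrix, gold):
    """Return (direct hit rate, process group hit rate) against a gold standard."""
    if not gold:
        return 0.0, 0.0
    best = best_matches(matrix)
    direct = sum(1 for label, ref in gold.items() if best.get(label) == ref)
    group = sum(
        1
        for label, ref in gold.items()
        if label in best and process_group(best[label]) == process_group(ref)
    )
    return direct / len(gold), group / len(gold)


# Hypothetical similarity matrix and gold standard for three activity labels.
matrix = {
    "Check order": {"2.1.1": 0.7, "3.2.4": 0.4},
    "Approve order": {"2.1.2": 0.5, "2.1.3": 0.6},
    "Hire staff": {"6.2.1": 0.3, "2.2.1": 0.5},
}
gold = {"Check order": "2.1.1", "Approve order": "2.1.2", "Hire staff": "6.2.1"}

direct_rate, group_rate = evaluate(matrix, gold)
```

In the toy data, one of three labels is annotated with exactly the gold concept, while two of three at least land in the correct process group, which illustrates why the process group score is always at least as high as the direct score.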
The base result for comparison consists solely of lexical matching. The results of the test are displayed in three graphs (Fig. 5, 6, 7) found in the appendix. In total, synonym matching achieved the best results and has a big effect on matching a label to the correct PCF element. It manages to increase the percentage of correct matches from 12% to 18% for direct matches and from 25% to 57% for hierarchy matches, i.e. where at least the process group (level 2) is detected correctly. The other steps show little to no improvement.

Overall, the analysis and test are a first attempt at matching model labels. A test of the algorithm with the combined influence of all steps is still missing, and the keyword and synonym matching steps are heavily based on the characteristics of the gold standard. Furthermore, steps like semantic matching have not been implemented yet. The analysis could however confirm that semantic patterns such as the overall topic of a process model can be detected and that the PCF can in fact be leveraged as a valuable standard framework. The analysis moreover showed promising results concerning the creation and application of a synonym list. For comparison, the method proposed by Leopold et al. [19] achieves about 76% correctness for annotating all activities with a concept from the correct main category and 44% correctness for the process group level. In the light of these numbers (especially correctness on the process group level, for which we achieve 57%), our approach seems promising. However, it requires a manually created synonym list. A detailed comparison is left open for future work.

5 Related Work

Most approaches developed so far only suggest manual annotation (cf. an overview [18]).
For example, [4] describe a mapping relation of a BPEL4WS process to an OWL-S ontology as well as relations between concepts from the OWL-S profile ontology to domain ontologies; [5–7] develop an approach for adding properties of model elements or establishing relations to separate annotation models. A model for semantically annotating business process models is devised in [8]. There are however some works considering the annotation support by tools. For example, Bögl et al. [9] describe a semantic linkage of Event-driven Process Chain (EPC) functions and events to ontology instances supported by a lexicon (WordNet), term extraction and stemming. Similarly, annotation approaches for BPMN models with ontologies have been developed [10–12] and partly supported using various lexical analysis techniques. Also, the annotation of process models with other domain-specific ontologies such as the SCOR model for supply chain management has been explored [13] as well as annotations of process models with goal models [14].

However, the only approach we are aware of that considers context information (e.g. in the form of preceding or following annotations) when calculating an annotation suggestion is that of Leopold et al. [19]. The approach makes use of a Markov Logic-based formalization and considers automated annotation as an optimization problem. Further, in the field of execution-level (i.e. runtime) processes, the structure and lifecycle of the objects involved in the process are considered [15–17].

6 Discussion and Conclusion

As of today, annotation of process models is rarely automated, and prototypes are rarely shown. Regarding the semantics of annotation, context information is (apart from [19]) almost never used [18]. This is a surprising research gap that exists even today, after almost one decade of research on semantic technologies applied to BPM that started with simple process model annotation proposals like [1].
Therefore, a research opportunity lies in developing (semi-)automated annotation approaches in order to first leverage existing standards such as PCF and second to make use of the wealth of semantic technologies (e.g. for search and matching of models on the semantic level) that becomes available once process models can be annotated automatically. In this paper, we first described the nature of the annotation task and how humans perform it. We then identified building blocks and parameters for automated systems that imitate human annotation behavior, and conducted first empirical studies on the effect of parameters. It turned out that context information such as the topic of a process model is indeed very important for an automated annotation approach. All in all, this contribution aims to inspire more research on (semi-)automatic approaches capable of linking semi-formal process models with more formal knowledge representations. With this, new use cases become possible as described in [18], shifting the automated interpretation of process models to a new and more semantic level.

Literature

1. Thomas, O., Fellmann, M.: Semantic Process Modeling - Design and Implementation of an Ontology-Based Representation of Business Processes. Bus. Inf. Syst. Eng. 1, 438–451 (2009).
2. APQC: Process Classification Framework (PCF), Version 5.2.0. (2010).
3. Malone, T.W., Crowston, K., Herman, G.A.: Organizing Business Knowledge: The MIT Process Handbook. The MIT Press (2003).
4. Aslam, M.A., Auer, S., Shen, J., Herrmann, M.: Expressing Business Process Models as OWL-S Ontologies. In: Eder, J. and Dustdar, S. (eds.) Business Process Management Workshops. pp. 400–415. Springer Berlin Heidelberg (2006).
5. Fill, H.-G.: Using Semantically Annotated Models for Supporting Business Process Benchmarking. In: Grabis, J. and Kirikova, M. (eds.) Perspectives in Business Informatics Research. pp. 29–43. Springer Berlin Heidelberg (2011).
6. Fill, H.-G., Schremser, D., Karagiannis, D.: A Generic Approach for the Semantic Annotation of Conceptual Models Using a Service-Oriented Architecture. Int. J. Knowl. Manag. 9, (2013).
7. Fill, H.-G.: On the Social Network Based Semantic Annotation of Conceptual Models. In: Buchmann, R., Kifor, C.V., and Yu, J. (eds.) Knowledge Science, Engineering and Management. pp. 138–149. Springer International Publishing (2014).
8. Mturi, E., Johannesson, P.: A context-based process semantic annotation model for a process model repository. Bus. Process Manag. J. 19, 404–430 (2013).
9. Bögl, A., Schrefl, M., Pomberger, G., Weber, N.: Semantic Annotation of EPC Models in Engineering Domains to Facilitate an Automated Identification of Common Modelling Practices. In: Filipe, J. and Cordeiro, J. (eds.) Enterprise Information Systems. pp. 155–171. Springer Berlin Heidelberg (2009).
10. Di Francescomarino, C., Tonella, P.: Supporting Ontology-Based Semantic Annotation of Business Processes with Automated Suggestions. In: Halpin, T., Krogstie, J., Nurcan, S., Proper, E., Schmidt, R., Soffer, P., and Ukor, R. (eds.) Enterprise, Business-Process and Information Systems Modeling. pp. 211–223. Springer Berlin Heidelberg (2009).
11. Di Francescomarino, C., Tonella, P.: Supporting Ontology-Based Semantic Annotation of Business Processes with Automated Suggestions. Int. J. Inf. Syst. Model. Des. 1, 59–84 (2010).
12. Rospocher, M., Di Francescomarino, C., Ghidini, C., Serafini, L., Tonella, P.: Collaborative Specification of Semantically Annotated Business Processes. In: Rinderle-Ma, S., Sadiq, S., and Leymann, F. (eds.) Business Process Management Workshops. pp. 305–317. Springer Berlin Heidelberg (2010).
13. Wang, X., Li, N., Cai, H., Xu, B.: An Ontological Approach for Semantic Annotation of Supply Chain Process Models. In: Meersman, R., Dillon, T., and Herrero, P. (eds.) On the Move to Meaningful Internet Systems: OTM 2010. pp. 540–554. Springer Berlin Heidelberg (2010).
14. Lin, Y.: Semantic Annotation for Process Models: Facilitating Process Knowledge Management via Semantic Interoperability. Department of Computer and Information Science, Norwegian University of Science and Technology, Trondheim, Norway (2008).
15. Born, M., Dörr, F., Weber, I.: User-Friendly Semantic Annotation in Business Process Modeling. In: Weske, M., Hacid, M.-S., and Godart, C. (eds.) Web Information Systems Engineering – WISE 2007 Workshops. pp. 260–271. Springer Berlin Heidelberg (2007).
16. Born, M., Hoffmann, J., Kaczmarek, T., Kowalkiewicz, M., Markovic, I., Scicluna, J., Weber, I., Zhou, X.: Semantic Annotation and Composition of Business Processes with Maestro. In: Bechhofer, S., Hauswirth, M., Hoffmann, J., and Koubarakis, M. (eds.) The Semantic Web: Research and Applications. pp. 772–776. Springer Berlin Heidelberg (2008).
17. Born, M., Hoffmann, J., Kaczmarek, T., Kowalkiewicz, M., Markovic, I., Scicluna, J., Weber, I., Zhou, X.: Supporting Execution-Level Business Process Modeling with Semantic Technologies. In: Zhou, X., Yokota, H., Deng, K., and Liu, Q. (eds.) Database Systems for Advanced Applications. pp. 759–763. Springer Berlin Heidelberg (2009).
18. Fellmann, M.: Towards Automated Process Model Annotation with Activity Taxonomies: Use Cases and State of the Art. In: Abramowicz, W. (ed.) Business Information Systems. pp. 74–90. Springer, Cham (2017).
19. Leopold, H., Meilicke, C., Fellmann, M., Pittke, F., Stuckenschmidt, H., Mendling, J.: Towards the Automated Annotation of Process Models. In: Zdravkovic, J., Kirikova, M., and Johannesson, P. (eds.) Advanced Information Systems Engineering. pp. 401–416. Springer International Publishing (2015).
20. Fellmann, M., Thomas, O.: Process Model Verification with SemQuu. In: Nüttgens, M., Thomas, O., and Weber, B. (eds.) Enterprise Modelling and Information Systems Architectures (EMISA 2011), Hamburg, Germany. pp. 231–236. Köllen, Bonn (2011).
Appendix

Selected results from the implementation (Fig. 5-7). Annotation quality (precision, y-axis) is shown in relation to various parameter values (x-axis). Direct Hit means correct annotation in regard to the gold standard. Hierarchy means correct annotation at the process group level (level 2) of the Process Classification Framework.

Fig. 5. Results for Synonym Matching

Fig. 6. Results for Keyword Matching

Fig. 7. Results for Pre-Successor

Fig. 8. Demonstration of the affinity of process models of the gold standard to distinct categories on the process group level of the Process Classification Framework. The models are labeled by ascending numbers in the leftmost column. The gold standard is used to derive the label for the process group in the PCF (for the sake of brevity, this level is called “hierarchy” in the figures). Model 1 for instance shows strong affiliation to hierarchy 2.1 and 2.2.

Fig. 9. Keyword example

Fig. 10. Synonym example