<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Sixth Workshop on Natural Language for Artificial Intelligence, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Process Extraction from Natural Language Text: the PET Dataset and Annotation Guidelines</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrizio Bellan</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chiara Ghidini</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mauro Dragoni</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simone Paolo Ponzetto</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Han van der Aa</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Fondazione Bruno Kessler</institution>
          ,
          <addr-line>via sommarive, 18, Povo (Tn)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Free University of Bozen-Bolzano</institution>
          ,
          <addr-line>Bolzano (Bz)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Mannheim</institution>
          ,
          <addr-line>Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>30</volume>
      <issue>2022</issue>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Although there is a long tradition of work in NLP on extracting entities and relations from text, to date there exists very little work on the acquisition of business processes from unstructured data such as textual corpora of process descriptions. With this work, we aim to fill this gap and establish the first steps towards bridging data-driven information extraction methodologies from Natural Language Processing and the model-based formalization aimed at Business Process Management. For this, we develop the first corpus of business process descriptions annotated with activities, gateways, actors, and flow information. We present our new resource, including a detailed overview of the annotation schema and guidelines, as well as a variety of baselines to benchmark the difficulty and challenges of business process extraction from text.</p>
      </abstract>
      <kwd-group>
        <kwd>Process Extraction from Text</kwd>
        <kwd>Business Process Management</kwd>
        <kwd>Information Extraction</kwd>
        <kwd>Annotation Schema</kwd>
        <kwd>Annotation Guidelines</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Information Extraction (IE), a key area of research focused on extracting structured
representations from unstructured text, has a long-standing tradition in Natural Language Processing
(NLP), from seminal contributions in the context of the Message Understanding Conference using
finite-state techniques [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] all the way through current neural approaches to document-level
relation extraction [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Despite this large volume of work, historically, most of the focus has
concentrated on standard newswire text. Moreover, most successful approaches are rather
schema-weak, an approach epitomized by a very successful line of research such as Open
Information Extraction [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. In this work, we propose the first steps towards shifting some
of this focus in IE towards a new domain and task. Specifically, we focus on the problem of
extracting a Business Process Model from textual content – which can, in turn, be viewed
as the problem of extracting activities and workflow elements from process descriptions that
can be represented by adopting Business Process Model and Notation (BPMN) or compiled into
Petri Nets [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. But while there has been growing interest in recent years from the Business
Process Management (BPM) community in the extraction of processes from text [
        <xref ref-type="bibr" rid="ref10 ref11 ref9">9, 10, 11</xref>
        ],
current work has major limitations arguably due to the limited availability of domain-specific,
human-annotated gold-standard data that could be used to train from scratch or fine-tune
data-driven methods, and which are essential to enable task-specific comparison across competing
approaches [
        <xref ref-type="bibr" rid="ref12">12, 13</xref>
        ]. Creating benchmarks from text, however, is at the heart of much work
in the NLP community – cf. the long-standing tradition of SENSEVAL and SemEval evaluation
campaigns in computational semantics. Despite the major limitations shown by current
‘leaderboardism’ [14], the availability of reference gold-standard datasets has the potential to foster
the application of NLP techniques to other fields, such as BPM, and crucially to make
clear what the applicability and limitations of state-of-the-art approaches for the domain of
interest are.
      </p>
      <p>In [15], we presented a new dataset, called PET, of human-annotated processes
in a corpus of process descriptions. In this paper, we present this resource in more detail,
providing insights into the problem of Process Annotation from Text (Section 2), the annotation
guidelines (Section 3), the PET dataset itself (Section 4), and baseline results for information
extraction tasks for process model extraction (Section 5).</p>
      <p>Our vision builds upon bringing together heterogeneous communities such as NLP and
BPM practitioners by defining shared tasks and resources (cf. previous work from [16] at the
intersection of NLP and political science). All resources described in this paper are freely
available for the research community at huggingface.co/datasets/patriziobellan/PET.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Problem Background</title>
      <p>Extracting a process model from documents is a complex task, since the analysis of the
natural language description of a process has to account for multiple linguistic levels
(syntactic, semantic, and pragmatic) and, at the same time, mitigate linguistic phenomena such as syntactic
leeway. Moreover, it has to handle the multiple possible interpretations (in
the form of process models) that can be inferred from the same text, because the same meaning
can be conveyed in multiple ways that are not always equivalent. For example, there are
different ways to represent a repeated event or activity in a business process diagram, but
it may be the case that only one of these possible interpretations is the correct one to represent in
the formal model.</p>
      <p>Figure 1 presents an example of the process extraction task. The gray shadow boxes link
each sentence on the left of the figure that describes a process element to
its corresponding element in the process diagram, represented in BPMN, on the right
part of the figure. Here, the ninth sentence could be represented differently from the way
reported in the diagram. The same meaning could be expressed, for example, by adopting a
sub-process element (in case we need to re-use this part of the diagram somewhere else), or
a multi-instance activity (either parallel or sequential).</p>
      <p>In general, the first task to perform when analyzing a process description is filtering
out uninformative sentences, because not all sentences describe
process elements. Then, Actions, Actors, Events, Gateways, Artifacts, and various types of
process flows can be extracted. However, not only can each sentence describe multiple process
elements, but each word can also have multiple meanings. Determining the correct intended
meaning and mapping it to the corresponding process element requires considering these
two aspects at once. Finally, the process elements discovered have to be logically organized
following the semantics conveyed in the process description. Defining the logical succession
of process model elements is therefore another challenge to tackle.</p>
      <p>Figure 2 shows, at an abstract level, the task of process extraction from natural language text,
conceptualized as an algorithmic function f that “maps” a natural language process
description into its process model. In the figure, the process description is represented by
the blue document icon on the left, and the process model generated by the function f is
represented on the right as a BPMN diagram.</p>
      <p>[Figure 3: BPMN diagram of a customer buying a flight ticket from a travel agency. Speech balloons label the construct types: start event, activity, message event, data object, data object with state, event-based gateway, exclusive gateway, end event, and organisational elements.]</p>
    </sec>
    <sec id="sec-3">
      <title>3. Annotation Guidelines</title>
      <p>In this section, we introduce the annotation guidelines we defined to create the dataset (the
complete annotation guidelines document is available at
pdi.fbk.eu/pet/annotation-guidelines-for-process-description.pdf). Annotating a document that
describes a process is a difficult task. Identifying process
elements requires at least a rough understanding of the typical elements contained in
process modeling languages. In terms of languages, we target a procedural language such as
BPMN, although the guidelines may also apply to other standard procedural modeling languages
such as UML Activity Diagrams. For an overview of the graphical language and of the
types of elements it typically contains, please refer to the diagram in Figure 3, taken from [18],
which provides a model of a customer buying a flight ticket from a travel agency. Besides
illustrating the scenario, the diagram is “annotated” with speech balloons indicating the type of
entity denoted by the graphical constructs. Following the classification made in [18], we can
group these constructs into three macro categories:
1. Behavioral. These elements refer to the so-called control flow of
the process, that is, the flow determined by the set of activities that are performed in
coordination. This category is the most articulated in a business process and contains at
least three types of objects:
• activities and events, that is, the things that happen in time (although these elements often
have different meanings in some modeling languages, we do not distinguish between them here).
In our example, the activity check the flight offer or the event payment received.
• flow objects, that is, constructs that enable the routing of the flow between the
activities, such as the sequence relation between activities or the gateways. In our example,
the (precedence) relation between make flight offer and check flight offer, or the (mutually)
exclusive gateway between reject offer and book and pay flight.
• states, that is, conditions of the world that affect the flow in the process, such as the
pre- and post-conditions for the occurrence of an activity or a guard on a gateway. In
our example, the (un)satisfied status of the customer w.r.t. a flight offer.
2. Data object. These elements usually describe, at a high level of abstraction, the objects
upon which an activity acts. Examples in the scenario above are the flight request
and the flight ticket. Note that sometimes these data objects complement the activity
itself (as in the case of the data object flight request, which is produced by the activity
check travel agency website), while in other cases they are implicitly described in
the activity itself, as in the case of flight ticket with the activity prepare ticket.
In this latter case, the data object is often left implicit.
3. Organizational. These elements are usually related to the “who” question, and often
describe, at a high level of abstraction, the roles or organizational structures involved in
the activities of the process.</p>
      <p>It is important to highlight that Data and Organizational objects do not exist per-se in a
business process diagram but they usually refer to the activities. More formally, they are
participants of the activity as they participate in the activity itself.</p>
      <p>We aim to propose a general annotation schema able to deal with unknown scenarios. Note
that, while inspired by [18], the conceptual layers described in that paper differ slightly from the
annotation schema proposed in this document. This was done to increase the flexibility of the
annotation schema to capture the different ways in which a process element can be described.</p>
      <p>As a crucial example, we decided to break down an activity to differentiate among the
elements it is composed of. In particular, we capture the activity “action” expression and the
object the activity acts on in two different annotation layers. This choice eases
the annotation workload and also reduces the possibility of making errors (for example,
connecting the activity data to the actor responsible for the
execution of the activity with a Sequence Flow relation). For instance, we differentiate the
expression describing an Activity from the object the activity uses. The overall goal is to
annotate process model elements and their relations in documents.</p>
      <p>We implemented the annotation schema described in this document in the Inception
Annotation tool (inception-project.github.io). The schema can be downloaded from pdi.fbk.eu/pet/
inception-schema.json.</p>
      <sec id="sec-3-1">
        <title>3.1. Layers Overview</title>
        <p>Here, we explore the process elements considered in the proposed dataset and their relations.
Figure 4 provides an overview of the different layers and shows their relations.</p>
        <p>Behavioral Layer. The Behavioral layer captures information about the behavioral elements
described and their relations. Figure 4 shows the relations between the Behavioral layer and the
other ones. The Behavioral layer is the core layer since it captures activities, gateways, branch
conditions, and flow relations. An activity element represents a single task performed within a
process model. A gateway element represents a decision point, and the condition specification
represents the condition that a process execution instance must satisfy to be allowed to enter
a specific branch of a gateway. A Flow is a relation that defines the process model logic by
connecting all the elements that belong to this layer.</p>
        <p>The Behavioral layer is composed of six features: Element Type, Uses, Flow, Roles, Further
Specification, and Same Gateway. Since this layer captures both Activities and Gateways, not all the
features are always required; they depend on the Element Type and the situation described
in the text. For example, if a text does not describe any Actor Performer, the feature Roles is left
empty.</p>
        <p>The feature Element Type defines the type of the process model element, marked as Activity,
AND Gateway, XOR Gateway, or Condition Specification. This layer is connected to the
Activity Data layer by the Uses relation. This feature links an activity to the Activity Data annotated in
the Activity Data layer. Hence, this relation connects an activity expression (either
verbal or nominal) with the object the activity acts on. Process participants (actors involved
in an activity), which are captured in the Organizational layer, are bound to an activity through the
feature Roles. Here we differentiate between the Actor Performer relation, which links the actor who
is responsible for an activity execution to the activity, and the Actor Recipient relation, which links
the actor who receives the results of the execution of an activity. The Further Specification
feature connects an activity to its important details (captured in the Further Specification
layer). The Further Specification layer captures the important information about an activity
that is not captured by the other layers, such as the means or manner of execution.
The Same Gateway feature connects all the parts describing the
same gateway, since its description may span multiple sentences. This means
that only gateway elements can be connected by this relation. The Behavioral layer is
connected to itself through the Flow relation. This feature defines the process
model logic by connecting behavioral elements in sequential order.</p>
        <p>Activity Data Layer. The Activity Data layer captures the object an activity expression
acts on.</p>
        <p>Further Specification. The Further Specification layer captures important details of an
Activity, such as the means or the manner of its execution.</p>
        <p>Organizational Layer. The Organizational layer is meant to annotate, at a high level of
abstraction, the process participants that are responsible for activities. They typically represent
the Actors involved in a process.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Examples</title>
        <p>We conclude this section by showing some annotations in a text. We start with the annotation
of activities.</p>
        <p>The sentence
“The office sends the forms to the customer by email”
is annotated as follows: the activity sends Uses the activity data the forms; the actor The office
is the Actor Performer of sends, while the actor the customer is its Actor Recipient; the further
specification detail by email is linked to sends via the Further Specification relation.</p>
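        <p>As a minimal sketch, the annotation of this sentence can be written down as typed spans plus labeled relations. The dictionary layout, the identifiers (a1, d1, ...), and the helper span_text are illustrative assumptions; they do not reproduce the PET or Inception export format.</p>
        <preformat preformat-type="code">
```python
# Hypothetical in-memory representation of the annotated example sentence.
# Offsets are token indices with an exclusive end; field names are assumptions.
tokens = "The office sends the forms to the customer by email".split()

spans = {
    "p1": {"type": "Actor", "start": 0, "end": 2},                   # The office
    "a1": {"type": "Activity", "start": 2, "end": 3},                # sends
    "d1": {"type": "Activity Data", "start": 3, "end": 5},           # the forms
    "p2": {"type": "Actor", "start": 6, "end": 8},                   # the customer
    "f1": {"type": "Further Specification", "start": 8, "end": 10},  # by email
}

# Every relation starts from the activity, as described in the text.
relations = [
    ("a1", "Uses", "d1"),
    ("a1", "Actor Performer", "p1"),
    ("a1", "Actor Recipient", "p2"),
    ("a1", "Further Specification", "f1"),
]


def span_text(span_id):
    """Return the surface text covered by a span."""
    span = spans[span_id]
    return " ".join(tokens[span["start"]:span["end"]])


for source, label, target in relations:
    print(span_text(source), "--" + label + "--", span_text(target))
```
        </preformat>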
        <p>A different situation concerns the annotation of gateways. Despite the well-defined theoretical
definitions of Gateways, their annotation is challenging for two main reasons. First,
the description of a gateway typically spans multiple text fragments and/or multiple sentences.
To deal with this challenge, we use the Same Gateway relation to reconstruct the gateway.
Consider the following example:</p>
        <p>“If an error is detected another arbitrary repair activity is
executed, otherwise the repair is finished.”</p>
        <p>Here, the gateway description spans two sentence fragments: If and otherwise. We reconstruct
the gateway object by connecting If to otherwise using the Same Gateway relation. The second
reason concerns the lack of an explicit description of merging points in texts. To deal with this
challenge, we decided to capture a merging point using Flow relations: we
connect the ending point of each branch of a gateway to the next common behavioral element,
as in:</p>
        <p>“The ongoing repair consists of two activities. The first
activity is to check the hardware, whereas the second activity checks
the software. Then, the CRS test the system functionality.”</p>
        <p>Here, the word whereas describes an AND Gateway with two branches: (i) check the hardware
and (ii) check the software. The next common element (where the process flow goes through)
is test the system functionality. To create a merging point at the end of the gateway, we
connect check the hardware to test the system functionality, and check the software to test
the system functionality.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. The Dataset</title>
      <p>The guidelines described in Section 3 were used to create the PET dataset described
in this section and available at huggingface.co/datasets/patriziobellan/PET. The creation of the
first version of the PET dataset started from the Friedrich dataset: a set of 47 textual documents
previously used within the BPMN community to start investigating
process extraction from natural language text. The reader may find the introduction to
the raw version of the textual documents used for building PET in [19].</p>
      <p>We started from this set of documents for two reasons. First, these
documents are well known within the community. This provides continuity with previous
investigation in this research area and a base set of documents that are in
line with the type of process narratives considered relevant by the community. Second, the
documents contained in the dataset are not explicitly annotated with the elements described
in Section 3. Indeed, the dataset described in [19] contains only the raw text and a possible
corresponding BPMN diagram. However, many of these diagrams were translated by the same
authors from other process modeling languages into BPMN without any validation performed
by experts. Therefore, the diagrams should not be taken as a gold-standard reference and,
consequently, cannot be used to mark process elements in the process descriptions. Hence,
the whole work of text processing and element annotation had to be carried out.</p>
      <p>The dataset construction process was split into five main phases:
1. Text pre-processing. As the first operation, we checked the content of each document
and tokenized it. The initial check was necessary since some of the original texts
were automatically translated into English by the authors of the dataset. The translations
were never validated; indeed, several errors were found and fixed.
2. Text Annotation. Each text was annotated following the guidelines introduced in Section 3.
The team was composed of five annotators with high expertise in BPMN. Each document
was assigned to three experts who were in charge of identifying all the elements and
flows within each document. In this phase, each annotator was supported by the
Inception tool integrated with the annotation schema available on the dataset web page.
3. Automatic annotation fixing. At the end of the second phase, we ran an automatic
procedure relying on a rule-based script to fix annotations that were not
compliant with the guidelines. For example, if a modal verb was erroneously included in
the annotation of an Activity, the procedure removed it from the annotation. Another
example is a missing article within an annotation related to an Actor. In this case, the
script included it in the annotation. This phase allowed us to remove possible annotation
errors and to obtain annotations compliant with the guidelines.
4. Agreement Computation. Here, we computed, on the annotations provided by the
experts, the agreement scores for each process element and each relation between
pairs of process elements, adopting the methodology proposed in [20] (see footnote 4). Following this
methodology, an annotation was considered in agreement among the experts if and only
if they captured the same span of words and assigned the same process element tag to
the annotation. In the same way, a relation was considered in agreement if and only if the
experts annotated exactly the same span of words for (i) the source element and
(ii) the target element, and assigned
(iii) the same relation tag between source and target. The only exception is the Same
Gateway relation, in which source and target are interchangeable since the direction
of the relation does not matter. The final agreement scores were obtained by
averaging the individual scores obtained by comparing pairs of annotators. Tables 2
and 3 show the annotation agreement computed for each process element and each
process relation, respectively. We can observe that, in general, the experts agreed on
the main elements and flows contained within a process description. In contrast, the
annotation of information classified as Further Specification led to several
disagreements. Such situations were analyzed and mitigated in the next phase.
5. Reconciliation. The last phase consisted of mitigating the disagreements within
the annotations provided by the experts. This phase aimed to obtain a shared and agreed
set of gold annotations on each text for both entities and relations. Such annotations also
enable the generation of the related fully connected process model flow, which can be rendered
as, but not limited to, a BPMN diagram. During this last phase, 2 of the 47
documents originally included in the dataset were discarded. These texts were
not fully annotated, since the annotators were not able to completely understand
which process elements were included in some specific parts of the text. For this reason,
the final dataset contains 45 textual descriptions of process models
together with their annotations. We report in Table 1 the statistics of the current
version of the dataset, in Table 4 the detailed statistics about process elements, and in
Table 5 the detailed statistics about relations between process elements.</p>
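      <p>The agreement computation of phase 4 can be sketched as follows, under stated assumptions: each annotator's output is a set of exact-match tuples, the score for a pair of annotators is the F1 over those sets, and the final score is the mean over all annotator pairs. The function names, the tuple layout, and the convention of returning 1.0 when both annotators marked nothing are illustrative choices, not the authors' script.</p>
      <preformat preformat-type="code">
```python
from itertools import combinations


def normalize_relation(source, target, tag):
    """Make Same Gateway relations direction-independent (assumed tag name)."""
    if tag == "Same Gateway":
        source, target = sorted([source, target])
    return (source, target, tag)


def pairwise_f1(first, second):
    """F1 between two sets of annotations, counting exact matches only."""
    if not first and not second:
        return 1.0  # assumption: two empty annotations agree perfectly
    matches = len(first.intersection(second))
    if matches == 0:
        return 0.0
    precision = matches / len(first)
    recall = matches / len(second)
    return 2 * precision * recall / (precision + recall)


def mean_agreement(annotation_sets):
    """Average the F1 scores over all pairs of annotators."""
    scores = [pairwise_f1(a, b) for a, b in combinations(annotation_sets, 2)]
    return sum(scores) / len(scores)


# Toy element annotations as (start, end, tag) with token offsets.
annotator_1 = {(0, 2, "Actor"), (2, 3, "Activity")}
annotator_2 = {(1, 2, "Actor"), (2, 3, "Activity")}  # different Actor span
annotator_3 = {(0, 2, "Actor"), (2, 3, "Activity")}

print(round(mean_agreement([annotator_1, annotator_2, annotator_3]), 4))
```
      </preformat>
      <p>In this toy example, annotators 1 and 3 agree fully, while annotator 2 marks the Actor without its article, so two of the three pairs score 0.5 and the mean is about 0.67. Relations can be scored in the same way after passing them through normalize_relation.</p>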
      <p>We uploaded the dataset to the Hugging Face repository. We created two task cards for our dataset:
(i) token classification, which aims to predict the process elements described in texts, and (ii) relation
extraction, which aims to classify the relation between two process elements.</p>
      <p>4 We measured the agreement in terms of the F1 measure because, besides being straightforward to calculate, it
is directly interpretable. Note that chance-corrected measures like κ approach the F1 measure as the number of
cases that raters agree are negative grows [20].</p>
    </sec>
    <sec id="sec-5">
      <title>5. Baselines Results</title>
      <p>In this section, we present three baselines that we developed to provide preliminary results
on the dataset and to show how the dataset can be used for testing different extraction
approaches. Indeed, as described in Section 4, there are different types of elements that can be
extracted (e.g., activities, actors, relations) and different assumptions that can be made (e.g., the
exploitation of gold information or the processing of raw text).</p>
      <sec id="sec-5-5">
        <p>From this perspective, we tested our baselines under three different settings, using
two different families of approaches: Conditional Random Fields (CRF) and Rule-Based (RB):
• Baseline 1 (B1): starting from the raw text (i.e., without using any information about process
elements or relations), a CRF-based approach is used to build a
model that supports the extraction of single entities (e.g., activities, actors).
• Baseline 2 (B2): starting from the existing gold annotations
of process elements, an RB strategy is used to detect relations between entities.
• Baseline 3 (B3): this baseline relies on the output of B1 for the annotations of
process elements; the RB strategy is then used to detect relations between
entities.</p>
        <p>Concerning the CRF approach, we adopted the CRF model described in [21] by encoding data
following the IOB2 schema.</p>
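        <p>For readers unfamiliar with the IOB2 schema, the following minimal sketch shows how span annotations can be turned into one tag per token: the first token of a span receives a B- tag, the remaining tokens of the span receive I- tags, and all other tokens receive O. The function name to_iob2, the span layout, and the label strings are illustrative assumptions rather than the encoding code used for the baselines.</p>
        <preformat preformat-type="code">
```python
def to_iob2(num_tokens, spans):
    """Encode non-overlapping (start, end, label) spans as IOB2 tags.

    Offsets are token indices and the end index is exclusive.
    """
    tags = ["O"] * num_tokens
    for start, end, label in spans:
        tags[start] = "B-" + label          # first token of the span
        for index in range(start + 1, end):
            tags[index] = "I-" + label      # remaining tokens of the span
    return tags


tokens = "The office sends the forms to the customer by email".split()
spans = [(0, 2, "Actor"), (2, 3, "Activity"), (3, 5, "Activity Data")]

for token, tag in zip(tokens, to_iob2(len(tokens), spans)):
    print(token, tag)
```
        </preformat>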
        <p>Results were obtained by performing 5-fold cross-validation and averaging the observed
performance.</p>
        <p>For the RB approach, we defined a set of rules that take into account the text
position of process elements. The rules are the following:
1. Rule 1 (R1): sequence flow relations are annotated by connecting two consecutive behavioral
process elements.
2. Rule 2 (R2): same gateway relations are annotated by connecting two gateways of the same
type if they are detected in the same sentence or in two consecutive
sentences.
3. Rule 3 (R3): sequence flow relations are annotated between each gateway that is not
part of any same gateway relation and the next activity detected.
4. Rule 4 (R4): for each activity in a sentence, actor performer/recipient relations
are annotated by linking the closest actor on the left as actor performer and the closest
actor on the right as actor recipient.
5. Rule 5 (R5): further specification relations are annotated by connecting each further
specification element to the closest activity in the text.
6. Rule 6 (R6): uses relations are annotated by connecting activity data elements to the
closest activity on the left in the same sentence. If there is no activity on the left,
the right side is considered.</p>
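        <p>To make the position-based rules more concrete, the sketch below implements R1 and R4 over a toy list of detected elements. The element layout (id, type, sent, pos), the relation labels, and the restriction of R4 to actors in the same sentence as the activity are assumptions made for illustration; the second sentence of the toy input is invented. This is not the authors' rule script.</p>
        <preformat preformat-type="code">
```python
# Element types treated as behavioral (assumed label strings).
BEHAVIORAL = {"Activity", "XOR Gateway", "AND Gateway", "Condition Specification"}


def rule_r1(elements):
    """R1: link each pair of consecutive behavioral elements with a flow."""
    behavioral = [e for e in elements if e["type"] in BEHAVIORAL]
    return [(a["id"], "Flow", b["id"]) for a, b in zip(behavioral, behavioral[1:])]


def rule_r4(elements):
    """R4: closest actor on the left performs, closest on the right receives."""
    relations = []
    for activity in elements:
        if activity["type"] != "Activity":
            continue
        actors = [e for e in elements
                  if e["type"] == "Actor" and e["sent"] == activity["sent"]]
        left = [e for e in actors if activity["pos"] > e["pos"]]
        right = [e for e in actors if e["pos"] > activity["pos"]]
        if left:
            performer = max(left, key=lambda e: e["pos"])
            relations.append((activity["id"], "Actor Performer", performer["id"]))
        if right:
            recipient = min(right, key=lambda e: e["pos"])
            relations.append((activity["id"], "Actor Recipient", recipient["id"]))
    return relations


# Elements sorted by text position; "sent" is the sentence index.
elements = [
    {"id": "e1", "type": "Actor", "sent": 0, "pos": 0},      # The office
    {"id": "e2", "type": "Activity", "sent": 0, "pos": 2},   # sends
    {"id": "e3", "type": "Actor", "sent": 0, "pos": 6},      # the customer
    {"id": "e4", "type": "Actor", "sent": 1, "pos": 10},     # hypothetical 2nd sentence
    {"id": "e5", "type": "Activity", "sent": 1, "pos": 12},
]

print(rule_r1(elements))
print(rule_r4(elements))
```
        </preformat>
        <p>Because such rules rely only on element types and positions, they can be applied unchanged to gold annotations (as in B2) or to predicted elements (as in B3).</p>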
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we presented the PET dataset. The dataset contains 45 documents with
narrative descriptions of business processes and their annotations. Together with the dataset,
we provided the set of guidelines we defined and adopted for annotating all documents. We
described the dataset-building procedure and, for completeness, provided three
baselines implementing straightforward approaches as a starting point for designing the next
generation of approaches to process extraction from natural language text.</p>
      <p>the AIxIA 2020 Discussion Papers Workshop co-located with the 19th International
Conference of the Italian Association for Artificial Intelligence (AIxIA2020), Anywhere,
November 27th, 2020, volume 2776 of CEUR Workshop Proceedings, CEUR-WS.org, 2020,
pp. 19–30.
[13] P. Bellan, Process extraction from natural language text, in: W. M. P. van der Aalst,
J. vom Brocke, M. Comuzzi, C. D. Ciccio, F. García, A. Kumar, J. Mendling, B. T. Pentland,
L. Pufahl, M. Reichert, M. Weske (Eds.), Proceedings of the Best Dissertation Award,
Doctoral Consortium, and Demonstration &amp; Resources Track at BPM 2020 co-located with
the 18th International Conference on Business Process Management (BPM 2020), Sevilla,
Spain, September 13-18, 2020, volume 2673 of CEUR Workshop Proceedings, CEUR-WS.org,
2020, pp. 53–60.
[14] K. Ethayarajh, D. Jurafsky, Utility is in the eye of the user: A critique of NLP leaderboards,
in: B. Webber, T. Cohn, Y. He, Y. Liu (Eds.), Proceedings of the 2020 Conference on Empirical
Methods in Natural Language Processing, EMNLP 2020, Online, November 16-20, 2020,
Association for Computational Linguistics, 2020, pp. 4846–4853.
[15] P. Bellan, H. van der Aa, M. Dragoni, C. Ghidini, S. P. Ponzetto, PET: An annotated
dataset for process extraction from natural language text tasks, in: Proceedings of the 1st
Workshop on Natural Language Processing for Business Process Management (NLP4BPM),
2022. To appear. Also available at https://arxiv.org/abs/2203.04860.
[16] F. Nanni, G. Glavaš, S. P. Ponzetto, S. Tonelli, N. Conti, A. Aker, A. P. Aprosio, A. Bleier,
B. Carlotti, T. Gessler, T. Henrichsen, D. Hovy, C. Kahmann, M. Karan, A. Matsuo, S. Menini,
D. Nguyen, A. Niekler, L. Posch, F. Vegetti, Z. Waseem, T. Whyte, N. Yordanova, Findings
from the hackathon on understanding euroscepticism through the lens of textual data,
in: D. Fišer, M. Eskevich, F. de Jong (Eds.), Proceedings of the Eleventh International
Conference on Language Resources and Evaluation (LREC 2018), European Language
Resources Association (ELRA), Paris, France, 2018.
[17] H. van der Aa, H. Leopold, H. A. Reijers, Detecting inconsistencies between process models
and textual descriptions, in: H. R. Motahari-Nezhad, J. Recker, M. Weidlich (Eds.), Business
Process Management - 13th International Conference, BPM 2015, Innsbruck, Austria,
August 31 - September 3, 2015, Proceedings, volume 9253 of Lecture Notes in Computer
Science, Springer, 2015, pp. 90–105. URL: https://doi.org/10.1007/978-3-319-23063-4_6.
doi:10.1007/978-3-319-23063-4\_6.
[18] G. Adamo, S. Borgo, C. Di Francescomarino, C. Ghidini, N. Guarino, E. M. Sanfilippo,
Business processes and their participants: An ontological perspective, in: Proceedings
of the 16th International Conference of the Italian Association for Artificial Intelligence
(AI*IA 2017), volume 10640 of Lecture Notes in Computer Science, Springer International
Publishing, 2017, pp. 215–228.
[19] F. Friedrich, Automated generation of business process models from natural language
input, M. Sc., School of Business and Economics. Humboldt-Universität zu Berli (2010).
[20] G. Hripcsak, A. S. Rothschild, Technical brief: Agreement, the f-measure, and reliability in
information retrieval, J. Am. Medical Informatics Assoc. 12 (2005) 296–298.
[21] N. Okazaki, Crfsuite: a fast implementation of conditional random fields (crfs), 2007.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Nozza</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Passaro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Polignano</surname>
          </string-name>
          ,
          <article-title>Preface to the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI)</article-title>
          , in: D. Nozza, L. C. Passaro, M. Polignano (Eds.),
          <source>Proceedings of the Sixth Workshop on Natural Language for Artificial Intelligence (NL4AI 2022) co-located with 21th International Conference of the Italian Association for Artificial Intelligence (AI*IA 2022)</source>
          , November 30, 2022, CEUR-WS.org,
          <year>2022</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Hobbs</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. E.</given-names>
            <surname>Stickel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Appelt</surname>
          </string-name>
          , P. Martin,
          <article-title>Interpretation as abduction</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>63</volume>
          (
          <year>1993</year>
          )
          <fpage>69</fpage>
          -
          <lpage>142</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Yao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Han</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Huang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Zhou</surname>
          </string-name>
          , M. Sun,
          <article-title>DocRED: A large-scale document-level relation extraction dataset</article-title>
          , in:
          <source>Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics</source>
          , Association for Computational Linguistics, Florence, Italy,
          <year>2019</year>
          , pp.
          <fpage>764</fpage>
          -
          <lpage>777</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Cimiano</surname>
          </string-name>
          ,
          <article-title>Ontology learning and population from text - algorithms, evaluation and applications</article-title>
          , Springer,
          <year>2006</year>
          . URL: https://doi.org/10.1007/978-0-387-39252-3. doi:10.1007/978-0-387-39252-3.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>G.</given-names>
            <surname>Petrucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Rospocher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <article-title>Expressive ontology learning as neural machine translation</article-title>
          ,
          <source>J. Web Semant.</source>
          <volume>52-53</volume>
          (
          <year>2018</year>
          )
          <fpage>66</fpage>
          -
          <lpage>82</lpage>
          . URL: https://doi.org/10.1016/j.websem.2018.10.002. doi:10.1016/j.websem.2018.10.002.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>M.</given-names>
            <surname>Banko</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Cafarella</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Soderland</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Broadhead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Etzioni</surname>
          </string-name>
          ,
          <article-title>Open information extraction from the web</article-title>
          , in:
          <source>Proceedings of the 20th International Joint Conference on Artificial Intelligence</source>
          , IJCAI'07, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          <year>2007</year>
          , pp.
          <fpage>2670</fpage>
          -
          <lpage>2676</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>L.</given-names>
            <surname>Cui</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Neural open information extraction</article-title>
          , in:
          <source>Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)</source>
          , Association for Computational Linguistics, Melbourne, Australia,
          <year>2018</year>
          , pp.
          <fpage>407</fpage>
          -
          <lpage>413</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>W. M.</given-names>
            <surname>Aalst</surname>
          </string-name>
          ,
          <article-title>Business process management as the "killer app" for petri nets</article-title>
          ,
          <source>Softw. Syst. Model</source>
          .
          <volume>14</volume>
          (
          <year>2015</year>
          )
          <fpage>685</fpage>
          -
          <lpage>691</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>F.</given-names>
            <surname>Friedrich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Mendling</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Puhlmann</surname>
          </string-name>
          ,
          <article-title>Process model generation from natural language text</article-title>
          , in: H. Mouratidis, C. Rolland (Eds.),
          <source>Advanced Information Systems Engineering - 23rd International Conference, CAiSE 2011, London, UK, June 20-24, 2011. Proceedings</source>
          , volume
          <volume>6741</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2011</year>
          , pp.
          <fpage>482</fpage>
          -
          <lpage>496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>H.</given-names>
            <surname>van der Aa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Ciccio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Leopold</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H. A.</given-names>
            <surname>Reijers</surname>
          </string-name>
          ,
          <article-title>Extracting declarative process models from natural language</article-title>
          , in: P. Giorgini, B. Weber (Eds.),
          <source>Advanced Information Systems Engineering - 31st International Conference, CAiSE 2019, Rome, Italy, June 3-7, 2019, Proceedings</source>
          , volume
          <volume>11483</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2019</year>
          , pp.
          <fpage>365</fpage>
          -
          <lpage>382</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Qian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Wen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kumar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>An approach for process model extraction by multi-grained text classification</article-title>
          , in: S. Dustdar, E. Yu, C. Salinesi, D. Rieu, V. Pant (Eds.),
          <source>Advanced Information Systems Engineering - 32nd International Conference, CAiSE 2020, Grenoble, France, June 8-12, 2020, Proceedings</source>
          , volume
          <volume>12127</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2020</year>
          , pp.
          <fpage>268</fpage>
          -
          <lpage>282</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>P.</given-names>
            <surname>Bellan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dragoni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Ghidini</surname>
          </string-name>
          ,
          <article-title>A qualitative analysis of the state of the art in process extraction from text</article-title>
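          , in: G. Vizzari, M. Palmonari, A. Orlandini (Eds.),
          <source>Proceedings of the AIxIA 2020 Discussion Papers Workshop co-located with the 19th International Conference of the Italian Association for Artificial Intelligence (AIxIA2020)</source>
          , Anywhere, November 27th, 2020, volume
          <volume>2776</volume>
          of CEUR Workshop Proceedings, CEUR-WS.org,
          <year>2020</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>30</lpage>
          .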
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>