                                Extracting Data from Unstructured Crime Text to
                                Represent in Structured Occurrence Nets using
                                Natural Language Processing
                                Tuwailaa Alshammari1
                                1 School of Computing, Newcastle University, Science Square, Newcastle upon Tyne, NE4 5TG, United Kingdom



                                                                      Abstract
                                                                      Structured occurrence nets (SONs) are a Petri net-based formalism for representing the behaviour of
                                                                      complex systems, capturing concurrent events and interactions between subsystems. SONs can be used in
                                                                      modelling different applications such as accidents, crime and cybercrime investigation. Using graphical
                                                                      representations can greatly benefit investigators by revealing the causal relationships within crime events.
                                                                      However, identifying crime-related information that allows such modelling from unstructured resources is a
                                                                      challenging task. This paper proposes integrating SONs with Natural Language Processing (NLP) to extract
                                                                      and model crime events accurately. This is done by creating a new custom Named Entity Recognition (NER)
                                                                      model to identify additional crime-related entities like weapons, location, and transportation. Furthermore,
                                                                      we propose a syntactic pattern-based approach for verb identification aimed at generating more precise
                                                                      event results. This method relies on analysing manually created SON crime-related models extracted from
                                                                      crime documents. Our goal is to identify Acyclic Nets (ANs) and construct patterns that facilitate event
                                                                      extraction. The NER model has shown acceptable performance, achieving a precision rate of 83.95%, a
                                                                      recall of 87.45%, and an F1-score of 84.84%, suggesting its effectiveness in NER tasks.

                                                                      Keywords
                                                                      structured occurrence net, structured acyclic nets, communication structured acyclic net, natural language
                                                                      processing, event extraction, crime visualisation




                                1. Introduction
                                Structured occurrence nets (SONs) [1, 2] are a Petri net-based formalism for representing the
                                behaviour of complex systems consisting of subsystems that proceed concurrently and interact
                                with each other. SONs extend the concept of an occurrence net, which represents a single ‘causal
                                history’ and provides a full and unambiguous record of all causal dependencies between its
                                constituent events. In recent years, there has been a research focus on analysing complex systems
                                like cybercrime using SON modelling, with [3] being a notable example.
                                   An extension of SONs is communication structured acyclic nets (CSA-nets) [4] which are based
                                on acyclic nets (ANs) rather than occurrence nets (ONs). A CSA-net joins together two or more
                                ANs by employing buffer places to connect pairs of events from different ANs. Connections of
                                this kind can be either synchronous or asynchronous. When communication is synchronous,
                                events are performed simultaneously. In asynchronous communication, events can be performed

                                PNSE’24, International Workshop on Petri Nets and Software Engineering, 2024
                                Email: t.t.t.alshammari2@ncl.ac.uk (T. Alshammari)
                                                                    © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
                                 CEUR
                                 Workshop
                                 Proceedings
                                               http://ceur-ws.org
                                               ISSN 1613-0073       CEUR Workshop Proceedings (CEUR-WS.org)



Tuwailaa Alshammari CEUR Workshop Proceedings                                            270–282


either concurrently or sequentially.
   It can be challenging for investigators to comprehend a crime and make decisions based on the
massive amounts of information acquired during criminal investigations. In crime investigations,
sources like police reports and witness statements are used to gather relevant information for
analysis. Analysis of such unstructured data sources and extraction of crime events can be
facilitated by the use of Natural Language Processing (NLP) techniques. According to [6], NLP
originated in the 1950s. The rise of computers necessitated the development of human-machine
interaction to enable computers to understand human language through the manipulation and
analysis of human text and speech. NLP is a branch of artificial intelligence and linguistics that
focuses on teaching computers to recognise and understand texts written in human languages.
Currently, various applications such as machine translation, sentiment analysis, and chatbots use
NLP. As a result, several natural language processing tools and libraries have been introduced
in recent years, such as CoreNLP, NLTK and spaCy. spaCy is an open-source natural
language processing toolkit designed to help developers implement natural language processing
annotations and tasks. It ships with statistical models for analysing text. spaCy
provides a range of essential linguistic functionalities, including part-of-speech (POS) tagging,
dependency parsing and Named Entity Recognition (NER). NLP tools are invaluable for analysing
the textual representation of natural language. In this work, we extend [5] by integrating SONs
with spaCy to extract more accurate information for modelling and visualising criminal events
from unstructured textual sources. Investigators often depend on various sources including
police reports and witness statements. In this work, we analyse homicide stories [6] to extract
information to construct a model in SONs. To be more precise, we use occurrence nets for
modelling rather than more general acyclic nets.
   The remainder of this paper is organised as follows: Section 2 provides an overview of the
research background and related work. Section 3 introduces the basic definitions of acyclic nets
and CSA-nets. Section 4 discusses the preliminary rules for mapping text to SON components.
Section 4.1 explains the design, experimentation and analysis of the modelling process, presenting
both manual and potential automatic models derived from the automatic extraction of an example
story. Section 5 discusses the evaluation and testing results of the NER model. Finally, Section 6
concludes the paper and outlines future work.


2. Background and related work
The visual representation of crime events can be helpful for investigation and analysis. [7]
outlines the creation of a tool for criminal investigations that employs Twitter data to provide
contextual details about crime occurrences in a specific location. This system has been tested
as a prototype in the San Francisco region, and it provides a visual representation of criminal
incidents and related tweets in the area. This allows users to explore the tweets and crimes that
happened before and after a crime incident, as well as to obtain information about the spatial
and temporal characteristics of a crime through the internet. Furthermore, [8] highlights data
mining methods such as clustering that have been effective in extracting insights from publicly
accessible structured data sources such as the National Crime Records Bureau. Additionally,
the paper describes an approach for retrieving data from news media through web scraping, as





well as the fundamental NLP techniques for extracting information that is not accessible through
typical structured data sources.
   Authors in [9] presented WoPeD (a Petri net editor and simulator), which has new capabilities
for combining business process modelling and natural language processing. WoPeD is an open-source
Java application that allows the creation of business processes using workflow nets. The paper
demonstrated algorithms for converting graphical process models to textual descriptions and vice
versa. Nonetheless, the tool encounters a prevalent problem of semantic ambiguity in NLP.
   SONs have been shown to be effective in accident, crime and cybercrime investigations. For
instance, [10] demonstrated that modelling accident behaviours using SON can help investigators
comprehend how an accident occurred and trace the sequence of events leading up to its cause.
Similarly, [3] proposed the use of SON features to detect DNS tunnelling during a cyber attack. The
authors developed a unique method based on SONs for detecting DNS tunnelling and discussed
how the data was pre-processed and eventually translated to a SON. Previous work [5] combined
NLP and SONs to extract crime information suitable for modelling. The approach presented three
methods to identify events: identifying root verbs, root verbs in conjunction with common crime
verbs, and including all verbs to express events. Root verbs refer to the sentence’s main verb
which acts as the head or root of the sentence’s dependency tree. The paper concluded that the
most efficient of the three techniques is to combine root verbs with common verbs. However,
verbs were identified from crime reports based on their frequency of occurrence, rather than
according to the perspective of SON modellers. This paper builds upon the work presented in [5]
in two key ways. First, we introduce a set of named entity labels and a trained custom NER model
to identify crime entities more accurately. Second, we link verbs to related entities in the same
sentence using dependency parsing. Both are discussed in more detail in the following sections.


3. Preliminaries
3.1. Acyclic nets and occurrence nets
An acyclic net [11] can be used as a ‘database’ of empirical facts (both actual and hypothetical
expressed using places, transitions, and arcs linking them) accumulated during an investigation.
Acyclic nets can represent alternative ways of interpreting what happened, and so can exhibit
(backward and forward) non-determinism. An example of an acyclic net is an occurrence net, which
provides a full and unambiguous record of all causal dependencies between the events it involves.
An occurrence net represents a single ‘causal history’.
   Formally, an acyclic net is a triple acnet = (P, T, F) = (Pacnet , Tacnet , Facnet ), where P and T are
disjoint sets of places and transitions respectively, and F ⊆ (P × T ) ∪ (T × P) is a flow relation
such that F is acyclic, and, for every t ∈ T , there are p, q ∈ P such that pFt and tFq. Moreover,
acnet is an occurrence net if, for each place p ∈ P, there is at most one t ∈ T such that tFp, and
at most one u ∈ T such that pFu.
   An acyclic net is well-formed if for every step sequence starting from the default initial marking
(i.e., the set of places without incoming arcs), no transition occurs more than once, and the sets of
post-places of the transitions which have occurred are disjoint. Note that all occurrence nets are
well-formed.
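The two definitions above can be checked mechanically. The following Python sketch is our own illustration (the representation of nets as sets of node names and arc pairs is an assumption, not taken from the paper):

```python
def is_acyclic_net(P, T, F):
    """Check (P, T, F) is an acyclic net: every transition has a pre-place
    and a post-place, and the flow relation F has no directed cycle."""
    if any(not any((p, t) in F for p in P) or
           not any((t, p) in F for p in P) for t in T):
        return False
    # Acyclicity test: repeatedly strip nodes with no incoming arcs;
    # if no such node remains, the leftover nodes lie on a cycle.
    nodes = set(P) | set(T)
    arcs = set(F)
    while nodes:
        sources = {n for n in nodes if not any(b == n for (_, b) in arcs)}
        if not sources:
            return False
        nodes -= sources
        arcs = {(a, b) for (a, b) in arcs if a in nodes and b in nodes}
    return True

def is_occurrence_net(P, T, F):
    """An occurrence net additionally gives each place at most one
    producing and at most one consuming transition."""
    return (is_acyclic_net(P, T, F) and
            all(sum(1 for t in T if (t, p) in F) <= 1 and
                sum(1 for t in T if (p, t) in F) <= 1 for p in P))
```

For example, the net of Figure 1 (places p1, p2 and event shot) satisfies both predicates, while a self-loop p → t → p fails the acyclicity test.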






3.2. Communication structured acyclic nets
A communication structured acyclic net [11] consists of a number of disjoint acyclic nets which
can communicate through special (buffer) places. CSA-nets can exhibit backward and forward
non-determinism. They can contain cycles involving buffer places. Formally, a communication
structured acyclic net (or CSA-net) is a tuple csan = (acnet1 , . . . , acnetn , Q,W ) (n ≥ 1) such that:
   1. acnet1 , . . . , acnetn are well-formed acyclic nets with disjoint sets of nodes (i.e., places and
      transitions). We also denote:
                                     Pcsan = Pacnet1 ∪ · · · ∪ Pacnetn
                                     Tcsan = Tacnet1 ∪ · · · ∪ Tacnetn
                                     Fcsan = Facnet1 ∪ · · · ∪ Facnetn .

   2. Q is a set of buffer places and W ⊆ (Q × Tcsan ) ∪ (Tcsan × Q) is a set of arcs adjacent to the
      buffer places satisfying the following:
        a) Q ∩ (Pcsan ∪ Tcsan ) = ∅.
         b) For every buffer place q:
                i. There is at least one transition t such that tWq and at least one transition u such
                   that qWu.
               ii. If tWq and qWu then transitions t and u belong to different component acyclic
                   nets.
That is, in addition to requiring the disjointness of the component acyclic nets and the buffer
places, it is required that buffer places pass tokens between different component acyclic nets. In
the step semantics of CSA-nets, the role of the buffer places is special as they can ‘instantaneously’
pass tokens from transitions producing them to transitions needing them. In this way, cycles
involving only the buffer places and transitions do not stop steps from being executable.
   A CSA-net csan = (acnet1 , . . . , acnetn , Q,W ) is a communication structured occurrence net (or
CSO-net) if the following hold:

   1. The component acyclic nets are occurrence nets.
   2. For every q ∈ Q, there is exactly one t ∈ Tcsan such that tWq, and exactly one u ∈ Tcsan
      such that qWu.
   3. No place in Pcsan belongs to a cycle in the graph of Fcsan ∪W .
That is, only cycles involving buffer places are allowed.
   All CSO-nets are well-formed in a sense similar to that of well-formed acyclic nets. As a result,
they support clear notions of, in particular, causality and concurrency between transitions.
   In this paper, we use occurrence nets rather than more general acyclic nets. However, this
will change in the future work when we move to the next stages of our work where alternative
statements in textual documents are taken into account.
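The buffer-place conditions can also be stated operationally. The sketch below is our own illustration of condition 2 of the CSA-net definition, strengthened to the CSO-net requirement of exactly one producer and one consumer per buffer place (function and variable names are assumptions, not from [11]):

```python
def check_buffer_places(components, Q, W):
    """components: list of (P, T, F) triples with disjoint node sets.
    Q: buffer places; W: arcs adjacent to the buffer places."""
    all_P = set().union(*(P for (P, T, F) in components))
    all_T = set().union(*(T for (P, T, F) in components))
    if Q & (all_P | all_T):            # 2(a): Q disjoint from ordinary nodes
        return False
    comp_of = {t: i for i, (P, T, F) in enumerate(components) for t in T}
    for q in Q:
        producers = [t for (t, x) in W if x == q]
        consumers = [u for (x, u) in W if x == q]
        # CSO-net condition: exactly one producer and one consumer per q.
        if len(producers) != 1 or len(consumers) != 1:
            return False
        # 2(b)ii: tokens must pass between *different* component nets.
        if comp_of[producers[0]] == comp_of[consumers[0]]:
            return False
    return True
```

A buffer place whose producer and consumer sit in the same component acyclic net is rejected, matching condition 2(b)ii above.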


                                4. Towards CSA-net component extraction
This section outlines steps taken to extract useful information from textual resources for modelling
and visualising using CSA-nets. Such a visualisation can help investigators comprehend, e.g.,





the dynamics of crime incidents and the interactions among involved parties. Since CSA-nets
consist of interconnected acyclic nets, our approach involves first identifying individual acyclic
nets and their components, followed by the construction of communication links. Each acyclic
net (acnet) comprises places (circle nodes) and events/transitions (square nodes), representing
the progression of system execution from one state to another. SON models facilitate a visual
understanding of the behaviour of complex systems to enhance comprehension.
   An example of a complex system is crime investigation, which involves examining numerous
variables to assist investigators in decision-making. Extracting information usually involves
unstructured data, like written reports, and so presents significant challenges. Investigators often
depend on various sources including police reports and witness statements.
   CSA-nets provide a method for analysing these types of crimes by representing events and
their relationships, revealing causal links between them. However, existing CSA-net approaches
lack the ability to automatically extract information from written sources and reports. To address
this, we leverage statistical Natural Language Processing models by integrating spaCy with
CSA-nets to automatically extract useful information from written sources and represent crime
events in CSA-nets. In order to enhance the accuracy and consistency of the extraction process,
the following rules have been established to guide the linking of natural language with SONs.
   □ ENTITIES in crime text are the representation of SON acyclic nets. An “entity” is a proper
     noun that refers to a distinct real-world object or entity, such as a person, organisation, or
     location.
   □ VERBS represent SON EVENTS / TRANSITIONS within acyclic nets.
   □ VERBS shared by several entities will result in the formation of communication buffer
     places (communication links).
   To illustrate how text is represented in SONs, consider the following phrase: “Allen shot ... ”.
Using NLP, the extractor will first identify Allen as an entity and then assign the tag ’PERSON’.
Next, the extractor searches for verbs related to the entity Allen. In this case, the verb shot is
identified as an event related to Allen. This results in the construction of the occurrence net shown
in Figure 1.

                     [Figure: occurrence net for the entity Allen — place p1, event shot, place p2.]

Figure 1: Simple occurrence net.
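The extraction step just described can be approximated as follows. Here the tokens arrive pre-tagged, as spaCy's pipeline would tag them; the function is a simplified sketch of our own (it ignores sentence boundaries and dependency structure, which the full method uses):

```python
def pair_entities_with_verbs(tagged_tokens):
    """tagged_tokens: (text, POS tag, entity label) triples for one sentence.
    Returns each PERSON entity paired with the verbs in that sentence."""
    entities = [t for (t, pos, label) in tagged_tokens if label == "PERSON"]
    verbs = [t for (t, pos, label) in tagged_tokens if pos == "VERB"]
    return {e: verbs for e in entities}
```

Applied to the tagged phrase above, Allen (PERSON) is paired with shot (VERB), yielding the event of Figure 1.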



4.1. Experiment
The aim is to automatically extract CSA-net components (transitions and acyclic nets) from
free textual resources to prepare for modelling. Our initial step involved finding a method to
comprehend how SON expert users identify transitions and acyclic nets. To achieve this, we
engaged expert users to model stories in SONs, which we evaluated to observe the variations in
the resulting models. Following a comprehensive assessment of these models, we recognised





that there is a mapping from English text to formal models, and concluded that one can represent
named entities as acyclic nets and map verbs to events/transitions.
   Regarding entities, we decided to create a new custom NER model and introduced new labels
for the model to learn. Essential entities such as weapons, family members, and transportation are
not identified by the default NER model. Therefore, we developed and trained a new custom NER
model on these new labels to generate more precise acyclic nets. Subsequently, when evaluating
how expert users identified transitions (verbs), we found it to be a challenging task due to the
complexity of language structure. Consequently, identifying verb types automatically was not
possible, and so we manually examined the syntactic patterns of the verbs identified by SON
expert users, resulting in a set of patterns for recognising transitions.

Entity identification:
A customised new NER model was developed to enhance crime entity detection. The model
underwent training using a dataset comprising approximately 256 concise crime stories [6].
Through an in-depth analysis of these stories and an accurate manual modelling process, we
identified the need to introduce new entity labels to capture key details more effectively.
   One noteworthy observation was the frequent mention of weapons, transportation, and family
members, which were often undetected by default NER models due to the absence of specifically
related labels. In response to these findings, we introduced new entity labels to reinforce
the model’s accuracy and comprehensiveness. The newly added labels encompass a range of
entities including [‘LAW_ENFOR’, ‘WITNESS’, ‘WEAPON’, ‘TRANSPORTATION’, ‘RELATIVE’,
‘PROFESSION’, ‘UNNAMEDPER’]. (Note: ‘UNNAMEDPER’ is a NER tag that we propose for
tagging unnamed people in the text. For instance, phrases like ‘one woman’ and ‘third employee’
refer to people but are not labelled by default models. Using this tag, we can identify such mentions.)
This improvement aims to significantly advance the NER model’s ability to recognise diverse
crime-related entities, ensuring a more effective analysis of crime narratives. To train the model
on these new entities, we collected approximately 334 crime stories, dividing them into 256
for training data and 78 for evaluation. This process involved several steps, including manual
annotation of the testing data, training, evaluation, and finally testing of the custom NER model.
   The annotation process involves human intervention for identifying and labelling named entities
(such as people’s names, weapons, locations) in text data according to predefined categories.
Annotators highlight entities and assign the appropriate labels using annotation tools such as
TagEditor [12]. These annotations provide labelled data for training and testing NER models,
enabling accurate recognition and categorisation of entities.
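Such annotations are commonly stored as character-offset spans into the raw text, each span carrying one of the labels introduced above. The example sentence and the exact data layout below are our own illustration (the sentence is invented; the span format is the one widely used for training spaCy NER models):

```python
# One annotated training example: (text, {"entities": [(start, end, label)]}).
# Offsets are character positions into the text; end is exclusive.
TRAIN_DATA = [
    (
        "The suspect fled in a grey pickup truck carrying a 9mm handgun.",
        {"entities": [(22, 39, "TRANSPORTATION"), (51, 62, "WEAPON")]},
    ),
]

text, annotations = TRAIN_DATA[0]
for start, end, label in annotations["entities"]:
    print(text[start:end], "->", label)
```

Annotation tools such as TagEditor export spans in this offset form, so the labelled data can be fed directly into a model-training loop.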
   Another challenge arises from the presence of pronouns. To handle this, we employed
coreference resolution models, which resolve pronouns and other mentions to their original entity.
To address this, we integrated NeuralCoref [13], a spaCy-compatible neural network
model capable of automatically annotating and resolving coreferences. It is worth mentioning
that applying coreference models to texts of considerable size can result in uncertain resolutions.






Verb identification:
Identifying verbs that accurately represent SON events from crime stories presents a significant
challenge. To address this challenge, we adopted a pattern-based approach to identify verbs
representing events from manually created models. To ensure precision, we examined various
crime-related SON models created by expert users, selecting verbs identified by at least 50% of
the modellers. Through dependency parsing, we determined the positional relationships of these
verbs within sentences concerning surrounding words. Specifically, we identified the types of
words and their dependency relationships preceding and following specified verbs by collecting
all verbs representing events from manually created models. We then analysed the part-of-speech
tags and syntactical tags of the words before and after the verbs. This enabled us to recognise
syntactical patterns for event identification, reflecting the human process observed in manual
SON models. This approach led to the identification of seven patterns, outlined in Table 1. For
instance, the verb may be preceded by a noun, proper noun, or pronoun subject and followed by a
noun, proper noun, or pronoun object. The second column in the table indicates the dependency
relationship between the words around the verb and the verb itself. For instance, the noun that
falls before the verb is a subject, and the noun after the verb is an object.
   The pattern approach employs linguistic features to identify verbs representing events. spaCy
processes text through tokenisation and linguistic annotations, assigning each token (word)
part-of-speech (POS) tags and dependency labels. These annotations facilitate the accurate
identification of verbs, nouns, and other parts of speech, as well as the determination of syntactic
relationships within sentences. By analysing manually selected verbs and their associated patterns
from expert users, we compiled a list of patterns. If a verb's pattern matches any pattern in
this list, it is considered a SON event. Furthermore, a separate evaluation of verb extraction is
unnecessary, as it relies on rule-based pattern matching rather than statistical predictive
identification.

Table 1
List of all patterns identified from the manual models created by SON users [POS]verb[POS].
 No    Part-of-Speech Relationship                    Dependency Relationship
 1     [noun, proper noun, pronoun][EVENT][noun,      [nsubj, nsubjpass] [dobj, iobj, pobj]
       proper noun, pronoun]
 2     [proper noun, pronoun][EVENT][verb, adverb]    [nsubj, nsubjpass] [ccomp, conj, advcl, xcomp,
                                                      advmod]
 3     [verb][EVENT][verb]                            [advcl] [advcl]
 4     [noun, proper noun, pronoun][EVENT][verb,      [dobj, iobj, pobj] [ccomp, conj, advcl, xcomp,
       adposition]                                    advmod, pcomp, root]
 5     [verb, adverb][EVENT][verb, auxiliary]         [advcl, advmod, ccomp][advcl, conj, ccomp,
                                                      root]
 6     [pronoun][EVENT][noun]                         [nsubj] [relcl]
 7     [noun][EVENT][verb]                            [nsubj] [advcl]
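The part-of-speech column of Table 1 can be encoded directly as pairs of admissible neighbour-tag sets. The sketch below is our own simplification: it checks only the coarse POS tags of the words adjacent to a candidate verb and ignores the dependency-label column, which the full method also consults:

```python
# The seven rows of Table 1, as (POS-before, POS-after) tag sets
# using spaCy's coarse POS tag names.
PATTERNS = [
    ({"NOUN", "PROPN", "PRON"}, {"NOUN", "PROPN", "PRON"}),   # row 1
    ({"PROPN", "PRON"},         {"VERB", "ADV"}),             # row 2
    ({"VERB"},                  {"VERB"}),                    # row 3
    ({"NOUN", "PROPN", "PRON"}, {"VERB", "ADP"}),             # row 4
    ({"VERB", "ADV"},           {"VERB", "AUX"}),             # row 5
    ({"PRON"},                  {"NOUN"}),                    # row 6
    ({"NOUN"},                  {"VERB"}),                    # row 7
]

def is_event(prev_pos, verb_pos, next_pos):
    """A verb token counts as a SON event when its neighbours' POS tags
    match one of the rows of Table 1."""
    return verb_pos == "VERB" and any(
        prev_pos in before and next_pos in after
        for before, after in PATTERNS)
```

For instance, a verb between two proper nouns (row 1, e.g. "Davis shot Jones") is accepted, whereas a verb flanked by adjectives matches no row and is rejected.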






Entity-verb linking:
We applied the NER model to extract entities, and then, by employing the proposed verb-pattern-
based identification, we also extracted the verbs. Subsequently, we generated lists for each entity
and appended all relevant verbs to each entity list. Algorithm 1 describes the comprehensive
extraction process and the assignment of verbs to entities. Figure 2 shows the structure of
the output lists (one list for each story), denoted as ExtractedData_Lists. Each list within
the ExtractedData_Lists contains information regarding an entity and its corresponding verbs
extracted from the text. For each entity and each verb associated with it, three information items
are provided: the verb (v), the location of the verb within the sentence (vsent_loc ), and the index
of the verb within the text (vindex ). Following the extraction from each story, a list can contain
multiple entities alongside their respective verbs extracted from the text.

      ExtractedData_Lists = [ [entity_1, v^{1,1}, v_sent_loc^{1,1}, v_index^{1,1}, . . . , v^{1,k_1}, v_sent_loc^{1,k_1}, v_index^{1,k_1}],
                              . . . ,
                              [entity_n, v^{n,1}, v_sent_loc^{n,1}, v_index^{n,1}, . . . , v^{n,k_n}, v_sent_loc^{n,k_n}, v_index^{n,k_n}] ]


Figure 2: Extraction output shows how lists are structured.
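The layout of Figure 2 can be produced by a simple grouping step. The function below is our own sketch of that data structure (the tuple format for verb occurrences is an assumption):

```python
def build_extraction_lists(entities, verb_occurrences):
    """verb_occurrences: (verb, sent_loc, index, entity) tuples, in text order.
    Groups each entity's verbs into one flat list, matching the
    ExtractedData_Lists layout of Figure 2."""
    lists = []
    for e in entities:
        row = [e]
        for v, sent_loc, index, owner in verb_occurrences:
            if owner == e:
                row += [v, sent_loc, index]   # the three items per verb
        lists.append(row)
    return lists
```

Each resulting row starts with an entity name followed by (v, v_sent_loc, v_index) triples for that entity's verbs.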


   To transform entity lists into CSA-nets, a sequential acyclic net is constructed to represent each
entity list. This involves creating a line-like structure where each verb encountered in the entity
list leads to the construction of an event. The construction of acyclic net acnetm starts by creating
an initial place pm,0 . Then, for every verb vm,i encountered in the entitym list, a corresponding
event t m,i is constructed, followed by the insertion of a new place pm,i , as illustrated in Figure 4.
   Following the construction of acyclic nets, we use the verb information (i.e., v, vsent_loc , and
vindex ) to construct the communication link later on. This will involve looping and searching for
similar verb information among the entities. If such information is found, a communication link
is established.
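The line-like construction described above can be sketched as follows (our own naming scheme for places and events, chosen only for illustration):

```python
def build_entity_acnet(entity, verbs):
    """Construct a sequential acyclic net for one entity list:
    p_{m,0} -> t_{m,1} -> p_{m,1} -> t_{m,2} -> ... , one event per verb."""
    places = [f"{entity}_p0"]          # initial place p_{m,0}
    events, arcs = [], []
    for i, v in enumerate(verbs, start=1):
        t, p = f"{entity}_{v}_{i}", f"{entity}_p{i}"
        arcs += [(places[-1], t), (t, p)]   # last place -> event -> new place
        events.append(t)
        places.append(p)
    return places, events, arcs
```

For the entity Davis with verbs shot and fled this yields the chain p0 → shot → p1 → fled → p2, a well-formed occurrence net by construction.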

Communication identification:
Communication between different acyclic nets is possible and leads to CSA-nets. Such nets are
constructed when transitions in component acyclic nets are connected using buffer places. To
accomplish this, we have already extracted verbs along with their corresponding information,
such as the specific location within the text and the sentence in which the verb is found. This
extracted information enables us to establish communication by identifying identical verbs across
different acyclic nets. Upon discovering a match, new buffer places are created to facilitate
communication (synchronisation). For instance, if acyclic nets acnetm (modelling entitym ) and
acnetl (modelling entityl ) are such that vm,i = vl, j , then we add two buffer places, btv and bvt ,
and four arcs between transitions t and v (modelling vm,i and vl, j , respectively) to ensure their
synchronicity (the added arcs are: (t, btv ), (btv , v), (v, bvt ), and (bvt ,t)). To improve readability, in
the diagrams we depict such an addition of places and arcs by showing just a single buffer place





linked by dashed edges with t and v, as shown in Figure 4.
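The two-buffer-place construction above amounts to a small helper (a sketch under our own naming conventions):

```python
def add_buffer_places(t, v):
    """For matching events t (in acnet_m) and v (in acnet_l), create the two
    buffer places b_tv, b_vt and the four arcs (t, b_tv), (b_tv, v),
    (v, b_vt), (b_vt, t) that force the events to fire synchronously."""
    b_tv, b_vt = f"b_{t}_{v}", f"b_{v}_{t}"
    buffers = {b_tv, b_vt}
    arcs = {(t, b_tv), (b_tv, v), (v, b_vt), (b_vt, t)}
    return buffers, arcs
```

Because each buffer place passes its token ‘instantaneously’ in the step semantics, the cycle through b_tv and b_vt does not block execution; it only forces t and v into the same step.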

Algorithm 1 Entities and Events Extraction Algorithm
 1: Input: Text document
 2: Output: Lists of entities with verbs - [ExtractedData_Lists file]
 3: Read the text file, then process the text using spaCy, produce Doc
 4: Initialize entLabels list with NER labels
 5: Initialize empty lists names, verbs, name_info, and verb_info
 6: for each sentence in doc do
 7:     for each token in sentence do
 8:          if token’s entity label ∈ entLabels then
 9:               if token not in names then
10:                    Add the token in names list
11:          else
12:               if token is verb and in the verb pattern then
13:                    if token ∉ verbs then
14:                         Add token, sent_number, and token index to verbs and verb_info
15: Initialize name_verb_mapping{}
16: for each name in names do
17:     Initialize name_info with name
18:     for each sentence in doc do
19:          if name appears in sentence then
20:              for each token in sentence do
21:                  if token is verb and token ∈ verb_info then
22:                       if token matches any verb in verbs and is in syntactical relation with name then
23:                            Add token, sent_number, and token index name_info
24:   Sort name_verb_mapping by the order of appearance
25: write to FILE
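Algorithm 1 could be prototyped as below. To keep the sketch self-contained it operates on pre-tagged tokens, given as (text, entity label, part-of-speech) tuples, rather than on a spaCy Doc; the label set and the toy sentences are illustrative, and the syntactic-relation check of line 22 is simplified to sentence co-occurrence.

```python
# Dependency-free sketch of Algorithm 1: collect named entities and the verbs
# associated with them, recording sentence and token positions.

ENT_LABELS = {"PERSON", "WEAPON", "LOCATION"}  # illustrative NER label set

def extract(sentences):
    names, verbs, verb_info = [], [], []
    # Lines 6-14: gather entity names and verbs with their positions.
    for s_idx, sent in enumerate(sentences):
        for t_idx, (text, ent, pos) in enumerate(sent):
            if ent in ENT_LABELS:
                if text not in names:
                    names.append(text)
            elif pos == "VERB" and text not in verbs:
                verbs.append(text)
                verb_info.append((text, s_idx, t_idx))
    # Lines 15-24: map each name to the verbs of the sentences it appears in,
    # ordered by appearance (syntactic relation simplified to co-occurrence).
    mapping = {}
    for name in names:
        events = []
        for s_idx, sent in enumerate(sentences):
            if any(text == name for text, _, _ in sent):
                for t_idx, (text, _, pos) in enumerate(sent):
                    if pos == "VERB" and text in verbs:
                        events.append((text, s_idx, t_idx))
        mapping[name] = sorted(events, key=lambda e: (e[1], e[2]))
    return mapping

sents = [
    [("Davis", "PERSON", "PROPN"), ("shot", "", "VERB"),
     ("Jones", "PERSON", "PROPN")],
    [("Jones", "PERSON", "PROPN"), ("collapsed", "", "VERB")],
]
mapping = extract(sents)
```

On these two toy sentences, "Davis" is mapped to the verb "shot" and "Jones" to both "shot" and "collapsed", each with its sentence and token position.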



Modelling Experiment:
To illustrate the applicability of our methodology, consider the following text (STORY A):
“Brenton Rhasheem Davis, 24, allegedly shot and killed Tommy Jones III, 19, with a 9mm
handgun in the parking lot of a Walgreens in Columbus, Georgia. Witnesses say they saw Davis
and Jones enter the store together. When they left toward the parking lot, Jones ran from Davis
who then allegedly shot Jones in the leg and back. Davis then walked up to Jones, who had
collapsed, and allegedly fired more shots. Jones was shot at least four times. Nine shell casings
from the 9mm handgun were found at the scene. A witness added that Davis then picked up a bag
Jones had been carrying and then fled the scene. It is unknown what was inside the bag. Jones
had gone to the store to meet Davis in order to buy a 45-caliber handgun from him. The two men
did not previously know each other. Davis was charged with one count of murder.”
   Here, we illustrate the diverse modelling approaches applied to STORY A. Figure 3 shows the
manually created models, while Figure 4 illustrates a potential automatically extracted model
based on the results of the proposed extraction method. It should be noted that humans tend to
abstract and therefore selectively omit verbs they consider irrelevant. Conversely, the proposed
verb extraction method aims to identify all verbs present within the pattern, which leads to the
generation of larger SON models. However, abstraction may be employed as an option. This could
be explored further in future work, with the aim of utilising BSONs as an abstraction method for
the produced SONs.
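The per-entity verb sequences produced by the extraction can then be unfolded into the line-like acyclic nets seen in Figure 4, alternating places and events. The representation below is an illustrative sketch only; the place and event names are assumptions, not SONCraft's input format.

```python
# Sketch: turn one entity's ordered verb list into a line-like acyclic net
# p1 -> e1 -> p2 -> e2 -> ... (names are illustrative).

def entity_line(entity, events):
    places = [f"{entity}.p{i}" for i in range(1, len(events) + 2)]
    arcs = []
    for i, event in enumerate(events):
        arcs.append((places[i], event))      # input place -> event
        arcs.append((event, places[i + 1]))  # event -> output place
    return {"places": places, "events": list(events), "arcs": arcs}

line = entity_line("Davis", ["shot", "walked", "fled"])
print(line["arcs"][0])  # ('Davis.p1', 'shot')
```

A net built this way reproduces the long per-entity lines of the automatically extracted model, which abstraction (e.g. via BSONs) could later shorten.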


5. Results
Evaluation focused on the NER model, using the manually annotated 20% of the data (78 stories
from [6]) held out for validation; this data was drawn from the same source as the training data.
The precision indicates that roughly 83.95% of predicted entities were correctly identified,
highlighting the accuracy of the model's predictions. Additionally, the recall score suggests that
the model captures around 87.45% of all relevant entities in the validation texts, demonstrating its
ability to retrieve relevant information. The F1-score (84.84%) provides a balanced assessment of
the model's performance, reflecting good identification of relevant entities while minimising false
positives and negatives. These results indicate that the NER model performs well on this
extraction task.

Table 2
NER model performance evaluation.
                                  Data Set      Measure     Score %
                                                Precision   83.95
                                  Validation    Recall      87.45
                                                F1          84.84
                                                Precision   69.72
                                  Testing       Recall      77.15
                                                F1          71.08

   The model was then tested on 54 cases related to police shootings obtained from [14], which
were not part of the training dataset. The testing scores reveal acceptable performance, with
a precision of 69.72% and a recall of 77.15%, while the F1-score of 71.08% indicates a
reasonable balance between the two. Comparing validation
and testing results, there are several possible reasons for the drop in performance, such as
overfitting and sample size. However, in this specific case we think that data distribution may be
a contributing factor. This means that the patterns learned during training may not closely match
the data used in testing. In general, the model demonstrates potential for practical applications.
However, depending on the specific crime type, further training and fine-tuning may be required
to ensure optimal performance.
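For reference, the scores above are standard entity-level metrics. The sketch below computes them under an exact-match criterion from sets of (start, end, label) spans; this is an assumption for illustration, as the reported figures come from the model's own evaluation, which may average over labels differently.

```python
def ner_scores(gold, pred):
    """Exact-match precision, recall and F1 over (start, end, label) spans."""
    tp = len(gold & pred)                       # spans predicted correctly
    precision = tp / len(pred) if pred else 0.0
    recall = tp / len(gold) if gold else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

# Hypothetical annotated vs. predicted spans for one document.
gold = {(0, 2, "PERSON"), (5, 7, "WEAPON"), (9, 11, "LOCATION")}
pred = {(0, 2, "PERSON"), (5, 7, "WEAPON"), (12, 13, "PERSON")}
p, r, f1 = ner_scores(gold, pred)
```

Here two of three predictions match the gold spans, giving precision, recall and F1 of 2/3 each.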


Figure 3 [diagram not reproduced]: Two models created by different SON expert users (Modeller 1
and Modeller 2) depict the events from story A and demonstrate close entity identification.

Figure 4 [diagram not reproduced]: A potential SON model could be developed based on the results
of the extraction for the same story (story A). The extraction approach identified more entities
and verbs compared to human identification in Figure 3.


6. Conclusion and future work

A new approach to extracting useful information for SON modelling from crime reports is
proposed. Initially, entities and verbs are identified to represent and describe the acyclic nets and
events. New named entity labels are introduced, and a custom NER model is trained on these
labels by annotating crime stories. Additionally, verbs are identified using syntactic patterns
generated from manually modelled crime documents using CSA-nets. An algorithm for the
identification and extraction process is proposed, and the experiment showed acceptable results.
   Future work will focus on structurally and formally representing the extracted data and mod-
elling it using SONCraft. Furthermore, improvements to this approach could involve developing
new tagger and NER models to enhance accuracy. Additionally, we will investigate the identifica-
tion and construction of concurrent and conflicting events within acyclic nets.


Acknowledgement
Appreciation is extended to the anonymous referees for their comments and suggestions which
have led to improvements in the content and presentation of this paper. Furthermore, sincere
gratitude is extended to Maciej Koutny and Anirban Bhattacharyya for their invaluable guidance
and unwavering help throughout the development of this paper.






References
 [1] M. Koutny, B. Randell, Structured occurrence nets: A formalism for aiding system failure
     prevention and analysis techniques, Fundam. Informaticae 97 (2009) 41–91.
 [2] B. Randell, Occurrence nets then and now: The path to structured occurrence nets, in: L. M.
     Kristensen, L. Petrucci (Eds.), Applications and Theory of Petri Nets - 32nd International
     Conference, PETRI NETS 2011, Newcastle, UK, June 20-24, 2011. Proceedings, volume
     6709 of Lecture Notes in Computer Science, Springer, 2011, pp. 1–16.
 [3] T. Alharbi, M. Koutny, Domain name system (DNS) tunneling detection using structured
     occurrence nets (SONs), in: D. Moldt, E. Kindler, M. Wimmer (Eds.), Proceedings of the
     International Workshop on Petri Nets and Software Engineering (PNSE 2019), volume 2424
     of CEUR Workshop Proceedings, CEUR-WS.org, 2019, pp. 93–108.
 [4] B. Li, M. Koutny, Unfolding CSPT-nets, in: D. Moldt, H. Rölke, H. Störrle (Eds.), Pro-
     ceedings of the International Workshop on Petri Nets and Software Engineering (PNSE’15),
     volume 1372 of CEUR Workshop Proceedings, CEUR-WS.org, 2015, pp. 207–226.
 [5] T. Alshammari, Towards Automatic Extraction of Events for SON Modelling, in: M. Köhler-
     Bussmeier, D. Moldt, H. Rölke (Eds.), Petri Nets and Software Engineering 2022 co-
     located with the 43rd International Conference on Application and Theory of Petri Nets
     and Concurrency (PETRI NETS 2022), Bergen, Norway, June 20th, 2022, volume 3170
     of CEUR Workshop Proceedings, CEUR-WS.org, 2022, pp. 188–201. URL: https://ceur-ws.org/Vol-3170/paper11.pdf.
 [6] Violence Policy Center, [online]. Available at: https://vpc.org/ (accessed 2022).
 [7] P. Siriaraya, Y. Zhang, Y. Wang, Y. Kawai, M. Mittal, P. Jeszenszky, A. Jatowt, Witnessing
     crime through tweets: A crime investigation tool based on social media, in: F. B. Kashani,
     G. Trajcevski, R. H. Güting, L. Kulik, S. D. Newsam (Eds.), Proceedings of the 27th ACM
     International Conference on Advances in Geographic Information Systems, SIGSPATIAL
     2019, Chicago, IL, USA, November 5-8, 2019, ACM, 2019, pp. 568–571.
 [8] S. Chakravorty, S. Daripa, U. Saha, S. Bose, S. Goswami, S. Mitra, Data mining tech-
     niques for analyzing murder related structured and unstructured data, American Journal of
     Advanced Computing 2 (2015) 47–54.
 [9] T. Freytag, P. Allgaier, WoPeD goes NLP: conversion between workflow nets and natural
     language, in: W. M. P. van der Aalst et al. (Eds.), Proceedings of the Dissertation Award,
     Demonstration, and Industrial Track at BPM 2018, volume 2196 of CEUR Workshop
     Proceedings, CEUR-WS.org, 2018, pp. 101–105.
[10] B. Li, Visualisation and Analysis of Complex Behaviours using Structured Occurrence Nets,
     Ph.D. thesis, School of Computing, Newcastle University, 2017.
[11] M. Alahmadi, S. Alharbi, T. Alharbi, N. Almutairi, T. Alshammari, A. Bhattacharyya,
     M. Koutny, B. Li, B. Randell, Structured acyclic nets, CoRR abs/2401.07308 (2024).
     doi:10.48550/ARXIV.2401.07308. arXiv:2401.07308.
[12] TagEditor, TagEditor annotation tool, 2020. URL: https://github.com/d5555/TagEditor,
     accessed 2022.
[13] NeuralCoref, NeuralCoref 4.0: Fast coreference resolution in spaCy with neural networks,
     https://github.com/huggingface/neuralcoref, 2022.
[14] Fatal Encounters, [online]. Available at: https://fatalencounters.org/ (accessed 2024).


