=Paper=
{{Paper
|id=Vol-3741/paper36
|storemode=property
|title=Using Graph Neural Networks for Heterogeneous Event Classification
|pdfUrl=https://ceur-ws.org/Vol-3741/paper36.pdf
|volume=Vol-3741
|authors=Valerio Bellandi,Stefano Montanelli,Darya Shlyk,Stefano Siccardi
|dblpUrl=https://dblp.org/rec/conf/sebd/BellandiMSS24
}}
==Using Graph Neural Networks for Heterogeneous Event Classification==
Using Graph Neural Networks for Heterogeneous Event Classification Valerio Bellandi1 , Stefano Montanelli1 , Darya Shlyk1 and Stefano Siccardi2 1 Università Degli Studi di Milano, Department of Computer Science, Via Celoria 18, Milano, Italy 2 Consorzio Interuniversitario Nazionale per l’Informatica, Via Ariosto, 25, Roma, Italy Abstract Graph Neural Networks (GNNs) are specialized machine learning models designed for analyzing and making predictions based on graph-structured data. In this paper, we introduce a GNN-based approach for the analysis of heterogeneous event graphs. By heterogeneous event graph, we refer to a temporal graph where multiple types of nodes representing events as well as multiple type of edges among events are defined. A distinguishing aspect of the proposed approach lies in the specification of the subgraph of nodes for each event to classify, and in the use of embedding techniques for representing all the relevant features/attributes of the subgraph. The proposed approach is compatible with different GNN models; the use of PSCN, namely a CNN for subgraph classification, is discussed in the paper under two different settings. Preliminary results on two different case-studies, as well as a real application in the field of historical data classification are also discussed. Keywords Multi-layer Event Classification, GNN, Historical Data Analysis 1. Introduction Graph Neural Networks (GNNs) are a class of machine learning models that have gained attention in the recent years for their ability to work with data structured as graphs. Graphs are mathematical structures that consist of nodes and edges, and they are used to represent a wide range of real-world systems, such as for example social networks, recommendation systems, and biological networks. GNNs are designed to address the unique challenges posed by these graph-structured data and have emerged as a powerful tool in various domains. Since their first description in the seminal paper [1], several models have been proposed, with the common characteristic of operating directly on graph data and capturing complex dependencies and interactions within the data. Many GNNs can propagate information between connected nodes in a graph and update the feature representations of nodes by aggregating information from their neighboring nodes. Often, low dimensional vector representations of nodes (embeddings) are used in order to make their features usable in downstream tasks, such as node classification, SEBD 2024: 32nd Symposium on Advanced Database Systems, June 23-26, 2024, Villasimius, Sardinia, Italy * Corresponding author. † These authors contributed equally. $ valerio.bellandi@unimi.it (V. Bellandi); stefano.montanelli@unimi.it (S. Montanelli); darya.shlyk@unimi.it (D. Shlyk); stefano.siccardi@unimi.it (S. Siccardi) 0000-0003-4473-6258 (V. Bellandi); 0000-0002-6594-6644 (S. Montanelli); 0009-0006-4051-6871 (D. Shlyk); 0000-0002-6477-3876 (S. Siccardi) © 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR ceur-ws.org Workshop ISSN 1613-0073 Proceedings link prediction, and graph classification. Sometimes, GNNs propagate information through multiple layers of the network. A hierarchy is the result, capable to capture increasingly complex patterns and abstractions in the data. In this paper, we propose an approach based on GNNs for node classification over a het- erogeneous event graph. By heterogeneous event graph, we refer to a temporal graph where multiple types of nodes representing events as well as multiple type of edges among events are defined. In addition to specific application-dependent attributes, an event is associated with a timestamp denoting when the event occurs. This way, the approach can deal with a variety of event graphs, and it is capable to represent entities from many different domains, including social or physical system, that can be organized in a network. The approach consists in the specification of a subgraph of nodes for each event to classify, and in the use of embedding techniques for representing all the relevant features/attributes of the subgraph. A GNN is finally invoked to enforce event classification over each selected subgraph. Our approach can be employed with different GNN models, thus, for tests and experiments, we consider the use of two different solutions, that are both based on a CNN for subgraph classification called PSCN [11] under two distinct settings based different criteria for graph construction. We stress that a contribution of our proposed approach is about the application/specialization of PSCNto event graph classification with multiple types of nodes and edges (i.e., graph of heterogeneous events). In particular, when defining the subgraphs of the events to classify, we allow to specify application-driven constraints expressed through a time window in which events have to occur for being considered and filters to customize the type of nodes/edges to include. Two different case studies are discussed about i) blackmailing detection and ii) bitcoin transaction rating, respectively. For future experiments, we envisage a further application case study of the proposed approach to a historical dataset in the framework of the Tacitroots ERC H2020 project1 . The paper is organized as follows. In Section 2, we discuss the relevant references in the literature. Section 3 presents the proposed GNN approach for event classification. Experiments are illustrated in Section 4. Section 5 outlines a further case-study on the historical dataset of the Tacitroots project. Concluding remarks are finally given in Section 6. 2. Related work Temporal Event Graphs, sometimes called Dynamic Graphs, have been described in [3], and they are characterized by nodes that represent events and links that connect an event and its nearest temporal neighbours. The paper also considers “motifs” which characterize pairs of events according to the involved nodes and the direction of their interaction. Events involve just two entities and they have no labels, that is just one type of events is considered. The work in [4] deals with the dynamics of diffusion-like processes on temporal networks and introduces weighted event graphs. The authors consider spreading agents that can be transmitted onward from a node only within some time 𝛿𝑡. This time limit is the control parameter of a percolation problem and the structure of paths created within 𝛿𝑡 determines where the process can move. Weighted event graphs are static, weighted, and directed acyclic graphs whose nodes are the 1 https://sites.unimi.it/tacitroots/. events. Each event is linked to all events sharing an involved entity; links have a weight proportional to the difference of moments when the linked events happened in the timeline of the system. To apply a threshold to the event graph one can “cut” all links whose weight exceeds the threshold. The work in [5] considers the transformation of a temporal network to a weighted event graph similar to the one described above. An event embedding mechanism is defined by sampling the neighbourhood of events. The sampling is done according to probabilities determined by the weights of the links that connect the actual event to its neighbours. These samplings are passed as inputs to the Skip-Gram model, resulting in a d-dimensional vector space in which events are represented by Cartesian coordinates. Experiments indicate that these embeddings capture in large part the time ordering, and some mesoscale structures defined by temporal vicinity of events. An application is the study of the final outcome of the epidemic originated by a specific event. Regarding Graph Embedding, in general terms, the embedding consists in projecting a graph in a 𝑑−dimensional space in such a way that the graph structure is preserved as much as possible. We quote, for instance, the node2vec method [6], VERSE [7], and proNE [8]. Entity embedding computation by node2vec is based on random walks traversing the graph and measuring the probability of observing a specific entity considering different staring points. On the other hand, VERSE learns embeddings by training a single-layer neural network, and ProNE enhances the embedding using spectral spaces techniques. The above algorithms do not distinguish link types or labels, they are concerned, in other terms, with single layer graphs. On the other hand, event graphs are multilayered, that is their nodes are connected by links of different types. In [9], a multilayered version of VERSE, MultiVERSE, can be found: a fast and scalable method to learn node embeddings from multiplex and multiplex-heterogeneous networks, based on Random Walks. In [10], a method based on the normalized graph Laplacian matrix and its spectrum is described, that is able to compute suitable embeddings for multi-layered graphs. Following a different path, in [11], the authors describe a construction for node neighborhood that can be directly applied to machine learning using a CNN network, dealing with a graph in the same way one would do with an image (PATCHY-SAN or PSCN). As a further option, in [12], the authors describe a method that we will call “ad hoc embedding” to derive an embedding specifically tailored for a query, based on suitable node counts in a temporal window around each event. Analogous methods are used in [13] and [14] for pattern matching in event graphs and to study correlations between events, respectively. The Graph Neural Network paradigm was formalized in [1], that is a type of neural network that can act directly on graphs. The review paper [2] describes a taxonomy of GNNs, and it discusses some limitations of the original method and possible variants proposed to address the issues. It is worth noting that we use the term GNN in a broader sense than the model described in [1], that includes some models quoted in our previous section, e.g. PSCN. The GNN models have been extended to multilayer graphs, for instance in [15], where a multi-dimensional convolutional neural network model, mGCN, is proposed to capture rich information in learning node-level representations for multi-dimensional graphs. In [16], a general approach is described that is not task specific to extend GNNs to graphs with multiple edge types between nodes. With respect to the above solutions, the main contribution of our proposed approach is about the application and specialization of PSCN to event graph classification involving multiple types of nodes and edges, thereby forming a graph of heterogeneous events. Specifically, we propose a solution for delimiting the subgraphs for event classification, and the specification of related application-driven constraints. These constraints are expressed through a time window in which events must occur to be considered, as well as filters that allow the customization of the types of nodes and edges to include. 3. The proposed approach As described before we propose to extend the GNN model presented in [1] for being applied to heterogeneous event graphs. Our proposed approach consists of three steps described in the following, namely event graph construction, event subgraph definition, subgraph analysis. The event graph construction is performed by defining a node for each event to classify. An event is characterized by a timestamp, featuring when the event occurred. Two event nodes are connected by an edge in the graph when i) the two events are next (i.e., successive) to each other in time, and ii) they have a shared attribute, namely a common property that describes the link between the two nodes, and it is used to label the edge. For instance, if the event nodes are phone calls, possible shared attributes are i) the calls have the same caller, ii) the calls have the same receiver, iii) one call is received by the caller of the other, and iv) one call is made by the receiver of the other. It is worth noting that a node is linked only to the next node/event with a certain shared attribute, not all the successive nodes. As an example, in case of calls with the same caller as shared attribute, a call/node is only linked to the successive call according to the timestamps. Further details about the event graph construction are provided in [14]. The event subgraph definition is about the extraction of an appropriate subgraph for each event to classify. A subgraph is targeted to a specific event node and it has to include all the nodes and edges that are potentially useful and pertinent to classify the target. Each subgraph can work as the receptive field of a CNN, and it is extracted according to the method described in [11]. A subgraph must contain a prefixed number of nodes, and a notion of time window is introduced to define a mechanism for choosing the nodes to include in the subgraph. Since a node of our event graph is associated with a timestamp, the time window is defined by setting a maximum time afterwards (or backwards) with respect to the initial target event node. Furthermore, the time window specifies the maximum distance of a node from the target for being included in the subgraph. Given an event node to classify (i.e., target node), the definition of the corresponding subgraph is performed as follows. The target node is added to the subgraph. Then, the neighboring nodes are added to the subgraph when i) the timestamp of neighbors falls in the expected time window, and ii) the prefixed size of the subgraph as well as the maximum distance from the target are not reached. The addition of neighboring nodes is repeated until the above conditions are not violated. Dummy nodes are added to the subgraph when the time window is passed and the graph contains less nodes than expected. Optionally, filters can be defined to apply application-driven constraints based on specific types of edges/nodes during subgraph extraction. A vector representation, namely an embedding, can be derived from the subgraph by listing all the nodes and edges, including any feature that can be relevant for classifying the target. In Figure 1, we show an illustrative example of subgraph definition by considering a target node/event denoted by 𝑇 . The graph contains different types of event each one denoted by a Figure 1: Example of subgraph definition. specific color, and different types of edge, each one denoted by a specific textual label. For the construction of the subgraph of the node 𝑇 , we assume that the time flows from right to left and we include nodes in the subgraph of 𝑇 by considering i) a time back-window criterion, ii) a maximum of 7 subgraph nodes, and iii) a max distance of 2 from the target. As further application-driven constraints, we specify that only edges of type a and c are considered when the distance from the target is 1, while all the edge labels are considered at distance 2. As a result, the subgraph for the target node 𝑇 contains the nodes 1, 2, and 4 considering the nodes at distance 1 from target; and the nodes 3, 7, 8, and 9 considering the nodes at distance 2. The node 5 is excluded since linked to the target by an edge labelled with 𝑏 (distance 1). The node 10 is excluded since it is beyond the time window. The node 6 is not considered since the previous node 5 has been excluded. Since the expected number of nodes is inserted in the subgraph, no dummy nodes are added. The subgraph analysys is finally enforced. The goal is to classify the target event of each subgraph that has been previously defined. Different methods can be employed to perform subgraph analysis, and the most appropriate method can be invoked depending on the case study at hand. In the remaining of the paper, we consider two specific application case-studies with related experiments based on a CNN for subgraph classification that is PSCN. 4. Experiments For experimentation, we exploits two different application case-studies with related datasets. The first case-study is called blackmailing and the goal is to recognize when a node/event can be considered a request for money from blackmailed people according to the following pattern: somebody calls or writes to a number of persons in a short time; afterwards each of the persons withdraws (using his/her debit card or bank account) some money; then the caller deposits on his/her bank account an amount that is approximately the sum of the withdrawn amounts. The possible events in the graph are phone calls, email messages, debit card withdrawals, and bank account movements. We are aware that the one above is not a realistic blackmailing Figure 2: Example of sub graph for a (simplified) pattern of the blackmailing dataset. pattern; however, in this case-study, our goal is to show that our proposed approach is capable to correctly recognize a given pattern even when the graph is noisy and the target is confused by spurious event nodes. Synthetic data for experimentation was generated as follows: • records for 1000 persons were generated and assigned phones, email addresses, bank accounts, debit cards and “friends”, that is a vector of other persons’ identifiers • based on adjustable parameters, the program generated records for phone calls, email messages, debit card withdrawals and bank account moves at discrete times • a number of persons generated the specified pattern, with events marked as targets To test the sensitivity of the methods, we generated two sets of data with increasing confusion between the targets and the other events. For all the sets the simulated period was 200 days: 1. average interval between i) non-target phone calls is 24 hours, ii) email messages is 4 days, iii) bank account movements is 8 days, iv) debit card withdrawals is 6 days; v) minimum interval between target events is 2 hours, the maximum is 6 hours; 2. average interval between i) non-target phone calls is 6 hours, ii) email messages is 12 hours, iii) bank account movements is 8 days, iv) debit card withdrawals is 6 days; v) minimum interval between target events is 4 hours, the maximum is 12 hours. For each set of parameters, two datasets were generated: one for training and one for testing. In particular, the datasets with the parameter set 1 are called BM1a and BM1b, while the datasets with the parameter set 2 are BM2a and BM2b. Figure 2 shows the sub graph that one expects to find for a target pattern. The rightmost node, labelled O1, is the final deposit. It is linked to the nearest phone call, C1, made by the operator. This call is not necessarily one of the calls made Table 1 Event Graphs for the blackmailing case-study Code N.Nodes N.Edges N.Deposits N.Targets BM1a 338,621 1,739,669 16,158 90 BM1b 338,617 1,739,578 15,998 90 BM2a 920,102 5,176,665 16,080 90 BM2b 920,200 5,177,204 16,145 90 to solicit money (target calls), but it is linked to a previous one made by the same caller, until a target call is found. This can be linked to a financial operation like in case of C2 and O2, if the called person did not receive any other call before withdrawing money. However, it may happen that the financial operation is linked to another call, that can be found following the edges linking calls received by the same person, as for the nodes C3, C5 and O3. Both situations may occur, as for the node C4, if several financial operations were made by the called person in the same period. We created the event graph linking each node with the previous ones of each type, sharing an agent, that is: phone calls to phone calls with the same caller, same receiver, the receiver as caller and the caller as receiver; analogously for links to to emails (caller as sender, etc.); to the previous deposit made by the caller and by the receiver; to the previous account move made by the caller and the receiver. Analogous links where created for the other event types. A total of 49 link types were created. The derived event graphs features are presented in Table 1. This is a case-study of binary classification on the nodes of type Deposit: the goal is to detect target and non-target deposits according to the above pattern of blackmailing. The second dataset is called Bitcoin-OTC2 and it is commonly used for this kind of experi- ments (e.g., [18, 19]). It consists of 35,592 records, that are edges between two users of Bitcoin OTC. Each edge represents a rating the first user gave (called source) of the second (called target) at a specific time, in a scale from -10 (total distrust) to +10 (total trust) in steps of 1, with 89% positive values. We note that normally this dataset is processed as an edge classification task. Considering edges as event nodes, we can trace it back to our event classification task. We created the event graph linking each node to the previous ones having: the same source (𝑠𝑠), the same target (𝑡𝑡), the source as target (𝑠𝑡), and the target as source (𝑡𝑠). The resulting graph had 124,145 edges and 35,592 nodes, corresponding to the edges of the original graph. 4.1. Results for the blackmailing dataset In the experiments, the PSCN method has been considered with two distinct settings character- ized by different criteria used for subgraph definition. Results are summarized in Tables 2 in terms of True Positives, False Negatives, True Negatives, and False Positives. We used a time window of 70,000 sec. (around 20 hours), a requested number of 50 nodes for BM1a and BM1b, and 100 training epochs; a time window of 100,000 sec. (around 28 hours), a requested number of 150 nodes for BM2a and BM2b. 2 http://snap.stanford.edu/data/soc-sign-bitcoin-otc.html. The first setting of PSCN is generic, in the sense that we do not consider constraints on the shape of target subgraphs. Constraints on the distance of nodes from the target have been not specified. Edge labels were filtered, so that only 16 edge types have been selected in the subgraphs, in order to select phone calls and emails originated by the operator of deposits and bank movements, withdrawals made by people who received a phone call or an email and chains of phone calls and emails by the same person. Table 2 Blackmailing Event Graphs, PSCN results Code TP FN TN FP Code TP FN TN FP BM1a train 90 0 16,068 0 BM1b train 90 0 15,908 0 BM1b test 85 5 15,889 19 BM1a test 73 17 16,060 8 BM2a train 90 0 15,990 0 BM2b train 90 0 16,055 0 BM2b test 21 69 16,045 10 BM2a test 29 61 15,964 26 The second setting of PSCN is specialized, in the sense that we define a maximum distance of nodes from the target (i.e., max distance is 3), and constraints on the node/edge types to consider are specified at each distance. At distance 1, that is for neighbours of the target node to be classified, only links to phone calls and emails made by the person who made the deposit were considered. At distance 2 and 3, only emails and phone calls made by the same person as above, and financial operations by people who received emails and phone calls were considered. An illustration of the resulting subgraphs can be found in Figure 3 where an example of target (on the left) and non-target (on the right) graphs are provided. Both graphs start with a deposit (red) and contain chains of emails and phone calls (blue) as well as financial operations (withdrawals and account moves, yellow). Results are reported in Table 3. Figure 3: Blackmailing dataset: example of a target (left) and a non-target (right) sub graph. We note that the specialized setting of PSCN provides very interesting results, especially on the second, more complex, group of tests. Considering that in all the cases the task consists in finding a small number of events, we see that the number of false positive is always fairly low. However, the general method fails to detect approximately two thirds of the targets in the complex case, while the specialized model is able to find always more than 85%. Table 3 Blackmailing Event Graphs, PSCN results with the specialized version Code TP FN TN FP Code TP FN TN FP BM1a train 90 0 16,067 1 BM1b train 90 0 15,908 0 BM1b test 90 0 15,908 0 BM1a test 90 0 16,064 4 BM2a train 87 3 15,988 2 BM2b train 90 0 16055 0 BM2b test 77 13 16,049 6 BM2a test 83 7 15,982 8 4.2. Results for the Bitcoin-OTC dataset We considered subgraphs whose events shared either the source or the target user with the event to classify. We considered a time window of 4,000,000 sec., and a maximum distance of three steps from the target node. Figure 4 shows an example of subgraph for a 10 labelled event, and one for a -10 labelled event. Labels from -10 to -5 are drawn in red, from -5 to -1 in yellow, from 1 to 5 in blue and greater that 5 in green. It can be seen that in this dataset, the subgraph structures may be not so different, and the values of the labels may be the discriminant feature. The obtained results can be compared to those of [20] and [18]. In [20], the problem is Figure 4: Example of a 10-labelled (left) and a -10-labelled (right) sub graph. expressed in terms of edge classification and it reports a F1 micro average scores in a figure, with values above 0.85. In [18], the problem is presented in terms of edge-weight prediction and it reports as measures the Root Mean Square Error (RMSE) and the Pearson Correlation Coefficient (PCC) of predictions vs true values. The best results quoted are 0.31 RMSE and 0.49 PCC for a single method; 0.26 RMSE and 0.67 PCC using predictions of several algorithms in a supervised classification. Scores where normalized to the [−1, 1] interval for this computation. We note that RMSE seems to be a better measure for the quality of results, than f1 score. Actually, a real user would be more interested in an approximation of the values, that are ratings in an interval, than exact guesses. Some older work are presented as edge sign prediction (e.g. [21]), and a comparison between their results and ours cannot be done, as they used other datasets. However, we report, only for test data, a binary f1 score referred to edge signs. Results are shown in Table 4. We performed experiments under two settings. In the first one, we used Table 4 Bitcoin-OTC Event Graphs, PSCN results Case f1 RMSE PCC Binary f1 Case f1 RMSE PCC Binary f1 50% train 0.610 0.295 0.621 33% train 1st 0.603 0.297 0.612 50% test 0.608 0.304 0.595 0.966 66% test 1st 0.608 0.305 0.591 0.966 33% train 2nd 0.609 0.290 0.613 33% train 3rd 0.608 0.293 0.630 66% test 2nd 0.603 0.301 0.587 0.966 66% test 3rd 0.600 0.301 0.595 0.964 half of the data for training, and half for testing. In the second setting, we used one third for training, and the remaining for testing, by also considering three three different compositions of the training set. 5. Application of the approach to historical data In the following, we envisage a real application case-study where the proposed approach could be effective. The case-study is based on a dataset of historical events in the context of the Tacitroots ERC H2020 project. The goal of Tacitroots is to study an extensive corpus of unpublished documents covering several decades of scientific work in the European continent. The collected data includes thousands of letters exchanged between scholars throughout Europe in the 16th century, discussing open scientific issues, and documenting various experimental findings. Our idea is to consider the exchange of letters between scholars as events and to build a corresponding event graph according to the approach described in Section 3. In particular, an event node in our graph represents a letter sent by a scholar to a colleague at a certain moment in time. An event is associated with the sending date, the sender name, and the receiver name. Furthermore, every event node is associated with a topic that is discussed in the letter, such as medicine, astronomy, or history. Additionally, a node/event can be associated with further attributes when it mentions the execution of an experiment with related observations, as well as an instrument description. Adjacent nodes can be linked by edges of different type featuring a shared attribute. Possible examples of edge type are: common sender/receiver, or cross-reference between sender and receiver of the linked letters. Further edge types are about common experiments or instruments mentioned in two letters. Figure 5 illustrates an example of event graph built with the Tacitroots datasets and it represents a sequence of three letter exchanges. Letter 1 and Letter 2 refer to letters written by Vincenzo Viviani on 19th and 21st March 1664 to Giovanni Borelli and Carlo Rinaldini, respectively. Both events discuss thermology and mention telescope, and they are therefore connected by SAME SENDER and SAME INSTRUMENT relationships. Letter 3 represents the response of Carlo Rinaldini to Vincenzo Viviani addressing astronomy and holding a notice of observation on comets. This later event is linked to the previous one via RECEIVER = SENDER and SAME INSTRUMENT relationships. Considering the above example of the event graph, we would like to address the following node classification task. Given a target event node, describing a letter exchange between two scholars, the objective is to predict the topic addressed by the letter from the recent history of correspondences involving the sender and the receiver of the target event. We believe that integrating available node-level descriptive features, including instrument Figure 5: Fragment of the historical event graph based on the dataset of the Tacitroots project. and experiment, can enhance classification accuracy. For this reason, we plan to consider all these features along with edge labels when computing the vector-based representation of target nodes and related neighbors. Ultimately, with this case-study we aim to showcase the applicability of our methodology to a broader field of historical studies, and its potential for uncovering valuable insights into the relationships and topics discussed among scholars. 6. Concluding remarks In this paper, we presented an approach for heterogeneous event classification through GNN. The preliminary experimental results on two different case-studies discussed in the paper provides interesting scalability performances. For instance, run times for the BM2 datasets are roughly increased of 1/3 compared to run times for the BM1 datasets; on the other side the BM2 datasets are approximately three times larger than BM1. Another interesting point is that results are stable even when the training set size is reduced by more than 30%. As a further result, we note that the proposed approach is quite flexible, and it can be used to represent very different event graphs. The Blackmailing graph has several types of nodes (with no attributes) and several types of edges. The Bitcoin graph has only one type of nodes with an attribute and just two types of edges. The first gives raise to relatively more complex subgraphs than the latter, as Figure 3 and 4 show. A further case-study is also sketched-out based on real historical data, with single type of nodes and multiple types of edges. Test and experiments on this case-study are currently ongoing. As a final remark, we note that the choice of the features to use for defining the subgraph requires some domain knowledge of the underlying phenomena. Also the choice of the most appropriate GNN can depend on the domain and features of the considered events. Experiments with further GNN models are currently ongoing. For instance, the KONG method (Kernels for Ordered-Neighborhood Graphs) [17] has been experimented on both the paper case-studies, but results are less interesting than those obtained with PSCN, thus requiring deeper investigations. Acknowledgments Research supported, in parts, by i) Università degli Studi di Milano under the program “Piano di Sostegno alla Ricerca”. ii) project MUSA - Multilayered Urban Sustainability Action - project, funded by the European Union - NextGenerationEU, under the National Recovery and Resilience Plan (NRRP) Mission 4 Component 2 Investment Line 1.5: Strengthening of research structures and creation of R&D “innovation ecosystems”, set up of “territorial leaders in R&D” (CUP G43C22001370007, Code ECS00000037), iii) project SERICS (PE00000014) under the NRRP MUR program funded by the EU - NGEU. Views and opinions expressed are however those of the authors only and do not necessarily reflect those of the European Union or the Italian MUR. Neither the European Union nor the Italian MUR can be held responsible for them. References [1] Scarselli, F., Gori, M., Tsoi, A., Hagenbuchner, M. and Monfardini, G., The graph neural network model, IEEE Transactions on Neural Networks 2009, vol. 20, no. 1, pp. 61-80. [2] Jie Zhou, Ganqu Cui, Shengding Hu, Zhengyan Zhang, Cheng Yang, Zhiyuan Liu, Lifeng Wang, Changcheng Li, Maosong Sun, Graph neural networks: A review of methods and applications, AI Open, Volume 1, 2020, Pages 57-81, ISSN 2666-6510, https://doi.org/10.1016/j.aiopen.2021.01.001. [3] Mellor, A. The Temporal Event Graph. Journal of Complex Networks (2017) [4] Kivelä, M., Cambe, J., Saramäki, J. et al. Mapping temporal-network percolation to weighted, static event graphs. Sci Rep 8, 12357 (2018). https://doi.org/10.1038/s41598-018-29577-2 [5] Torricelli, M., Karsai, M. Gauvin, L. weg2vec: Event embedding for temporal networks. Sci Rep 10, 7164 (2020). https://doi.org/10.1038/s41598-020-63221-2 [6] Grover, Aditya and Leskovec, Jure node2vec: Scalable feature learning for networks, Pro- ceedings of the 22nd ACM SIGKDD international conference on Knowledge discovery and data mining (2016) [7] Tsitsulin, Anton and Mottin, Davide and Karras, Panagiotis and Müller, Emmanuel VERSE: Versatile Graph Embeddings from Similarity Measures. Proceedings of the 2018 World Wide Web Conference (2018). https://doi.org/10.1145/3178876.3186120 [8] Zhang, Jie and Dong, Yuxiao and Wang, Yan and Tang, Jie and Ding, Ming ProNE: Fast and Scalable Network Representation Learning. IJCAI (2019) [9] Léo Pio-Lopez, Alberto Valdeolivas, Laurent Tichit, Élisabeth Remy, Anaïs Baudot. Mul- tiVERSE: a multiplex and multiplex-heterogeneous network embedding approach. 2020. ffhal-02997038 [10] Xiaowen Dong, Pascal Frossard, Pierre Vandergheynst, and Nikolai Nefedov Cluster- ing on Multi-Layer Graphs via Subspace Analysis on Grassmann Manifolds (2014). IEEE TRANSACTIONS ON SIGNAL PROCESSING, VOL. 62, NO. 4 [11] Mathias Niepert, Mohamed Ahmed, Konstantin Kutzkov Learning Convolutional Neu- ral Networks for Graphs. Proceedings of the 33rd International Conference on Machine Learning (2016) [12] Bellandi, V. and Ceravolo, P. and Maghool, S. and Siccardi S. Graph embeddings in criminal investigation: towards combining precision, generalization and transparency. World Wide Web 25, 2379–2402 (2022). https://doi.org/10.1007/s11280-021-01001-2 [13] Bellandi, V. and Ceravolo, P. and Maghool, S. and Pindaro, M. and Siccardi, S. Pat- tern Matching Algorithm in Event Graphs, Intl Conf on Cyber Science and Technology Congress, CyberSciTech, Falerna, Italy, 2022, pp. 1-7, doi: 10.1109/DASC/PiCom/CBD- Com/Cy55231.2022.9927860. [14] Bellandi, V. and Ceravolo, P. and Maghool, S. and Pindaro, M. and Siccardi, S. Correlation and pattern detection in event networks, IEEE International Conference on Big Data (Big Data), Orlando, FL, USA, 2021, pp. 4103-4112, doi: 10.1109/BigData52589.2021.9671512. [15] Yao Ma and Suhang Wang and Chara C. Aggarwal and Dawei Yin and Jiliang Tang. Multi- dimensional Graph Convolutional Networks. Proceedings of the 2019 SIAM International Conference on Data Mining (SDM). 2019 https://doi.org/10.1137/1.9781611975673.74 [16] Grassia, M. and De Domenico, M. and Mangioni, G. mGNN: Generalizing the Graph Neural Networks to the Multilayer Case. arXiv:2109.10119 [17] Draief, Moez and Kutzkov, Konstantin and Scaman, Kevin and Vojnovic, Milan. KONG: Kernels for ordered-neighborhood graphs. Advances in Neural Information Processing Systems. Vol. 31. 2018 [18] Kumar, S. and Spezzano, F. and Subrahmanian, V.S. and Faloutsos, C. Edge Weight Predic- tion in Weighted Signed Networks. IEEE International Conference on Data Mining (ICDM), 2016. [19] Kumar, S. and Hooi, B. and Makhija, D. and Kumar, M. and Subrahmanian, V.S. and Faloutsos, C. REV2: Fraudulent User Prediction in Rating Platforms. 11th ACM International Conference on Web Searchand Data Mining (WSDM), 2018 [20] Pareja, A. and Domeniconi, G. and Chen, J. and Ma, T. and Suzumura, T. and Kaneza- shi, H. and Kaler, T. and Schardl, T.B. and Leiserson, C.E. EvolveGCN: Evolving Graph Convolutional Networks for Dynamic Graphs, arXiv:1902.10191, 2019 [21] Leskovec, J. and Huttenlocher, D. and Kleinberg, J. Predicting positive and negative links in online social networks, WWW, 2010.