<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LISE: a Logic-based Interactive Similarity Explainer</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Simona Colucci</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Maria Donini</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Verdiana Schena</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Politecnico di Bari</institution>
          ,
          <addr-line>Via Orabona 4, 70125, Bari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Università della Tuscia</institution>
          ,
          <addr-line>Via S. Maria in Gradi 4, 01100 Viterbo</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This work presents LISE (Logic-based Interactive Similarity Explainer), a system for explaining the similarity of clusters of RDF resources, by identifying common characteristics in their RDF descriptions. LISE follows a pipeline that consists of four main modules: Machine Learning Module, which creates a representation of RDF resources as vector embeddings and clusters them; Logic-Based Module, which, for each cluster, computes a Knowledge Graph (with blank nodes) modeling the common characteristics of resources in the cluster; Natural Language Generation Module, which translates the computed Knowledge Graphs into human-readable descriptions; and User Interaction and Feedback Loop, which collects user feedback about the relevance of generated explanations. LISE operates in a closed loop, leveraging user feedback to refine embeddings and subsequently improve clustering. It was tested on an RDF dataset containing structured drug-related information, demonstrating promising results in terms of explainability and interpretability of clustering results.</p>
      </abstract>
      <kwd-group>
        <kwd>Knowledge Graph Embeddings</kwd>
        <kwd>Resource Description Framework</kwd>
        <kwd>Interactive Clustering</kwd>
        <kwd>User Interaction</kwd>
        <kwd>Natural Language Generation</kwd>
        <kwd>Explainable Artificial Intelligence</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Unsupervised learning, particularly clustering, is a fundamental technique for the initial exploration of
unstructured data. Clustering groups similar data points together, enabling users to identify hidden
patterns and potential classifications by analyzing these groups. However, the notion of "similarity" is
inherently subjective and depends on the users’ objectives and the specific features they prioritize[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
For instance, in drug-related data, different aspects such as protein interactions or sequence similarity
might be of primary importance [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Once a clustering algorithm generates groups, it is essential to
understand the characteristics that define each cluster. This is particularly relevant in interactive
clustering [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], where users iteratively refine the process based on their needs. However, comprehensible
feature extraction is not always straightforward, especially when dealing with purely numerical data
or when items are identified by IRIs, whose features can be inferred from a linked Knowledge Graph
(KG). While KGs can theoretically provide meaningful feature information, the embedding process used
in clustering often obscures the shared characteristics within a cluster. To address this challenge, we
developed LISE (Logic-based Interactive Similarity Explainer), a modular system that clusters groups of
RDF resources, and for each cluster, computes an RDF KG describing the commonalities of resources
and generates an English description from such KG; finally, system users vote on the relevance of every
single part of the description to improve the clustering process in an interactive feedback loop.
      </p>
      <p>
        The natural language description of cluster commonalities serves as an explanation service for the
clustering method. From the perspective of eXplainable Artificial Intelligence (XAI) [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], LISE can
be classified as text-based, method-agnostic (compatible with any clustering approach), and post-hoc
(providing explanations after clustering is completed).
      </p>
      <p>With respect to previous work, LISE introduces two main innovations: the application of pruning
techniques to filter out irrelevant information, detailed in Section 2.3, and the inclusion of user interactivity,
which enables a feedback loop to iteratively refine the clustering process.</p>
      <p>The paper is organized as follows. The next section describes the LISE architecture and its main
modules, with reference to an extended use case addressing the drug comparison problem. In particular,
Section 2.1 explains how the input KG is extracted and how entities are selected from the reference RDF
dataset, DrugBank<sup>1</sup>. Sections 2.2–2.5 explain how each system component works. Section 3
closes the paper.</p>
    </sec>
    <sec id="sec-2">
      <title>2. System</title>
      <p>LISE explains the similarity of RDF resources grouped through clustering by logically computing
commonalities in their RDF descriptions. It implements a pipeline that transforms an RDF dataset into
a KG, generates a representation for KG entities based on vector embeddings, clusters embeddings,
computes a logic-based explanation for clusters, and iterates the process based on user feedback.</p>
      <p>
        The LISE architecture is depicted in Figure 1 and is structured into four main modules:
• Machine Learning Module: Responsible for representing RDF resources as feature vectors by
training RDF2Vec [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] embedding models and performing clustering to group resources based on
similarity.
• Logic-Based Module: Computes an RDF KG that models the common characteristics of the resulting
clusters while filtering out irrelevant information.
• Natural Language Generation Module: Converts the RDF KG into a human-readable description
by using a template-based approach.
• User Interaction and Feedback Loop: Collects user feedback on each explanation, leveraging this
feedback to refine and retrain the embeddings, thereby improving alignment with user-defined
relevance criteria.
      </p>
      <p>Each module plays a crucial role in enhancing the quality of explanations, contributing to a feedback
loop that iteratively improves the interpretability of clustering results.</p>
      <p>The following sections provide a discussion of system functionalities, by introducing the role of each
module. Preliminarily (Section 2.1), the entity selection process is introduced. Then, Section 2.2 details
the generation of embeddings from RDF resources (Section 2.2.1) and the application of clustering
(Section 2.2.2). Section 2.3 describes the logic-based computation process that produces, for each cluster,
an RDF KG modeling characteristics shared by cluster items. Section 2.4 explains how LISE generates
natural language descriptions from the Knowledge Graphs obtained in Section 2.3; additionally, it
explores the integration in LISE of a Large Language Model (LLM) for Natural Language Generation
(NLG). Finally, Section 2.5 illustrates how user feedback is collected and used to iteratively improve the
system.</p>
      <sec id="sec-2-1">
        <title>2.1. Knowledge Graph and Entity Selection</title>
        <p>
          We demonstrate the capabilities of LISE on DrugBank, a dataset containing structured information about drugs and
available in RDF format. The dataset is first converted into a pyRDF2Vec<sup>2</sup> KG<sup>3</sup>, in which the set of drugs
to divide into clusters is selected by choosing entities belonging to the Drug class and characterized as
small molecules. This selection process results in a dataset of 7,391 resources. To enhance the quality
of the graph and eliminate redundant information, all predicates irrelevant to drug comparison are
excluded. In our examples, we identified a preliminary list of so-called "stop-patterns" [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ], which we list
in Appendix A for the sake of reproducibility. For each drug, a subgraph is extracted from the initial
KG, up to a maximum depth of 7. These subgraphs are used by the Logic-Based Module, which is
described in Section 2.3.
        </p>
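        <p>The depth-bounded subgraph extraction described above can be illustrated with a minimal sketch. It is not the pyRDF2Vec implementation: the function name <monospace>extract_subgraph</monospace>, the <monospace>stop_predicates</monospace> parameter, and the toy <monospace>ex:</monospace> triples are assumptions introduced only for illustration.</p>
        <preformat>
```python
from collections import deque

# Toy triples (hypothetical data, not taken from DrugBank).
TRIPLES = [
    ("ex:drugA", "ex:target", "ex:protein1"),
    ("ex:protein1", "ex:organism", "ex:human"),
    ("ex:human", "ex:label", "Human"),
    ("ex:drugB", "ex:target", "ex:protein2"),
]


def extract_subgraph(triples, root, max_depth, stop_predicates=frozenset()):
    """Return the triples reachable from root within max_depth hops."""
    # Index outgoing edges by subject for fast lookup.
    outgoing = {}
    for s, p, o in triples:
        outgoing.setdefault(s, []).append((p, o))

    subgraph = []
    visited = {root}
    queue = deque([(root, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_depth:
            continue  # nodes at the depth limit are not expanded
        for p, o in outgoing.get(node, []):
            if p in stop_predicates:
                continue  # skip predicates declared irrelevant
            subgraph.append((node, p, o))
            if o not in visited:
                visited.add(o)
                queue.append((o, depth + 1))
    return subgraph


# The use case in the text uses a maximum depth of 7.
drug_a_subgraph = extract_subgraph(TRIPLES, "ex:drugA", max_depth=7)
```
        </preformat>
        <p>A breadth-first traversal is used here so that the depth of each node is its shortest distance from the root; other traversal orders would be equally plausible.</p>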
        <sec id="sec-2-1-1">
          <title>1 https://download.bio2rdf.org/files/current/drugbank/drugbank.html</title>
          <p>2 https://pyrdf2vec.readthedocs.io/en/latest/index.html
3 A pyRDF2Vec KG is a structure for modeling and managing RDF data in the form of a Knowledge Graph, in which triples are
represented as vertices and edges, enabling the extraction of RDF paths.</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Machine Learning Module</title>
        <p>
          2.2.1. Embedding generation
To represent entities as numerical feature vectors, we employ RDF2Vec [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], a method that applies the
word embedding principle to KGs. RDF2Vec represents each entity as a point in a continuous vector
space, providing versatile input for various machine learning applications.
        </p>
        <p>
          Specifically, we use pyRDF2Vec, a Python implementation that extracts walks from KGs to generate
embeddings. Since the number of possible walks can be exponential in the worst case, pyRDF2Vec
samples possible walks, allowing for the selection of different sampling strategies [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. RDF2Vec operates
by generating sequences of entities and relations (walks) within the KG, treating them as sentences
from a text corpus. These sequences are then used to train a Word2Vec<sup>4</sup> model, which learns vector
representations for each entity, capturing both semantic and structural relationships. The embedding
generation process follows these key steps:
• Definition of the walking strategy: A random walking strategy is employed, where each entity in
the KG is explored through paths with a maximum depth of 7. The number of walks generated per
entity is limited to 3310, corresponding to the highest number of triples present in the extracted
subgraphs. The random state parameter is set to 42.
• Definition of the sampling strategy: We build upon the existing Predicate Frequency Weight [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
strategy, which assigns a weight to each predicate based on its frequency of occurrence in the
KG, but we adopt a more specific notion of weight, following the call in the pyRDF2Vec
documentation to develop new walking, sampling, and embedding strategies. In particular,
we introduce a custom sampling strategy, namely Predicate Relevance Weight, which assigns
weights to specific walks based on their relevance for cluster explanation. These weights are
derived from user feedback, where users rate the relevance of explanatory sentences generated
for each cluster (see Figure 9 for a screenshot). LISE learns these weights through an interactive
component, employing a regression model to predict them. In the first iteration, the
weights for each predicate are manually set to 0.01 or 0.02, with the exception of the predicate
http://bio2rdf.org/drugbank_vocabulary:category, which is assigned a weight of
0.99 because our goal was to cluster resources based on this specific predicate.
• Definition of the embedding strategy: The extracted walks are processed as textual sentences and
used as input for the Word2Vec model. The model is configured with the skip-gram architecture
while maintaining default hyperparameters for training.
4 https://radimrehurek.com/gensim/models/word2vec.html
        </p>
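        <p>To make the idea of predicate-weighted walk sampling concrete, the following standard-library sketch samples walks in which each hop is chosen with probability proportional to the weight of its predicate. It does not use the pyRDF2Vec API and is not the implementation of Predicate Relevance Weight; the function <monospace>sample_walk</monospace>, the toy graph, and the <monospace>ex:</monospace> names are assumptions. The weights 0.99 and 0.01 only mirror the initial setting described above.</p>
        <preformat>
```python
import random

# Toy graph: subject -> list of (predicate, object). Hypothetical data.
GRAPH = {
    "ex:drugA": [("ex:category", "ex:cat1"), ("ex:label", "ex:lit1")],
    "ex:cat1": [("ex:label", "ex:lit2")],
}

# Per-predicate weights: high for the category predicate, low otherwise.
WEIGHTS = {"ex:category": 0.99, "ex:label": 0.01}
DEFAULT_WEIGHT = 0.01


def sample_walk(graph, start, max_depth, weights, rng):
    """Sample one walk, picking each hop proportionally to its predicate weight."""
    walk = [start]
    node = start
    for _hop in range(max_depth):
        edges = graph.get(node, [])
        if not edges:
            break  # dead end: the walk stops early
        hop_weights = [weights.get(p, DEFAULT_WEIGHT) for p, _o in edges]
        predicate, node = rng.choices(edges, weights=hop_weights, k=1)[0]
        walk.extend([predicate, node])
    return walk


rng = random.Random(42)  # fixed seed for reproducibility
walks = [sample_walk(GRAPH, "ex:drugA", 7, WEIGHTS, rng) for _i in range(1000)]
# Fraction of walks whose first hop follows the highly weighted predicate.
category_share = sum(w[1] == "ex:category" for w in walks) / len(walks)
```
        </preformat>
        <p>In this sketch, the walks would then play the role of sentences fed to the embedding model, so a heavily weighted predicate appears in most of the training corpus.</p>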
        <p>After applying RDF2Vec to the entire KG, each entity is transformed into a fixed-dimensional vector.
These vector representations enable the application of clustering techniques to group semantically
related entities.</p>
        <p>
          While RDF2Vec provides compact and interpretable embeddings, their effectiveness depends on how
well they preserve entity semantics within the given domain context. Although RDF2Vec generates
embeddings that are computationally efficient in terms of size, their ability to retain the original
semantic content of RDF entities remains disputable [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. Specifically, previous studies have demonstrated
that the assumption underlying RDF2Vec—that similar entities will have similar embeddings—is not
consistently supported in real-world machine learning applications [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. Indeed, when analyzing the
common properties of the entities within the clusters using LISE, we find that the explanations are often
uninformative. This indicates that, although clustering groups entities that are close in the embedding
space, they are not necessarily semantically similar. To analyze the distribution of embeddings, we
employ Principal Component Analysis (PCA) from the scikit-learn library [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] to project the original
100-dimensional embeddings into a two-dimensional space for visualization (Figure 2).
2.2.2. Clustering
Clustering is performed using the k-means algorithm via the scikit-learn library [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] with the parameters
in Table 1. Clustering models alternative to k-means were tested, but they did not produce significant
improvements in terms of amount or relevance of common information within each cluster.
        </p>
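        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>k-means parameters.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Parameter</th><th>Value</th></tr>
            </thead>
            <tbody>
              <tr><td>Number of Clusters</td><td>302</td></tr>
              <tr><td>Initialization</td><td>k-means++</td></tr>
              <tr><td>Random State</td><td>42</td></tr>
              <tr><td>Number of Initializations</td><td>5</td></tr>
            </tbody>
          </table>
        </table-wrap>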
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Logic-Based Module</title>
        <p>
          LISE explains clusters through logic-based processing of RDF KGs, which computes the
characteristics shared among clustered resources. A core component of this process is the computation of
the Least Common Subsumer (LCS) [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ], an RDF KG which describes the common features of a group
of RDF resources. The LCS is computed on the subgraphs previously extracted for the resources to
compare (Section 2.1).
        </p>
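        <p>As a rough intuition for what a common subsumer captures, the sketch below computes a one-level common description of a set of resources: pairs shared verbatim are kept, and a predicate shared with differing objects is generalized to a blank node. This is a deliberate simplification and not the LCS algorithm of [11], which works on full rooted subgraphs. The function <monospace>simple_common_subsumer</monospace>, the resources <monospace>ex:drug1</monospace> and <monospace>ex:drug2</monospace>, and the category values are hypothetical.</p>
        <preformat>
```python
def simple_common_subsumer(descriptions, root="_:cs"):
    """Compute a one-level common description of several resources.

    descriptions maps each resource to its set of (predicate, object) pairs.
    """
    pair_sets = list(descriptions.values())
    # Pairs shared verbatim by every resource are kept as they are.
    shared_pairs = set.intersection(*pair_sets)
    # Predicates used by every resource, regardless of the object.
    shared_predicates = set.intersection(
        *[{p for p, _o in pairs} for pairs in pair_sets]
    )
    cs = {(root, p, o) for p, o in shared_pairs}
    # A shared predicate with differing objects is generalized to a blank node.
    already_covered = {p for p, _o in shared_pairs}
    for i, p in enumerate(sorted(shared_predicates - already_covered)):
        cs.add((root, p, f"_:b{i}"))
    return cs


# Hypothetical two-drug cluster; only the type and target values echo the use case.
CLUSTER = {
    "ex:drug1": {
        ("drugbank_vocabulary:type", "drugbank_vocabulary:Small-molecule"),
        ("drugbank_vocabulary:target", "bio2rdf:drugbank:BE0000623"),
        ("drugbank_vocabulary:category", "ex:categoryX"),
    },
    "ex:drug2": {
        ("drugbank_vocabulary:type", "drugbank_vocabulary:Small-molecule"),
        ("drugbank_vocabulary:target", "bio2rdf:drugbank:BE0000623"),
        ("drugbank_vocabulary:category", "ex:categoryY"),
    },
}

cs = simple_common_subsumer(CLUSTER)
```
        </preformat>
        <p>The result is rooted in a blank node, which stands for "any resource of the cluster", in the same spirit as the rooted CSs listed in the appendices.</p>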
        <p>To ensure meaningful graph comparison, we perform some optimizations:</p>
        <p>• we exclude from the subgraphs all walks on predicates irrelevant to the comparison, by providing
a list of stop-patterns (Appendix A)
• we filter out explicitly defined uninformative triples (Appendix B) from the LCS, transforming it
into a Common Subsumer (CS) that consequently retains only relevant information.</p>
        <p>To provide a general understanding of the dataset commonalities, we first compute the CS of the
entire dataset, identifying information shared across all resources. This CS is a rather uninformative
KG, which is shown in Appendix C and models only one commonality: the fact that all drugs have a type
which is small molecule<sup>5</sup>.</p>
        <p>Subsequently, to explain individual clusters, LISE computes the CS for each cluster. A sample CS of
Cluster no.58, which contains 52 elements, is shown in Appendix D<sup>6</sup>. In this case, the CS models more
commonalities among the 52 cluster items. First, drugs in Cluster no.58 are also of kind small molecule;
second, they share a target protein, labeled as Tyrosine-protein phosphatase non-receptor
type 1 [drugbank:BE0000623], which is fully described in the CS. In particular, the CS shows that
this target affects a human organism type.</p>
        <p>
          Although the generated explanations are already filtered for irrelevant details, LISE further refines
them by applying two irrelevance definitions taken from previous work [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Specifically, LISE prunes:
• information "irrelevant to the context", i.e., the features shared by the entire dataset, because they
do not discriminate clusters;
• information "irrelevant to the user", i.e., information already known to the user, based on his/her
Personal KG.
        </p>
        <p>5 Figure 6 shows this specific CS, verbalized in natural language using the template-based approach described in Section 2.4.
6 A verbalization of Cluster no.58, using the template-based approach, is shown in Figure 5.</p>
        <p>In the use case addressed, the information irrelevant to the context is that listed in Appendix C,
stating that all drugs have type small molecule.</p>
        <p>Regarding irrelevance to the user, in the use case we assume that the user knows that some drug targets
(Pyridoxal kinase, Cyclin-dependent kinase 2, Glutathione S-transferase A1, Tyrosine-protein kinase
JAK2, Tyrosine-protein phosphatase non-receptor type 1) act on the human organism. Thus, we model
his/her Personal Knowledge Graph in RDF, as shown in Appendix E.</p>
        <p>By pruning the CS of Cluster no.58 of both types of irrelevant information, we obtain a new CS,
shown in Appendix F<sup>7</sup>. This new CS includes neither the information that all cluster items are of type small
molecule (irrelevant to the context) nor that the protein "Tyrosine-protein phosphatase non-receptor type
1" acts on the human organism (already known to the user).</p>
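        <p>Under the simplifying assumption that all graphs share the same root label, the two pruning steps can be sketched as set differences over triples. The actual comparison in LISE operates on RDF graphs with blank nodes and is not reproduced here; the root label <monospace>_:root</monospace> and the predicate <monospace>ex:organism</monospace> are hypothetical, since the text does not name the corresponding predicate.</p>
        <preformat>
```python
ROOT = "_:root"  # assumed common label for the root blank node
TARGET = "bio2rdf:drugbank:BE0000623"

# Simplified CS of the cluster (a small excerpt, for illustration only).
cluster_cs = {
    (ROOT, "drugbank_vocabulary:type", "drugbank_vocabulary:Small-molecule"),
    (ROOT, "drugbank_vocabulary:target", TARGET),
    (TARGET, "drugbank_vocabulary:gene-name", "PTPN1"),
    (TARGET, "ex:organism", "ex:human"),  # hypothetical predicate and object
}

# CS of the entire dataset: information irrelevant to the context.
dataset_cs = {
    (ROOT, "drugbank_vocabulary:type", "drugbank_vocabulary:Small-molecule"),
}

# Personal KG: information the user already knows.
personal_kg = {
    (TARGET, "ex:organism", "ex:human"),
}


def prune(cs, context, known):
    """Drop triples shared by the whole dataset or already known to the user."""
    return cs - context - known


pruned_cs = prune(cluster_cs, dataset_cs, personal_kg)
```
        </preformat>
        <p>Only the triples that discriminate the cluster and are new to the user survive, which is the information later verbalized for rating.</p>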
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Natural Language Generation Module</title>
        <p>This component is responsible for translating the relevant commonalities modeled by the Common
Subsumer, as previously computed and refined by the logic-based component (Section 2.3) of LISE, into
human-readable explanations.</p>
        <p>7 The new CS is also shown in Figure 7 in a human-readable format, by using the template-based verbalization approach.</p>
        <p>
          To this end, LISE integrates a previously developed template-based Natural Language Generation
tool [
          <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
          ].
        </p>
        <p>In our use case, LISE produces the explanation in Figure 5 for Cluster no.58, before pruning irrelevance.
By pruning information irrelevant both to the context (explained in Figure 6, thanks to the tool) and to
the user, the explanation in Figure 7 is generated.</p>
        <p>While effective in producing clear and understandable explanations in specific domains, the NLG tool
requires the manual definition of a context-specific dictionary, a task that is challenging to automate.
To address this limitation, we explored the use of Large Language Models as an alternative to the
template-based tool. Specifically, we achieved promising results by training Google Gemini<sup>8</sup> to process
RDF Knowledge Graphs, focusing on CSs, which represent the common characteristics to verbalize in
natural language. With reference to our use case, we obtained the explanation shown in Figure 8 for
Cluster no.58.</p>
        <p>Our experiment utilizes the Google Gemini 1.5-Flash<sup>9</sup> model. Results are generated via an API call in
Python, which processes the RDF triples in NT format related to a Common Subsumer. The API call,
executed with the request "Verbalize with discursive phrases these RDF NT triples", is defined with the
following system instruction:</p>
        <sec id="sec-2-4-1">
          <title>8 https://ai.google.dev/ 9 https://ai.google.dev/gemini-api/docs/models/gemini?hl=it#gemini-1.5-flash</title>
          <p>For each blank node (identified with ’genid’) in the text you are given, you must
associate a generic variable. Create a dictionary composed of blank nodes with
associated variable. Then verbalize in natural language the triples present in the
text considering the defined vocabulary and when the variables are repeated you must
refer to that one by continuing the same sentence. Starting the sentence with the
verbalized form of root node, create discursive sentences.</p>
          <p>Verbalize in natural language the URIs content.</p>
          <p>Verbalize in natural language the LITERALS content.</p>
          <p>Handle blank nodes with general terms.</p>
          <p>Return only verbalization.</p>
          <p>Additionally, the root node of the CS is passed to the API call as a variable. Table 2 shows the
parameters of the API call.</p>
          <p>A comparison between the explanations generated by the template-based tool (Figure 5) and Google
Gemini (Figure 8) reveals comparability in terms of clarity and interpretability.</p>
          <p>Although both approaches require a degree of customization, Gemini’s APIs enable semi-automated
training, offering greater flexibility and adaptability to different domains. Despite these advantages,
LISE continues to use the template-based tool, primarily due to system modularity. In fact, the output
of the NLG module is processed and passed to the User Interaction and Feedback Loop module, which
collects user feedback on the relevance of the generated explanations. Currently, this module requires
explanations to follow the template structure, limiting the immediate adoption of LLM-based solutions.
We are therefore investigating strategies to make the interactive components independent of the NLG
template, thereby increasing system flexibility.</p>
        </sec>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. User Interaction and Feedback Loop</title>
        <p>In this section, we present the human-in-the-loop interactive approach used to let the user evaluate and
refine explanations generated by the Natural Language Generation component (Section 2.4).</p>
        <p>LISE collects user feedback through a Graphical User Interface (GUI) developed using the Tkinter<sup>10</sup>
Python library. This interface enables users to rate explanation sentences derived from a logically
computed CS that abstracts cluster commonalities. The collected ratings are subsequently leveraged to
enhance the system’s ability to predict relevance scores for new explanations.</p>
        <p>Figure 9 shows the GUI corresponding to the CS of Cluster no.58, pruned of irrelevant information.</p>
        <p>The interface presents the generated explanation in multiple sentences, each associated with an RDF
pattern in the Common Subsumer. Users provide feedback via a star-rating system ranging from 1 to 5
stars. These ratings are subsequently normalized into numerical values on a scale from 0 (0 stars) to 1
(5 stars) in increments of 0.2.</p>
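        <p>The normalization step amounts to dividing the number of stars by the maximum rating. The sketch below shows it under stated assumptions: the function name <monospace>normalize_stars</monospace>, the dictionary layout, and the example patterns are hypothetical and do not reproduce the LISE data model.</p>
        <preformat>
```python
MAX_STARS = 5


def normalize_stars(stars, max_stars=MAX_STARS):
    """Map an integer star rating in 0..max_stars to a value in [0, 1]."""
    if stars not in range(max_stars + 1):
        raise ValueError(f"rating must be an integer between 0 and {max_stars}")
    return stars / max_stars


# Hypothetical ratings, keyed by the RDF pattern behind each rated sentence.
ratings = {
    ("drugbank_vocabulary:target", "drugbank_vocabulary:gene-name"): 5,
    ("drugbank_vocabulary:target", "drugbank_vocabulary:locus"): 1,
}

# Normalized relevance scores, usable as training targets for a regressor.
pattern_weights = {
    pattern: normalize_stars(stars) for pattern, stars in ratings.items()
}
```
        </preformat>
        <p>With five stars as the maximum, consecutive ratings differ by 0.2, matching the increments stated above.</p>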
        <p>
          The user-provided ratings are stored in a structured data model, where each RDF pattern verbalized
in an explanation sentence is assigned a weight. Once feedback is collected, it is used as a dataset for
training a linear regression model to predict relevance scores for RDF patterns. In the current version,
LISE implements the ’LinearRegression’ model from the scikit-learn Python library [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ].
        </p>
        <p>The weights estimated by the regression model are then used to refine the embedding generation
process. In particular, the Predicate Relevance Weight sampling strategy we proposed in Section 2.2
adopts weights learned by the regression model. Consequently, when training RDF2Vec, LISE follows
a relevance-focused sampling strategy that, while computing the embeddings, gives priority to the
patterns more relevant to users.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Conclusion and Future Work</title>
      <p>LISE introduces an approach for explaining the clustering of RDF resources by combining machine
learning, knowledge-based reasoning, and natural language generation within an interactive feedback
loop. Its characteristics of being text-based, method-agnostic, and post-hoc ensure flexibility and
adaptability, making LISE a valuable tool to enhance the interpretability of clustering models. The
results obtained from an RDF dataset containing structured drug-related information demonstrate its
effectiveness in generating comprehensible and relevant explanations for users. The integration of
human feedback enables the progressive refinement of vector representations, improving the alignment
between the generated explanations and user expectations. However, the current template-based natural
language generation method presents limitations in terms of flexibility. For this reason, as part of our
future work, we plan to replace the template-based approach with one employing Large Language
Models (but still allowing the user to choose what information is considered relevant) to overcome
these constraints and further enhance the quality of generated explanations. Moreover, we intend to
explore the use of different Knowledge Graph embedding models to more accurately capture semantic
similarities between cluster resources.</p>
    </sec>
    <sec id="sec-4">
      <title>Acknowledgments</title>
      <p>We acknowledge partial support by the Italian aerospace technological district "Distretto Tecnologico
Aerospaziale S.C.a.r.l." and the project "LIFE: the itaLian system wIde Frailty nEtwork", funded by the Ministry
of Health (CUP D93C22000640001).</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <sec id="sec-5-1">
        <title>The authors have not employed any Generative AI tools.</title>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>A. Stop Patterns</title>
      <p>
        In the appendices, we use the RDF Turtle syntax [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], and refer to the following prefixes:
Stop-patterns (determined heuristically) are patterns of RDF triples considered irrelevant for explaining
similarity. They are constituted by all triples &lt;s p o&gt; meeting at least one of the following criteria:
• one element of the triple belongs to a set of 42 prefixed names;
• one element of the triple equals a given prefixed name and another element belongs to a set of 6 prefixed names.
(The triple elements involved and the prefixed names themselves are not recoverable from the extracted text.)
      </p>
    </sec>
    <sec id="sec-7">
      <title>B. Uninformative Triples</title>
    </sec>
    <sec id="sec-8">
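      <p>Uninformative triples are those &lt;s p o&gt; triples meeting at least one of the conditions listed below,
provided that o has no successors:
• one element of the triple belongs to a set of 9 prefixed names (the element and the names are not recoverable from the extracted text);
• one element of the triple is a blank node (the element is not recoverable from the extracted text).</p>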
    </sec>
    <sec id="sec-9">
      <title>C. Common Subsumer of the entire dataset</title>
      <p>The following triple represents the CS of the entire dataset of drugs selected from DrugBank:
_:N469a9065937b4f73a25076fe400b2502 drugbank_vocabulary:type drugbank_vocabulary:Small-molecule .</p>
      <sec id="sec-9-1">
        <title>We recall that all identifiers starting with _: denote a blank node.</title>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>D. Common Subsumer of Cluster no.58</title>
      <p>In what follows, the complete CS of cluster no.58 is shown. The CS is an RDF knowledge graph rooted
in the blank node _:Nc9029d61afbb426e8559468beee0277f.
_:Nc9029d61afbb426e8559468beee0277f drugbank_vocabulary:target bio2rdf:drugbank:BE0000623 ;
drugbank_vocabulary:type drugbank_vocabulary:Small-molecule .
bio2rdf:genatlas:PTPN1 a bio2rdf:genatlas_vocabulary:Resource .
bio2rdf:genecards:PTPN1 a bio2rdf:genecards_vocabulary:Resource .
bio2rdf:gi:190742 a bio2rdf:gi_vocabulary:Resource .
bio2rdf:hgnc:H9642 a bio2rdf:hgnc_vocabulary:Resource .</p>
    </sec>
    <sec id="sec-11">
      <title>E. Personal Knowledge Graph</title>
      <p>The following set of triples represents the Personal Knowledge Graph of a hypothetical user of LISE,
used in the addressed use case:</p>
    </sec>
    <sec id="sec-12">
      <title>F. Common Subsumer of Cluster no.58 pruned of irrelevant information</title>
      <p>In what follows, the CS of cluster no.58 pruned of irrelevant information is shown. The CS is an RDF
knowledge graph rooted in the blank node _:Nc9029d61afbb426e8559468beee0277f.
bio2rdf:drugbank:BE0000623 a drugbank_vocabulary:Target ;
drugbank_vocabulary:cellular-location "Endoplasmic reticulum; endoplasmic reticulum
membrane; peripheral membrane protein; cytoplasmic side" ;
drugbank_vocabulary:gene-name "PTPN1" ;
drugbank_vocabulary:general-function "Involved in protein tyrosine phosphatase activity" ;
drugbank_vocabulary:locus "20q13.1-q13.2" ;
drugbank_vocabulary:molecular-weight "49967.0" ;
drugbank_vocabulary:name "Tyrosine-protein phosphatase non-receptor type 1" ;
drugbank_vocabulary:specific-function "May play an important role in CKII- and p60c-src-induced
signal transduction cascades" ;
drugbank_vocabulary:theoretical-pi "6.21" ;
drugbank_vocabulary:transmembrane-regions "409-431" ;
drugbank_vocabulary:x-genatlas bio2rdf:genatlas:PTPN1 ;
drugbank_vocabulary:x-genecards bio2rdf:genecards:PTPN1 ;
drugbank_vocabulary:x-gi bio2rdf:gi:190742 ;
drugbank_vocabulary:x-hgnc bio2rdf:hgnc:H9642 .</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>J.</given-names>
            <surname>Bae</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Helldin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Riveiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Nowaczyk</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.-R. Bouguelia</surname>
          </string-name>
          , G. Falkman,
          <article-title>Interactive clustering: A comprehensive review</article-title>
          ,
          <source>ACM Computing Surveys (CSUR) 53</source>
          (
          <year>2020</year>
          )
          <fpage>1</fpage>
          -
          <lpage>39</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>R. T.</given-names>
            <surname>Sousa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Silva</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Pesquita</surname>
          </string-name>
          ,
          <article-title>Supervised biomedical semantic similarity</article-title>
          ,
          <source>IEEE Access</source>
          <volume>11</volume>
          (
          <year>2023</year>
          )
          <fpage>60635</fpage>
          -
          <lpage>60645</lpage>
          . doi:10.1109/ACCESS.2023.3285406.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Barredo Arrieta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Díaz-Rodríguez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Del Ser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bennetot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Tabik</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Barbado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Garcia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Gil-Lopez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Molina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Benjamins</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Chatila</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Herrera</surname>
          </string-name>
          ,
          <article-title>Explainable artificial intelligence (XAI): Concepts, taxonomies, opportunities and challenges toward responsible AI</article-title>
          ,
          <source>Information Fusion</source>
          <volume>58</volume>
          (
          <year>2020</year>
          )
          <fpage>82</fpage>
          -
          <lpage>115</lpage>
          . URL: https://www.sciencedirect.com/science/article/pii/S1566253519308103. doi:10.1016/j.inffus.2019.12.012.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Rosati</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Di Noia</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>De Leone</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>RDF2Vec: RDF graph embeddings and their applications</article-title>
          ,
          <source>Semantic Web</source>
          <volume>10</volume>
          (
          <year>2019</year>
          )
          <fpage>721</fpage>
          -
          <lpage>752</lpage>
          . URL: https://madoc.bib.uni-mannheim.de/50498/. doi:10.3233/SW-180317.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Logical comparison over RDF resources in bio-informatics</article-title>
          ,
          <source>J. Biomed. Informatics</source>
          <volume>76</volume>
          (
          <year>2017</year>
          )
          <fpage>87</fpage>
          -
          <lpage>101</lpage>
          . URL: https://doi.org/10.1016/j.jbi.2017.11.004. doi:10.1016/J.JBI.2017.11.004.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>B.</given-names>
            <surname>Steenwinckel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Vandewiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bonte</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Weyns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>De Turck</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Ongenae</surname>
          </string-name>
          ,
          <article-title>Walk extraction strategies for node embeddings with RDF2Vec in knowledge graphs</article-title>
          , in:
          <string-name>
            <given-names>G.</given-names>
            <surname>Kotsis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. M.</given-names>
            <surname>Tjoa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Khalil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Moser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mashkoor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Sametinger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fensel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Martinez-Gil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fischer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Czech</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Sobieczky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Khan</surname>
          </string-name>
          (Eds.),
          <source>Database and Expert Systems Applications - DEXA 2021 Workshops</source>
          , Springer International Publishing, Cham,
          <year>2021</year>
          , pp.
          <fpage>70</fpage>
          -
          <lpage>80</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. P.</given-names>
            <surname>Ponzetto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <article-title>Biased graph walks for RDF graph embeddings</article-title>
          , in:
          <source>Proceedings of the 7th International Conference on Web Intelligence, Mining and Semantics, WIMS '17</source>
          , Association for Computing Machinery, New York, NY, USA,
          <year>2017</year>
          . URL: https://doi.org/10.1145/3102254.3102279. doi:10.1145/3102254.3102279.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Jain</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kalo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Balke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krestel</surname>
          </string-name>
          ,
          <article-title>Do embeddings actually capture knowledge graph semantics?</article-title>
          , in:
          <string-name>
            <given-names>R.</given-names>
            <surname>Verborgh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hose</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Champin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Maleshkova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ó.</given-names>
            <surname>Corcho</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Ristoski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alam</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web - 18th International Conference, ESWC 2021, Virtual Event, June 6-10, 2021, Proceedings</source>
          , volume
          <volume>12731</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2021</year>
          , pp.
          <fpage>143</fpage>
          -
          <lpage>159</lpage>
          . URL: https://doi.org/10.1007/978-3-030-77385-4_9. doi:10.1007/978-3-030-77385-4_9.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>N.</given-names>
            <surname>Hubert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Brun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Monticolo</surname>
          </string-name>
          ,
          <article-title>Do similar entities have similar embeddings?</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Meroño Peñuela</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Dimou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Troncy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hartig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Acosta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Alam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Paulheim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Lisena</surname>
          </string-name>
          (Eds.),
          <source>The Semantic Web</source>
          , Springer Nature Switzerland, Cham,
          <year>2024</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>21</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          ,
          <article-title>Scikit-learn: Machine learning in Python</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          )
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Computing the commonalities of clusters in resource description framework: Computational aspects</article-title>
          ,
          <source>Data</source>
          <volume>9</volume>
          (
          <year>2024</year>
          ). URL: https://www.mdpi.com/2306-5729/9/10/121. doi:10.3390/data9100121.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>On the relevance of explanation for RDF resources similarity</article-title>
          , in:
          <source>Model-Driven Organizational and Business Agility - Third International Workshop, MOBA 2023</source>
          , volume
          <volume>488</volume>
          of LNBIP
          , Springer,
          <year>2023</year>
          , pp.
          <fpage>96</fpage>
          -
          <lpage>107</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Iurilli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>A business intelligence tool for explaining similarity</article-title>
          , in:
          <string-name>
            <given-names>E.</given-names>
            <surname>Babkin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Barjis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Malyzhenkov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Merunka</surname>
          </string-name>
          (Eds.),
          <source>Model-Driven Organizational and Business Agility - Second International Workshop, MOBA 2022, Leuven, Belgium, June 6-7, 2022, Revised Selected Papers</source>
          , volume
          <volume>457</volume>
          of Lecture Notes in Business Information Processing, Springer,
          <year>2022</year>
          , pp.
          <fpage>50</fpage>
          -
          <lpage>64</lpage>
          . URL: https://doi.org/10.1007/978-3-031-17728-6_5. doi:10.1007/978-3-031-17728-6_5.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Colucci</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Donini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Di Sciascio</surname>
          </string-name>
          ,
          <article-title>Explaining commonalities of clusters of RDF resources in natural language</article-title>
          , in:
          <string-name>
            <given-names>A.</given-names>
            <surname>Appice</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Azzag</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hacid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hadjali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z. W.</given-names>
            <surname>Ras</surname>
          </string-name>
          (Eds.),
          <source>Foundations of Intelligent Systems - 27th International Symposium, ISMIS 2024, Poitiers, France, June 17-19, 2024</source>
          , volume
          <volume>14670</volume>
          of Lecture Notes in Computer Science, Springer,
          <year>2024</year>
          , pp.
          <fpage>160</fpage>
          -
          <lpage>169</lpage>
          . URL: https://doi.org/10.1007/978-3-031-62700-2_15. doi:10.1007/978-3-031-62700-2_15.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>D.</given-names>
            <surname>Beckett</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          ,
          <source>Turtle - Terse RDF Triple Language</source>
          , W3C Team Submission,
          <year>2011</year>
          . URL: http://www.w3.org/TeamSubmission/turtle/.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>