<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>CEUR Workshop Proceedings</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Riccardo Benassi</string-name>
          <email>riccardo.benassi@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michele Luca Contalbo</string-name>
          <email>micheleluca.contalbo@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Del Buono</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Guerra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giacomo Guiduzzi</string-name>
          <email>giacomo.guiduzzi@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Paganelli</string-name>
          <email>matteo.paganelli@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sara Pederzoli</string-name>
          <email>sara.pederzoli@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donato Tiano</string-name>
          <email>donato.tiano@unimore.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <kwd-group kwd-group-type="author">
          <kwd>Entity Matching</kwd>
          <kwd>Data Integration</kwd>
          <kwd>Explainable AI</kwd>
          <kwd>Interpretability</kwd>
          <kwd>Explainer systems</kwd>
        </kwd-group>
        <aff id="aff0">
          <label>0</label>
          <institution>University of Modena and Reggio Emilia</institution>
          ,
          <addr-line>Via P. Vivarelli 10, Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
        <p>Deep learning models achieve high performance in Entity Matching tasks, but lack interpretability, limiting user understanding of their decision-making process. Several explainers, such as LIME, Mojito, Landmark, LEMON, and CERTA, have been proposed in the literature to address this issue. However, these methods primarily focus on model fidelity without prioritizing comprehensibility, resulting in explanations that are difficult to interpret. This extended abstract introduces CREW, a system designed to explain matching decisions. CREW enhances both interpretability and fidelity by grouping words from EM records based on semantic similarity, dataset structure, and their importance to the model. Experimental results demonstrate that CREW produces explanations that are both more interpretable for users and more faithful to the model compared to existing methods.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>CEUR Workshop Proceedings, ISSN 1613-0073</p>
    </sec>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        As data continues to grow and become more distributed across multiple sources, ensuring its
consistency and reliability becomes increasingly challenging. Entity Matching (EM) addresses
this challenge by detecting duplicate records, which improves data quality and makes the data
more effective for downstream applications. Current state-of-the-art EM methods rely on
deep learning, in particular transformer-based architectures, to achieve high performance [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ].
However, these models suffer from a lack of interpretability, making it challenging to understand
the rationale behind their decisions and limiting their applicability in real-world scenarios [
        <xref ref-type="bibr" rid="ref3 ref4 ref5 ref6">3,
4, 5, 6</xref>
        ]. Explanation systems aim to shed light on these complex models, fostering trust and
enabling their use in sensitive domains [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>The typical approach for explaining EM models is to use local post-hoc methods that build
surrogate models to simulate the decision-making process of the model around a specific data
point.</p>
      <p>[Table: comparison of explanation approaches by granularity and focus. LIME, SHAP, and
Landmark operate at the token level; Mojito at the token and attribute levels; CERTA at the
attribute level; GMASK and CREW at the cluster level. All approaches target model fidelity;
CREW additionally targets model understandability.]</p>
      <p>[Figure 1: explanations generated by LIME, Mojito, and CREW for a pair of song records
("Stars Come Out", Mason Remix vs. Francis Remix) labeled as a non-match.]</p>
      <p>
        Notable examples include LIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], SHAP [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], ExplainER [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Mojito [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], LEMON [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
Landmark Explanation [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and CERTA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Additionally, other post-hoc methods not
originally designed for EM can also be applied. For instance, GMASK [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], which explains NLI
models by identifying clusters of correlated words, can be adapted to explain EM models. These
methods, as shown in Figure 1, assign feature importances at varying granularities: LIME,
SHAP, and Landmark focus on individual tokens, Mojito and CERTA evaluate the impact of
attributes, while GMASK computes impacts at the cluster level.
      </p>
      <p>Despite their usefulness, existing explanation techniques suffer from several limitations.
Token-level explanations help identify the most influential tokens in a prediction; however, they
tend to be verbose and may oversimplify the EM task, failing to capture complex patterns and
dependencies between words within records. In contrast, attribute-level explanations offer a
more compact representation, but 1) they are unsuitable for textual data, which is prevalent in
real-world product description datasets, 2) they struggle with noisy data, where misplaced attribute
values may cause the contribution of an attribute to rely on semantically unrelated information,
and 3) they provide only an approximate representation of the model’s behavior, where dominant
impacts might obscure less obvious but still important impacts within the same attribute.
Cluster-level explanations address some of these issues by modeling relationships between
words when attributing importance, rather than treating them independently. However, they
can still suffer from poor interpretability, as most existing methods focus solely on fidelity to
the model rather than ensuring that the explanations are easily understandable for users.</p>
      <p>
        In this paper, an extended abstract of our previous work [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], we explore the use of CREW
(Cluster of RElated Words), an explanation technique for entity matching that overcomes the
previous limitations by prioritizing the usability and interpretability of explanations. More
specifically, this approach generates explanations by grouping words into clusters which are
designed to satisfy three key properties: they should contain semantically similar words, align
with the structure of entity descriptions when attributes are present, and exhibit distinct
contributions to the model’s decision. Ensuring semantic coherence within clusters allows
users to associate them with specific entity characteristics, while preserving attribute structure
maintains the logical organization of the dataset. Additionally, differentiating clusters based on
their influence helps users identify the most critical factors driving the model’s predictions.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related work</title>
      <p>
        Entity Matching (EM) is predominantly addressed using deep learning models, which have
proven highly effective even in noisy scenarios. Early approaches like DeepER [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] and
DeepMatcher [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] pioneered the use of deep learning for EM, while state-of-the-art methods such
as Ditto [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and R-SupCon [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] leverage transformer architectures to automatically extract
meaningful record representations that support the matching task. Other EM approaches based
on transformer architectures are discussed in [18], and a recent survey on this topic is available
in [19]. Despite their high accuracy, these models operate as black boxes, making it difficult to
understand the factors influencing their predictions [
        <xref ref-type="bibr" rid="ref2">2, 20, 21</xref>
        ].
      </p>
      <p>
        Interpreting the behavior of EM models is an emerging research problem that is addressed
through intrinsically explainable models or post-hoc methods. Intrinsically explainable models
generate matching predictions through self-explanatory structures that inherently provide
insight into their decision process. To our knowledge, WYM [22] is the only approach in this
category. This model adopts a fully interpretable workflow, which involves three main steps:
extracting decision units from the input pair of records, assigning relevance scores to each
decision unit, and combining these decision units with their associated relevance scores using a
white-box machine learning classifier. In contrast, most existing methods for EM explanation
rely on post-hoc analysis. Mojito [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Landmark [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], and LEMON [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] use LIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] to estimate
token impact but differ in granularity: Mojito provides attribute-level explanations, while
Landmark generates dual token-level explanations. LEMON further extends Landmark by
producing counterfactual explanations, which provide examples of values that can flip the
prediction and whose granularity is automatically determined. CERTA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] is another post-hoc
technique that does not rely on LIME, but instead uses a probabilistic approach to estimate
the impact of each token. The method links a pair of records to a third reference record, from
which it extracts a minimal set of values. These values are then incorporated into the original
record pair to assess how the prediction would change with the added information.
      </p>
      <p>All of the previous methods prioritize model fidelity, aiming to generate explanations that
faithfully reflect the underlying model’s decision process. In contrast, CREW focuses on
enhancing explanation comprehensibility, selecting the relevant information in a way that makes
matching decisions easy for users to understand while maintaining fidelity to the model.</p>
    </sec>
    <sec id="sec-4">
      <title>3. The CREW’s approach</title>
      <p>CREW generates explanations for matching decisions through a two-step approach. First, it
clusters words from the target record pair into semantically meaningful groups to enhance user
interpretation. Then, it quantifies each cluster’s contribution to the model’s decision,
highlighting those that support a match versus those that indicate a non-match. More specifically, given
an EM model ℳ that labels a word sequence representing a pair of records 𝑤 = (𝑤<sub>1</sub>, 𝑤<sub>2</sub>, …, 𝑤<sub>𝑛</sub>) as
1 (match) or 0 (non-match), CREW first organizes the words of the pair 𝑤 into a knowledge graph,
which better models the correlations between words. A correlation clustering algorithm is then
applied to this graph, producing word clusters 𝑐<sub>1</sub>, …, 𝑐<sub>𝑘</sub>. Each cluster is then scored based on its
impact on the matching decision, yielding impacts 𝑠<sub>1</sub>, …, 𝑠<sub>𝑘</sub>. This operation is performed using a
standard post-hoc local explainer (e.g., LIME). Finally, the clusters and their respective impacts are
paired together to generate the final explanation of the target pair 𝑤, i.e., 𝐸 = {(𝑐<sub>𝑖</sub>, 𝑠<sub>𝑖</sub>) ∣ 𝑖 = 1, …, 𝑘}.</p>
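      <p>The two-step pipeline described above can be sketched as follows. This is a minimal
illustration only: the function names and signatures are placeholders we introduce here, not the
paper’s actual code.</p>
      <preformat><![CDATA[
```python
def explain_pair(words, em_model, cluster_fn, weight_fn):
    """CREW-style two-step explanation sketch (names are illustrative).

    words: the word sequence w of a record pair.
    cluster_fn: groups the words into clusters c_1, ..., c_k.
    weight_fn: scores each cluster's impact on em_model's decision.
    Returns the explanation E as a list of (cluster, impact) pairs.
    """
    clusters = cluster_fn(words)             # step 1: cluster related words
    impacts = weight_fn(clusters, em_model)  # step 2: weight each cluster
    return list(zip(clusters, impacts))
```
]]></preformat>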
      <sec id="sec-4-1">
        <title>3.1. Clustering the words</title>
        <p>CREW implements the clustering operation by running an instance of the correlation clustering
algorithm [23]. The input of this algorithm is a knowledge graph, which we define as follows.</p>
        <p>Definition 1 (Knowledge graph). We define the knowledge graph as a fully connected,
weighted graph 𝐺 = (𝑉, 𝐸), where the vertices are the words of an EM pair, and the edges represent
the level of relatedness between the corresponding words. Each edge (𝑢, 𝑣) is associated with a label
𝑙<sub>𝑢,𝑣</sub>, which can take values + or −, depending on whether the connection is a positive or negative
evidence of relatedness. We define 𝐸<sup>+</sup> and 𝐸<sup>−</sup> as the sets of edges labeled + and −, respectively, i.e.,
𝐸<sup>+</sup> = {(𝑢, 𝑣) ∣ 𝑙<sub>𝑢,𝑣</sub> = +} and 𝐸<sup>−</sup> = {(𝑢, 𝑣) ∣ 𝑙<sub>𝑢,𝑣</sub> = −}.</p>
        <p>
          The algorithm aims to identify a partitioning of the words that maximizes the agreement of
correlations and minimizes the disagreement of correlations for words belonging to the same
cluster. Formally, the problem translates into the identification of a valid assignment of the
variables 𝑥<sub>𝑢,𝑣</sub> ∈ [0, 1] (where 0 indicates that two vertices are included in the same cluster and
1 the opposite case) that minimizes the sum of the negative edges included in a cluster and
maximizes the sum of positive edges:
        </p>
        <p>minimize ∑<sub>(𝑢,𝑣)∈𝐸<sup>−</sup></sub> 𝑤<sub>𝑢,𝑣</sub>(1 − 𝑥<sub>𝑢,𝑣</sub>) + ∑<sub>(𝑢,𝑣)∈𝐸<sup>+</sup></sub> 𝑤<sub>𝑢,𝑣</sub>𝑥<sub>𝑢,𝑣</sub></p>
        <p>subject to 𝑥<sub>𝑢,𝑣</sub> ∈ [0, 1], 𝑥<sub>𝑢,𝑣</sub> + 𝑥<sub>𝑣,𝑧</sub> ≥ 𝑥<sub>𝑢,𝑧</sub>, 𝑥<sub>𝑢,𝑣</sub> = 𝑥<sub>𝑣,𝑢</sub></p>
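        <p>A simple way to approximate this objective is the randomized pivot heuristic for
correlation clustering; the sketch below is an illustration under our own assumptions (CREW
actually runs the algorithm of [23]), treating positive edge weights as + edges and negative
weights as − edges.</p>
        <preformat><![CDATA[
```python
import random

def pivot_correlation_clustering(n, weight, seed=0):
    """Greedy pivot heuristic for correlation clustering (illustrative).

    n: number of vertices (words). weight[(u, v)] with u < v is the signed
    edge weight: positive values suggest u and v belong in the same cluster,
    negative values that they should be separated.
    """
    rng = random.Random(seed)
    remaining = list(range(n))
    rng.shuffle(remaining)
    clusters = []
    while remaining:
        pivot = remaining.pop(0)
        cluster, rest = [pivot], []
        for u in remaining:
            key = (min(pivot, u), max(pivot, u))
            # join u to the pivot's cluster only on positive evidence
            (cluster if weight.get(key, 0.0) > 0 else rest).append(u)
        clusters.append(sorted(cluster))
        remaining = rest
    return clusters
```
]]></preformat>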
        <p>To generate user-friendly explanations, CREW clusters words based on three relationships:
semantic relatedness, grouping related terms to highlight key entity aspects; schema relatedness,
organizing words according to the dataset schema to reflect the data provider’s perspective;
and importance relatedness, prioritizing words based on their influence on the model’s decision.
These relationships are combined linearly to determine the edge weights of the graph.</p>
        <p>Semantic relatedness. CREW measures semantic relatedness between words using the cosine
similarity of their BERT embeddings (https://huggingface.co/bert-base-uncased). These similarity
scores are zero-centered (i.e., adjusted by subtracting the mean similarity) and used as the first
component in determining edge weights in the graph.</p>
        <p>Schema relatedness. CREW incorporates the knowledge of the attribute-based structure of
entity descriptions into the graph’s edge weights by applying a constant penalty to edges
connecting words from different attributes.</p>
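        <p>The semantic and schema components can be combined into signed edge weights as
sketched below; the equal weighting of the components and the penalty value are our assumptions,
since this abstract does not give the exact coefficients of the linear combination.</p>
        <preformat><![CDATA[
```python
import numpy as np

def edge_weights(embeddings, attr_of, importance_rel, penalty=0.2):
    """Sketch of CREW-style edge weights (coefficients are assumptions).

    embeddings: (n, d) array of word vectors (e.g. BERT embeddings);
    attr_of[i]: attribute name of word i;
    importance_rel[i][j]: importance relatedness between words i and j.
    """
    n = len(embeddings)
    normed = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = normed @ normed.T
    iu = np.triu_indices(n, k=1)
    sim = sim - sim[iu].mean()        # zero-center the cosine similarities
    weights = {}
    for u in range(n):
        for v in range(u + 1, n):
            w = sim[u, v] + importance_rel[u][v]
            if attr_of[u] != attr_of[v]:
                w -= penalty          # schema relatedness: cross-attribute penalty
            weights[(u, v)] = w
    return weights
```
]]></preformat>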
        <p>Importance relatedness. To encourage clustering of words that contribute similarly to the
model’s decision, CREW first uses LIME to obtain word-level impact scores. These scores are
positive for words that support a matching prediction and negative for those that push toward
a non-matching prediction. CREW then compares these impact scores pairwise to derive a
measure of importance relatedness. The goal is to assign scores close to 1 for words with
similar impacts reinforcing the same class and close to -1 for words with opposing impacts (i.e.,
contributing to different classes). Equation 1 defines how the importance relatedness between
two words 𝑤<sub>1</sub> and 𝑤<sub>2</sub> with impact scores 𝑖𝑠<sub>1</sub> and 𝑖𝑠<sub>2</sub> is computed in CREW. The sign of their
ratio determines whether the words support the same or opposing predictions, ensuring that
words with conflicting impacts are not clustered together. The ratio of the smaller to the larger
absolute impact quantifies similarity on a scale from 0 to 1. Finally, the average absolute impact
scales the correlation, weighting stronger influences more heavily.</p>
        <p>𝑟(𝑖𝑠<sub>1</sub>, 𝑖𝑠<sub>2</sub>) = sign(𝑖𝑠<sub>1</sub>) · sign(𝑖𝑠<sub>2</sub>) · (min(|𝑖𝑠<sub>1</sub>|, |𝑖𝑠<sub>2</sub>|) / max(|𝑖𝑠<sub>1</sub>|, |𝑖𝑠<sub>2</sub>|)) · ((|𝑖𝑠<sub>1</sub>| + |𝑖𝑠<sub>2</sub>|) / 2)  (1)</p>
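        <p>Equation 1 translates directly into code. The guard for zero impacts below is our own
addition, since the magnitude ratio is undefined when both scores are zero.</p>
        <preformat><![CDATA[
```python
def importance_relatedness(is1, is2):
    """Equation 1: sign agreement x magnitude ratio x average magnitude."""
    if is1 == 0 or is2 == 0:
        return 0.0  # assumption: a zero impact gives no relatedness evidence
    sign = (1 if is1 > 0 else -1) * (1 if is2 > 0 else -1)
    ratio = min(abs(is1), abs(is2)) / max(abs(is1), abs(is2))
    avg = (abs(is1) + abs(is2)) / 2
    return sign * ratio * avg
```
]]></preformat>
        <p>For example, two words reinforcing a match with impacts 0.4 and 0.2 get a relatedness of
0.5 · 0.3 = 0.15, while the same magnitudes with opposing signs get -0.15.</p>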
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Weighting the clusters</title>
        <p>After generating word clusters, CREW evaluates their impact on the model’s decision. To
achieve this, CREW relies on LIME, though other explainability methods can be easily adapted.
Unlike its previous use for computing importance relatedness, LIME is now applied at the
cluster level rather than individual words. To enable this, we introduce two key modifications
to LIME’s standard workflow. First, all words within a cluster are concatenated using a special
character, preventing LIME’s tokenizer from splitting them and ensuring they are treated as a
single unit during perturbation. Second, once LIME generates perturbed samples, the words are
split back before being fed into the EM model, restoring their original format. These adjustments
allow LIME to assess cluster-level impact while keeping both LIME itself and the EM model
unchanged, requiring only minimal string processing to assign cluster weights.</p>
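        <p>The two string-level adjustments can be illustrated as follows. The joining character is
an assumption of this sketch: any character that the explainer’s tokenizer treats as word-internal
would work.</p>
        <preformat><![CDATA[
```python
JOIN = "\u2581"  # assumption: a separator LIME's tokenizer will not split on

def merge_clusters(clusters):
    """Concatenate each cluster into one pseudo-token before running LIME."""
    return " ".join(JOIN.join(words) for words in clusters)

def split_back(text):
    """Restore the original word sequence before calling the EM model."""
    return text.replace(JOIN, " ")

def wrap_predict(em_predict):
    """Wrap the EM model so LIME's perturbed samples are un-merged first."""
    def predict(texts):
        return em_predict([split_back(t) for t in texts])
    return predict
```
]]></preformat>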
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Experimental evaluation</title>
      <p>
        The experimental evaluation aims to demonstrate three complementary properties of CREW:
1) the ability to generate explanations that are easily understandable and intuitive for the
users (Section 4.1); 2) the fidelity in explaining the decisions made by a black-box EM model
(Section 4.2); and 3) the efficiency in generating explanations (Section 4.3). For additional
experiments and the full results of certain evaluations summarized here due to space constraints,
we refer readers to our original paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>Datasets. We conducted experiments using the Magellan benchmark datasets
(https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md), a widely
recognized reference for Entity Matching. These datasets, organized as entity pairs with shared
attributes, fall into three categories: structured, textual, and dirty. Following standard practice
in explainable AI, we sampled 100 pairs per dataset, evenly split between matching and
non-matching pairs.</p>
      <p>
        Baselines and settings. We compare CREW with LIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Mojito [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], and GMASK [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
each representing a different family of explainability techniques. LIME provides token-level
explanations, Mojito generates attribute-level explanations for EM, and GMASK groups correlated
words, originally for NLI tasks. Mojito is the most directly comparable, as its weighted attributes
align with our weighted word clusters. To align LIME and GMASK with CREW’s output, we
applied the OPTICS algorithm [24] to cluster words by importance scores. All explanation
systems were tested on the same BERT-based EM models, fine-tuned on the benchmark datasets
using a classification layer on top [
        <xref ref-type="bibr" rid="ref1">1, 18</xref>
        ]. However, the methods remain model-agnostic and can
be applied to other EM models like Ditto [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or DeepMatcher [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. We refer interested readers
to the original paper [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] for more information about the approach configurations.
      </p>
      <sec id="sec-5-1">
        <title>4.1. Explanation comprehensibility</title>
        <p>
          This section evaluates whether the explanations generated by CREW and competing approaches
are comprehensible and usable for users. To conduct this evaluation, we adopt an LLM-based
approach, leveraging the LLM-as-a-judge paradigm to evaluate the quality of the explanations.
We refer interested readers to the original paper [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] for a complementary assessment where
human-defined key properties are used to assess the explanation comprehensibility.
        </p>
        <p>To perform this experiment, we employ the Pairwise Ranking Prompting (PRP) approach [25]
applied to gpt-4o-mini, which provides an effective and stable procedure for leveraging an LLM
to perform ranking tasks. Additionally, to mitigate the well-documented sensitivity of LLMs to
text order [26, 27], each comparison is performed twice, swapping the order of the two elements
being compared. In our experiment, this framework was used to identify the explanation system
that produces the most efective explanations. Specifically, for each of the 100 record pairs in a
given dataset, we prompted the LLM to compare the explanations generated by CREW against
those produced by other explanation systems, considering both comparison directions. We
instructed the model to penalize explanations that focus on irrelevant attributes, overlook key
matching features, or provide inconsistent and unreliable justifications, while prioritizing those
that correctly identify key words that explain the matching decision. In total, 12k pairwise
comparisons were conducted, and each explanation was assigned a score using the following
formula, as defined in the original PRP framework:
𝑠<sub>𝑖</sub> = 1 · ∑<sub>𝑗≠𝑖</sub> 𝟙<sub>𝑒<sub>𝑖</sub> ≻ 𝑒<sub>𝑗</sub></sub> + 0.5 · ∑<sub>𝑗≠𝑖</sub> 𝟙<sub>𝑒<sub>𝑖</sub> = 𝑒<sub>𝑗</sub></sub>  (2)</p>
        <p>where 𝟙<sub>𝑒<sub>𝑖</sub> ≻ 𝑒<sub>𝑗</sub></sub> is an indicator function that is 1 if the LLM prefers the first explanation (i.e., 𝑒<sub>𝑖</sub>)
over the second one (i.e., 𝑒<sub>𝑗</sub>), and 𝟙<sub>𝑒<sub>𝑖</sub> = 𝑒<sub>𝑗</sub></sub> is 1 when the LLM provides conflicting outputs.</p>
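        <p>Equation 2 amounts to counting wins and ties over all pairwise LLM judgments. A minimal
sketch, with an outcome encoding of our own choosing:</p>
        <preformat><![CDATA[
```python
def prp_scores(outcomes):
    """Score each explanation from pairwise LLM judgments (Equation 2).

    outcomes[(i, j)] is "win" if the LLM prefers explanation i over j,
    "tie" if the two comparison directions gave conflicting answers
    (each side then earns 0.5), and "lose" otherwise. This encoding
    is an assumption of the sketch, not the paper's data format.
    """
    scores = {}
    for (i, j), outcome in outcomes.items():
        scores.setdefault(i, 0.0)
        scores.setdefault(j, 0.0)
        if outcome == "win":
            scores[i] += 1.0
        elif outcome == "tie":
            scores[i] += 0.5
            scores[j] += 0.5
        else:
            scores[j] += 1.0
    return scores
```
]]></preformat>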
        <p>The results in Figure 2 clearly show that CREW outperforms competing methods, winning
an average of approximately 450 comparisons. The second-best performing approach is LIME,
which generates explanations preferred, on average, over competing methods about 370 times.
Mojito and GMASK are the worst-performing methods, with averages of 210 and 150
comparisons won, respectively. It is interesting to note that CREW achieves excellent performance
on the only textual dataset considered in the evaluation, but performs the worst on the S-FZ
and S-BR datasets, which have descriptions with a relatively low average word count (about 20
words). Finally, it is worth highlighting that LIME demonstrates consistent results across all
datasets, with the lowest standard deviation.</p>
        <p>[Figure 2: number of pairwise comparisons won by CREW, LIME, Mojito, and GMASK on each of
the twelve datasets (S-FZ, S-DG, S-DA, S-AG, S-WA, S-BR, S-IA, T-AB, D-IA, D-DA, D-DG, D-WA).]</p>
      </sec>
      <sec id="sec-5-2">
        <title>4.2. Fidelity of the explanations</title>
        <p>This section evaluates how well explanations align with the underlying model using the
degradation score, a standard metric that measures how much model accuracy decreases when features,
deemed important by the explainer, are removed. A significant accuracy drop indicates that the
explanation is faithful. To compute the score, features are removed in order of their impact, both
from most to least relevant and vice versa. The area between the two resulting curves, MoRF
and LeRF (Most Relevant / Least Relevant features removed First), represents the degradation
score, with a larger area indicating higher trust in the explanation. In our experiments, we
implemented this metric using groups of words as features and the F1 score to measure the
model’s accuracy. The results are shown in columns “LIME” of Table 1, which reports the
degradation score for each dataset and explanation system. Table 1 reports a second experiment
(columns “Avg”) where the impact of a group is computed by averaging the token-level impacts
of the words in the group, instead of using LIME. Regardless of the configuration adopted, CREW
achieves the best performance, thus demonstrating that it generates explanations more faithful
to the underlying model than the competing approaches. LIME achieves the second-best results,
obtaining an average degradation score in the range of 0.4-0.44, while Mojito and GMASK
generate an average degradation score in the ranges of 0.2-0.24 and 0.2-0.3, respectively. For
CREW there is a clear difference in performance between using the “Avg” group weighting
technique and using LIME. The latter implementation is preferable regarding model fidelity: it
achieves a degradation score of 0.57 compared to 0.48 for the “Avg” configuration.</p>
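        <p>The degradation score can be sketched as the area between the LeRF and MoRF curves; the
trapezoidal integration below is our assumption about how that area is computed.</p>
        <preformat><![CDATA[
```python
def degradation_score(morf_f1, lerf_f1):
    """Area between the LeRF and MoRF F1 curves (trapezoidal sketch).

    morf_f1[k]: model F1 after removing the k most relevant feature groups;
    lerf_f1[k]: F1 after removing the k least relevant groups. Both curves
    share the x-axis "fraction of groups removed", rescaled to [0, 1].
    A larger area means removing important groups hurts the model more,
    i.e. the explanation is more faithful.
    """
    diffs = [l - m for l, m in zip(lerf_f1, morf_f1)]
    steps = len(diffs) - 1
    return sum((diffs[k] + diffs[k + 1]) / 2 for k in range(steps)) / steps
```
]]></preformat>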
      </sec>
      <sec id="sec-5-3">
        <title>4.3. Efficiency</title>
        <p>We assess CREW’s efficiency by measuring the average explanation generation time, as
shown in Figure 3a. We did not report the performance of Mojito because it is the same as that of LIME.</p>
        <p>[Figure 3a: overall efficiency (in seconds) per dataset; average description length in words:
S-FZ 24.88, S-DG 33.19, S-DA 39.83, S-AG 19.63, S-WA 34.99, S-BR 26.27, S-IA 67.16, T-AB 64.18,
D-IA 67.16, D-DA 40.89, D-DG 33.68, D-WA 34.99. Figure 3b: CREW runtime breakdown (log scale)
over knowledge graph creation, correlation clustering, and cluster weighting.]</p>
        <p>
          The results show that CREW is, on average, 10 seconds slower than the other approaches.
However, the average time is strongly influenced by three datasets (i.e., S-IA, T-AB, and
D-IA) where CREW takes more than double the time required by the other approaches. This is
attributed to the length of entity descriptions, which exceeds 60 words on average, as shown at
the bottom of Figure 3a. To better understand the reasons for this variation, Figure 3b shows the
time breakdown along CREW’s three main steps: graph creation, correlation clustering computation,
and cluster weighting. When the descriptions contain many words, the correlation clustering
step has the greatest impact on execution time. If performance is a concern, optimized clustering
techniques can be used without altering the rest of the pipeline. We recall that GMASK computes
clusters with only 10 words instead of considering all words. However, entity descriptions in
our datasets often exceed this threshold, reaching up to 90 words. In [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], we tested CREW
with a 40-word limit, significantly reducing execution time for S-IA, T-AB, and D-IA to 26.53,
26.65, and 26.61 seconds, respectively, making it more efficient than GMASK.
        </p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5. Conclusion</title>
      <p>In this paper, we introduced CREW, a cluster-based explainer for Entity Matching that generates
user-interpretable and faithful explanations. CREW follows a two-step approach. First, it
clusters words based on three complementary types of knowledge: semantic, schema, and
importance relatedness. This selection ensures semantically meaningful clusters with diverse
importance levels, helping users identify key information in record matching. Second, CREW
quantifies each cluster’s contribution to the EM model’s prediction using a local post-hoc
explainer. Experimental results show that CREW provides more interpretable and faithful
explanations than state-of-the-art alternatives.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors used GPT-4 for translation, grammar and spell checking. The AI-generated content
served only as a starting point, with substantial additional work contributed by the authors.</p>
      <p>Supervised contrastive learning, Advances in Neural Information Processing Systems 33
(2020) 18661–18673.
[18] U. Brunner, K. Stockinger, Entity matching with transformer architectures - A step forward
in data integration, in: EDBT, OpenProceedings.org, 2020, pp. 463–473.
[19] N. Barlaug, J. A. Gulla, Neural networks for entity matching: A survey, ACM Trans. Knowl.
Discov. Data 15 (2021) 52:1–52:37.
[20] M. Paganelli, D. Tiano, F. Guerra, A multi-facet analysis of BERT-based entity matching
models, The VLDB Journal (2023) 1–26. doi:10.1007/s00778-023-00824-x.
[21] M. Paganelli, P. Sottovia, A. Maccioni, M. Interlandi, F. Guerra, Explaining data with
descriptions, Inf. Syst. 92 (2020) 101549.
[22] A. Baraldi, F. D. Buono, F. Guerra, M. Paganelli, M. Vincini, An intrinsically interpretable
entity matching system, in: EDBT, OpenProceedings.org, 2023, pp. 645–657.
[23] E. D. Demaine, D. Emanuel, A. Fiat, N. Immorlica, Correlation clustering in general
weighted graphs, Theor. Comput. Sci. 361 (2006) 172–187.
[24] M. Ankerst, M. M. Breunig, H. Kriegel, J. Sander, OPTICS: ordering points to identify the
clustering structure, in: SIGMOD Conference, ACM Press, 1999, pp. 49–60.
[25] Z. Qin, R. Jagerman, K. Hui, H. Zhuang, J. Wu, L. Yan, J. Shen, T. Liu, J. Liu, D. Metzler,
X. Wang, M. Bendersky, Large language models are effective text rankers with pairwise
ranking prompting, in: K. Duh, H. Gomez, S. Bethard (Eds.), Findings of the Association for
Computational Linguistics: NAACL 2024, Association for Computational Linguistics,
Mexico City, Mexico, 2024, pp. 1504–1518. URL: https://aclanthology.org/2024.findings-naacl.97/.
doi:10.18653/v1/2024.findings-naacl.97.
[26] N. F. Liu, K. Lin, J. Hewitt, A. Paranjape, M. Bevilacqua, F. Petroni, P. Liang, Lost in the
middle: How language models use long contexts, Transactions of the Association for
Computational Linguistics 12 (2024) 157–173. URL: https://aclanthology.org/2024.tacl-1.9/.
doi:10.1162/tacl_a_00638.
[27] Y. Lu, M. Bartolo, A. Moore, S. Riedel, P. Stenetorp, Fantastically ordered prompts and
where to find them: Overcoming few-shot prompt order sensitivity, in: S. Muresan,
P. Nakov, A. Villavicencio (Eds.), Proceedings of the 60th Annual Meeting of the Association
for Computational Linguistics (Volume 1: Long Papers), Association for Computational
Linguistics, Dublin, Ireland, 2022, pp. 8086–8098. URL: https://aclanthology.org/2022.acl-long.556/.
doi:10.18653/v1/2022.acl-long.556.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Deep entity matching with pre-trained language models</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Baraldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <article-title>Analyzing how BERT performs entity matching</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>15</volume>
          (
          <year>2022</year>
          )
          <fpage>1726</fpage>
          -
          <lpage>1738</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R.</given-names>
            <surname>Caruana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Lou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Gehrke</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Koch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sturm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Elhadad</surname>
          </string-name>
          ,
          <article-title>Intelligible models for healthcare: Predicting pneumonia risk and hospital 30-day readmission</article-title>
          ,
          <source>in: Proceedings of the 21th ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>1721</fpage>
          -
          <lpage>1730</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Cicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Interpreting deep learning models for entity resolution: an experience report using LIME</article-title>
          ,
          <source>in: aiDM@SIGMOD</source>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>8:1</fpage>
          -
          <lpage>8:4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ebaid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. G.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <article-title>Explainer: Entity resolution explanations</article-title>
          ,
          <source>in: 2019 IEEE 35th International Conference on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>2000</fpage>
          -
          <lpage>2003</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Explaining entity resolution predictions: Where are we and what needs to be done?</article-title>
          ,
          <source>in: Proceedings of the Workshop on Human-In-the-Loop Data Analytics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>C.</given-names>
            <surname>Molnar</surname>
          </string-name>
          ,
          <source>Interpretable Machine Learning</source>
          ,
          <year>2019</year>
          . https://christophm.github.io/interpretable-ml-book/.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD international conference on knowledge discovery and data mining</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A unified approach to interpreting model predictions</article-title>
          ,
          <source>in: NIPS</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>N.</given-names>
            <surname>Barlaug</surname>
          </string-name>
          ,
          <article-title>Lemon: Explainable entity matching</article-title>
          ,
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          . doi:10.1109/TKDE.2022.3200644.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baraldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <article-title>Landmark explanation: An explainer for entity matching models</article-title>
          ,
          <source>in: CIKM, ACM</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4680</fpage>
          -
          <lpage>4684</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>T.</given-names>
            <surname>Teofili</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Martello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Effective explanations for entity resolution models</article-title>
          , in: ICDE, IEEE,
          <year>2022</year>
          , pp.
          <fpage>2709</fpage>
          -
          <lpage>2721</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Feng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Ganhotra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Gunasekara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Joshi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Ji</surname>
          </string-name>
          ,
          <article-title>Explaining neural network predictions on sentence pairs via learning word-group masks</article-title>
          ,
          <source>ArXiv abs/2104.04488</source>
          (
          <year>2021</year>
          ). URL: https://api.semanticscholar.org/CorpusID:233204288.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R.</given-names>
            <surname>Benassi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Tiano</surname>
          </string-name>
          ,
          <article-title>Explaining entity matching with clusters of words</article-title>
          , in: ICDE, IEEE,
          <year>2024</year>
          , pp.
          <fpage>2325</fpage>
          -
          <lpage>2337</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebraheem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Joty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Distributed representations of tuples for entity resolution</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>1454</fpage>
          -
          <lpage>1467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mudgal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Arcaute</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raghavendra</surname>
          </string-name>
          ,
          <article-title>Deep learning for entity matching: A design space exploration</article-title>
          ,
          <source>in: Proceedings of the 2018 International Conference on Management of Data</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>P.</given-names>
            <surname>Khosla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Teterwak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sarna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Tian</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Isola</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maschinot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>