<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Italian Symposium on Advanced Database Systems, June</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Landmark Explanation: a Tool for Entity Matching (Discussion Paper)</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrea Baraldi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Del Buono</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Matteo Paganelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Guerra</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>DIEF - University of Modena and Reggio Emilia</institution>
          ,
          <addr-line>Modena</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <volume>1</volume>
      <fpage>9</fpage>
      <lpage>22</lpage>
      <abstract>
        <p>We introduce Landmark Explanation, a framework that extends the capabilities of a post-hoc perturbation-based explainer to the EM scenario. Landmark Explanation leverages the specific schema typically adopted by EM datasets, representing pairs of entity descriptions, to generate word-based explanations that effectively describe the matching model. Machine Learning (ML) and Deep Learning (DL) models have been successfully applied to the Entity Matching (EM) problem, as the state-of-the-art approaches demonstrate (e.g., DeepER [1], DeepMatcher [2], DITTO [3], AutoML [4] and others [5, 6, 7]). Nevertheless, they are black-box models: the difficulty to evaluate [8] and to interpret their behaviors [9] hampers their adoption in business scenarios. Although many explanation systems have already been proposed in the literature (e.g., LIME [10], Shapley [11], Anchor [12], and Skater), their application to EM tasks is not straightforward and only a few approaches have partially addressed it [13, 14, 15, 16]. EM is conceived as a binary classification problem, where the classes show whether the pairs of entities described in the dataset records are or are not matching. The structure of the datasets is then “unusual" for ML and DL techniques, which are used to manage records describing single pieces of evidence, and generic techniques for explaining ML and DL models cannot be straightforwardly applied. In this paper, we present Landmark Explanation, a post-hoc perturbation-based local explainer for EM approaches. Post-hoc perturbation-based explainers build a surrogate linear model that approximates the model locally to the instance to explain. The surrogate linear model is trained with synthetic data. The dataset is generated by creating a number of alterations of the record to explain (in the so-called perturbation phase) and predicting their class by applying the original model to them (in the so-called reconstruction phase). The explanation is directly obtained from the surrogate model.</p>
      </abstract>
      <kwd-group>
        <kwd>Entity Matching</kwd>
        <kwd>Post-hoc Explanation</kwd>
        <kwd>Perturbation of EM datasets</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>[Table 1: a pair of non-matching entity descriptions: a Sony white Cybershot T-series digital camera jacket case with stylus (code lcjthcw) and a Sony LCS-CSL Cyber-shot camera case (top loading, leather, black).]</p>
      <p>The explanation is directly obtained from
the surrogate model. The importance of a feature in the decision is computed by multiplying its
value in the record by the linear coefficient of the surrogate model. In textual databases, such as
the ones considered in this paper, the features of the model are typically the words used in the
entity descriptions.</p>
      <p>Example 1. Table 1 shows an example of non-matching descriptions. Both entities refer to
camera cases produced by the same brand, but since their product codes are different they are not
considered as the same entity. An explanation for this record consists of a value associated with
each word in the description. Words are extracted from the descriptions via a tokenization process
(we evaluated the application of stemming techniques and the deletion of stop words). For this
reason the terms “token" and “word" are used as synonyms in this paper.</p>
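The tokenization step just described can be sketched as follows; the regular expression, the naive suffix-stripping stand-in for a stemmer, and the stop-word toggle are illustrative assumptions, not the paper's actual implementation:

```python
import re

def tokenize(description, stem=False, stopwords=frozenset()):
    """Split an entity description into lowercase word tokens.

    Sketch only: the paper evaluated stemming and stop-word removal
    as optional steps; here they are simple toggles."""
    tokens = [t for t in re.findall(r"[a-z0-9\-]+", description.lower())
              if t not in stopwords]
    if stem:  # naive plural stripping, a stand-in for a real stemmer
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t
                  for t in tokens]
    return tokens

# Tokens such as these become the features of the surrogate model.
print(tokenize("Sony White Cybershot T Series Digital Camera Jacket Case LCJTHCW"))
```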
      <p>Landmark Explanation leverages the specificity of the EM dataset by introducing two main
innovations. The first is the generation of two explanations per dataset entry, one for each entity
described in the record. The second is a mechanism for computing meaningful explanations,
especially for records belonging to non-matching classes. The descriptions of a non-matching
pair are composed of different words, and selecting the ones that contributed most to the
decision is a complex task even for humans. To address the problem, we inject additional words
extracted from one entity into the second entity before the perturbation. The result is that the
number of different words in non-matching entities decreases, while the similarity increases,
thus enabling the approach to select the most relevant elements for the decision.</p>
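The word-injection idea above can be sketched in a few lines; `inject_landmark_tokens` is a hypothetical helper, not the tool's actual API:

```python
def inject_landmark_tokens(varying_tokens, landmark_tokens):
    """Append to the varying entity the landmark's tokens it lacks.

    Hypothetical sketch of the injection step: after injection the two
    descriptions share more tokens, so the surrogate model can single
    out the distinctive ones instead of many uniformly weak ones."""
    missing = [t for t in landmark_tokens if t not in set(varying_tokens)]
    return varying_tokens + missing

varying = ["sony", "white", "camera", "case", "lcjthcw"]
landmark = ["sony", "black", "camera", "case", "lcs-csl"]
# The injected tokens ("black", "lcs-csl") are tracked so that their
# scores can be reported separately in the augmented explanation.
print(inject_landmark_tokens(varying, landmark))
```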
      <p>
        We implemented Landmark Explanation as an add-on component of the LIME system. The
results of the experiments show that the explanations generated for EM datasets outperform
the ones of the competing approaches in accuracy and “interest" for the users. This paper
summarizes the Landmark Explanation presentations in [
        <xref ref-type="bibr" rid="ref17 ref18">17, 18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-1b">
      <title>2. The Landmark Explanation approach</title>
      <sec id="sec-1-1">
        <title>2.1. Landmark Explanation principles</title>
        <p>Landmark Explanation adapts a local post-hoc explanation technique to the EM scenario. Indeed,
the direct application of a perturbation mechanism based on token removal is not effective for
EM datasets. The reason is that removing random tokens is likely to affect both the entities
represented by the two descriptions. The generated synthetic records may then contain null
or incoherent perturbations, where the same tokens referring to the different entities are
removed. These inconsistent perturbations lead to biased explanations. Moreover, post-hoc
explanation systems adopt techniques for generating perturbations based on token removal. The
resulting explanations for non-matching entity descriptions (generally the largest part of the
records in EM datasets) are not useful, as we will describe later on. Landmark Explanation
addresses these issues by introducing the following two main innovations.</p>
        <p>[Figure 1: The Landmark Explanation workflow. Landmark generation and augmentation feed the LIME components (perturbation generation, reconstruction &amp; prediction, explanation via a surrogate model), producing one explanation for each of the L and R entities in a record.]</p>
        <p>Double explanation. The first innovation consists of the generation of two explanations for
each dataset entry. When we compute an explanation, we perturb one description (the varying
entity) and keep its paired description unchanged (the landmark entity). The explanation assigns
an impact to each token of the perturbed description. We repeat the computation by exchanging
the varying and landmark entities. Each result explains the model decision from the perspective
of one of the two entities described in the record.</p>
        <p>Injection of features. The second is a mechanism for contrasting the asymmetric nature
of the EM problem: an explanation of a matching pair is always composed of “interesting"
tokens, since they express the reasons why the entities have been considered as matching. The
same does not happen for non-matching entities, which have many reasons to be different. We
address this issue by injecting additional tokens extracted from the landmark entity into the
varying entity before the perturbation. Therefore, the resulting dataset contains entities close to
the landmark, and the surrogate model trained on these entities will be able to highlight the
distinctive tokens that mainly contribute to the decision. Without the injection, descriptions of
non-matching entities would have a large number of tokens that would uniformly contribute to
the decision with the same low impact.</p>
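The double-explanation scheme can be sketched as a role swap around any perturbation-based explainer; the `explain_fn` callback and the toy scorer below are illustrative assumptions, not the tool's interface:

```python
def double_explanation(record, explain_fn):
    """Produce one explanation per entity by swapping the roles of
    varying and landmark entity (a sketch; `explain_fn` stands for any
    perturbation-based explainer, e.g. a LIME wrapper)."""
    left, right = record
    return {
        "left":  explain_fn(varying=left,  landmark=right),
        "right": explain_fn(varying=right, landmark=left),
    }

# Toy explainer: score each varying token by whether it also occurs
# in the landmark (shared tokens push towards "match").
toy = lambda varying, landmark: {t: (1.0 if t in landmark else -1.0)
                                 for t in varying}
out = double_explanation((["sony", "case", "lcjthcw"],
                          ["sony", "case", "lcs-csl"]), toy)
print(out["left"])  # impacts from the left entity's perspective
```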
      </sec>
      <sec id="sec-1-2">
        <title>2.2. Landmark Explanation explanations</title>
        <p>Let r be a record in an EM dataset representing a pair of entity descriptions (e_L, e_R), each one
composed of a collection of tokens {t^E_1, ..., t^E_nE}, where E ∈ {L, R} and nE is the number of
tokens belonging to the description of the entity e_E. The application of an EM binary
classification model to r returns {0, 1} when r is composed of non-matching or matching entity
descriptions, respectively. An explanation is composed of a score for each description token,
ξ_E = {s^E_1, ..., s^E_nE}, where E ∈ {L, R} and s^E_i ∈ R is the score of token t^E_i. ξ_L is the explanation
generated by selecting e_R as the landmark and, vice-versa, ξ_R by selecting e_L as the landmark.
Positive scores push the decision towards the class of matching entities, negative scores towards
non-matching. The higher the absolute value of the score, the higher the importance of the
token associated with the score. An explanation with augmented features assumes the form
ξ_E = {s^E_1, ..., s^E_nE, s^F_1, ..., s^F_nF}, where, for the explanation ξ_L, the scores s^R_i are those of
the features injected from the entity description e_R (and vice-versa for the explanation ξ_R).</p>
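A toy instance of this notation, with invented scores, shows how a reader of an explanation ranks tokens by absolute score:

```python
# An explanation xi assigns a real score to each token of one entity.
# Positive scores push towards "matching", negative towards
# "non-matching"; larger |score| means higher importance.
# (Toy numbers, for illustration only.)
xi_L = {"sony": 0.31, "case": 0.42, "lcjthcw": -0.55}

def most_important(explanation, k=2):
    """Rank tokens by the absolute value of their score."""
    return sorted(explanation,
                  key=lambda t: abs(explanation[t]), reverse=True)[:k]

print(most_important(xi_L))  # → ['lcjthcw', 'case']
```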
      </sec>
      <sec id="sec-1-3">
        <title>2.3. Landmark Explanation workflow</title>
        <p>Perturbation generation. A representation of the neighborhood of the varying entity is
generated by perturbing its tokens in multiple ways. We use LIME, which generates a series of
textual phrases containing many combinations of the tokens of the varying description.</p>
        <p>Reconstruction and prediction. We reconstruct the schema of the synthetic textual records
obtained in the last step. We concatenate each of these new records with the original landmark
entity. The produced pairs of entities are finally provided as input to the original EM model in
order to obtain the relative prediction scores.</p>
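The perturbation and reconstruction steps can be sketched as follows, assuming LIME-style random token masking (the function names are hypothetical):

```python
import random

def perturb(varying_tokens, n_samples=8, seed=0):
    """LIME-style perturbation sketch: draw random token subsets of
    the varying description (binary masks over its tokens)."""
    rng = random.Random(seed)
    samples = []
    for _ in range(n_samples):
        mask = [rng.random() < 0.5 for _ in varying_tokens]
        samples.append([t for t, keep in zip(varying_tokens, mask) if keep])
    return samples

def reconstruct(perturbed, landmark_tokens):
    """Reconstruction sketch: pair each perturbed description with the
    untouched landmark so the original EM model can score the pair."""
    return [(p, landmark_tokens) for p in perturbed]

pairs = reconstruct(perturb(["sony", "white", "case"]),
                    ["sony", "black", "case"])
print(len(pairs))  # 8 synthetic record pairs, ready for the EM model
```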
        <p>
          Explanation via surrogate model. Finally, a surrogate linear model (one for each workflow,
one for the left and right entities, respectively) is trained on the perturbed dataset to learn an
approximation of the behavior of the original model in those localities. The surrogate model
takes as input the bag-of-words representation of the perturbed tokens and is trained to learn the
relation between the input and the prediction score produced by the model under explanation.
The coefficients learned during training represent the impact of each token in the prediction,
and are used to generate the explanations of the original EM model for each EM record. In our
implementation we adopt LIME to perform this task, but our approach is transparent to the
explanation tool selected.
        </p>
      </sec>
      <sec id="sec-1-4">
        <title>2.4. Explaining ER Models</title>
        <p>
          Studies applying interpretation techniques in the entity matching area [
          <xref ref-type="bibr" rid="ref14 ref16">16, 14</xref>
          ] and tools, like Mojito [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] and ExplainER [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ], have been proposed. ExplainER provides a unified interface for
applying well-known interpretation techniques (e.g., LIME, Shapley, Anchor, and Skater) in the
EM scenario. Mojito adapts LIME for the explanation of single EM predictions and represents the
work closest to our approach. It extends LIME in two ways: 1) it exploits the subdivision of EM
data into attributes; 2) it introduces a new form of data perturbation, called LIME-COPY<sup>2</sup>, which
allows generating match elements starting from non-match elements. Differently from Landmark
Explanation, Mojito treats each attribute atomically, distributing its impact equally to its constituent
tokens. Furthermore, Landmark Explanation analyzes the diversified impact that the same
token can generate depending on the entity considered as a landmark for the explanation.
        </p>
        <p><sup>2</sup>In Section 3 we refer to this technique as Mojito Copy since it is part of the Mojito tool.</p>
      </sec>
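The surrogate-model step of Section 2.3 can be sketched as an ordinary least-squares fit over binary bag-of-words rows; this stdlib-only gradient-descent fit is a stand-in for the weighted linear model an explainer like LIME actually fits:

```python
def fit_surrogate(X, y, epochs=500, lr=0.1):
    """Least-squares fit of a linear surrogate y ≈ X·w by batch
    gradient descent (sketch only).

    X: binary bag-of-words rows over the varying entity's tokens,
    y: match scores of the EM model on the reconstructed pairs."""
    n, d = len(X), len(X[0])
    w = [0.0] * d
    for _ in range(epochs):
        grad = [0.0] * d
        for xi, yi in zip(X, y):
            err = sum(wj * xj for wj, xj in zip(w, xi)) - yi
            for j in range(d):
                grad[j] += 2 * err * xi[j] / n
        w = [wj - lr * gj for wj, gj in zip(w, grad)]
    return w  # one coefficient (token impact) per token

# Toy data: the presence of token 0 raises the match score,
# the presence of token 1 barely moves it.
X = [[1, 0], [0, 1], [1, 1], [0, 0]]
y = [0.9, 0.1, 0.5, 0.5]
w = fit_surrogate(X, y)
print(w)  # close to the least-squares solution (≈0.73, ≈-0.07)
```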
    </sec>
    <sec id="sec-2">
      <title>3. Experimental evaluation</title>
      <p>We evaluated the explanations generated by Landmark Explanation according to two main
perspectives: the fidelity in representing the EM model (in Section 3.1) and the “quality" of the
explanations. For this last evaluation, we introduce a measure for assessing the interest of the
explanations (in Section 3.2) and we propose an example of explanation for non-matching entity
descriptions (in Section 3.3). This shows the importance of the token injection mechanism.</p>
      <p>Dataset and Model. We perform an experimental evaluation against the datasets provided by
the Magellan library<sup>3</sup>, which is considered a standard benchmark for the evaluation of
EM tasks. The datasets are divided into structured (iTunes-Amazon S-IA, DBLP-ACM S-DA,
DBLP-GoogleScholar S-DG, Walmart-Amazon S-WA), textual (Abt-Buy T-AB) and dirty
(iTunes-Amazon D-IA, DBLP-ACM D-DA, DBLP-GoogleScholar D-DG, Walmart-Amazon D-WA). The
records in all datasets represent pairs of entities described with the same attributes. A label is
provided to express whether the record represents a matching / non-matching pair of entities. A
simple logistic regression model is experimented as the matcher, where the features are the
similarities of the paired attributes in the descriptions. We compute the similarity by applying the
Jaccard measure on the trigrams of the attribute values. The experiments are performed by sampling
100 records per label (all records in datasets with smaller cardinality) and computing their
explanations. We generate base explanations, using the tokens from one entity description, and
augmented explanations, using the tokens of one entity description together with the ones
injected from the second entity description.</p>
      <p><sup>3</sup>https://github.com/anhaidgroup/deepmatcher/blob/master/Datasets.md</p>
      <sec id="sec-2-1">
        <title>3.1. Fidelity of the explanations</title>
        <p>To evaluate the fidelity of the explanations, i.e., whether the weights assigned by Landmark
Explanation to the tokens generate a surrogate model that is consistent with the EM model, we
randomly remove 25% of the tokens from the record to explain, defining a new item. We then
compare the probability score obtained by passing the new item to the EM model with the one of
the original record, from which we have subtracted the sum of the coefficients associated with the
removed tokens. If the explanation model correctly represents the EM model, these two values
should be close. The experiment is repeated 100 times per class, and the performance is measured
by means of two metrics: the mean absolute error (MAE) between the explanation and the EM
model, and the accuracy, which measures the percentage of times that the probability score of the
new item changes consistently with the sum of the impacts of the removed tokens. Table 2
shows the results of the experiment. The column LIME shows the results obtained with LIME
with the same setting. Non-matching settings also include a comparison with the Mojito Copy
technique.</p>
        <p>Discussion. The experiments show that the surrogate model built by Landmark Explanation
with the base perturbation provides an accurate representation of the EM model for records
representing matching pairs of entities. At the same time, the model built with the augmented
perturbation is an accurate representation of the EM model for records representing non-matching
pairs of entities. In particular, Table 2a shows that Landmark Explanation, applied to records
labeled as matching entities, performs better than LIME when the perturbation is generated with
the base technique (it obtains better accuracy in all datasets and lower MAE in 8/9 datasets). The
augmented generation technique performs slightly worse: in 8/9 datasets it obtains better
accuracy and in 5/9 lower MAE. Note that this can also be motivated by the increased number
of tokens in the augmented explanations. Nevertheless, the scores, when worse, are very close
to LIME. Table 2b shows the accuracy and the MAE obtained analyzing records referring to
non-matching labels. In this scenario, the augmented entity perturbation obtains the best scores,
with an accuracy better than LIME in 3/9 datasets and a lower MAE in 7/9 datasets. Finally,
the copying technique introduced by Mojito to manage records associated with non-matching
labels does not show high performance. The reason is that Mojito generates a perturbation
by duplicating entire attributes. The result of this operation is that the tokens of the replaced
attribute have the same weights, and this decreases the performance.</p>
      </sec>
      <sec id="sec-2-2">
        <title>3.2. Quality of the explanations</title>
        <p>Since there are many reasons for two entities to be dissimilar, the explanations of non-matching
entity descriptions are typically “slightly polarized", having negative values distributed in a
range close to zero and no value dominating the others. For the user, this means not being
able to grasp a strong motivation for the non-matching decision. To evaluate if we are able to
generate “interesting explanations", we introduced a heuristic according to which an explanation
for non-matching entities is interesting if it contains tokens that, if injected into the second
entity, would make the record classified as matching. These are the elements that make the
explanation interesting for the users. To evaluate if the explanations generated by Landmark
Explanation satisfy this property, we perform the same experiment described in Section 3.1,
but selecting the tokens to remove: negative tokens are removed when the label represents a
non-matching record (all tokens that contribute to the decision); positive tokens are removed in
case of matching records. In Table 3 we measure the interest, which is the percentage of records
where the removal of the tokens was able to generate a change in the label.</p>
        <p>Discussion. Landmark Explanation generates interesting explanations, and the perturbation
generated with the augmented technique effectively increases “the interest" of non-matching
record explanations. In particular, Table 3a shows that Landmark Explanation is good but
slightly worse than LIME in terms of interest when the records are labeled with the matching
class. This happens even if the surrogate model is accurate (the MAE score is the lowest for all
experiments with the single-entity configuration). The problem is that in most of the cases,
even removing all tokens, the explanation created by Landmark Explanation belongs to the
same class as before the token removal. Note that if we set a decision threshold to 0.4, our
approach has the best results in all datasets. Table 3b shows that the augmented explanations of
non-matching entities generated by Landmark Explanation outperform LIME and Mojito Copy.</p>
      </sec>
      <sec id="sec-2-3">
        <title>3.3. Showing the explanations</title>
        <p>[Figure 2: Token-impact explanations (impacts on a -0.5 to 0.5 axis) for the record in Table 1, with the landmark and varying entities exchanged. (a) The base technique: original tokens only (e.g., l_name/l_description tokens such as case, series, cybershot, lcjthcw; r_name/r_description tokens such as sony, top, loading, leather, black, lcs-csl). (b) The augmented technique: original and augmented tokens (e.g., case, camera, white, stylus, jacket, lcjthcw, sony, black, lcs-csl).]</p>
        <p>Figure 2a shows the explanations computed with the base technique for the entity descriptions
in Table 1. We recall that positive impacts push towards the match decision, negative towards
a non-match decision. Landmark Explanation generates two explanations per record and we can
see that no token assumes a particular importance. The resulting explanation is therefore not
interesting (and useful) for the user. Figure 2b shows the explanation obtained by the injection
of the tokens from the landmark. The first explanation (where the right entity is the landmark)
clearly shows that the token case pushes towards the match decision (both the entities refer to
camera cases) and the code lcjthcw towards the non-match decision (it is different from the
code in the second description). The augmented tokens show that the code lcs-csl pushes
towards a match decision. This means that if that code had been part of the description for the
left entity, it would have pushed the model towards a match decision. Similar considerations
can be made by observing the second explanation, obtained by setting the left entity as the
landmark.</p>
      </sec>
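The per-record fidelity check of Section 3.1 can be sketched as follows; the toy EM model whose score is exactly a sum of token weights is an illustrative assumption under which the error vanishes:

```python
def fidelity_error(em_score_fn, record, coeffs, removed):
    """Fidelity check sketch (Section 3.1): the EM score of the record
    with `removed` tokens dropped should be close to the original
    score minus the sum of the removed tokens' surrogate coefficients.
    Returns the per-record absolute error; MAE averages this."""
    reduced = [t for t in record if t not in set(removed)]
    predicted = em_score_fn(record) - sum(coeffs[t] for t in removed)
    actual = em_score_fn(reduced)
    return abs(actual - predicted)

# Toy EM model whose score is exactly the sum of per-token weights,
# so the surrogate coefficients are perfectly faithful.
weights = {"sony": 0.2, "case": 0.3, "lcjthcw": -0.4}
score = lambda tokens: sum(weights[t] for t in tokens)
err = fidelity_error(score, ["sony", "case", "lcjthcw"],
                     weights, ["lcjthcw"])
print(err)  # ≈ 0 for a perfectly faithful surrogate
```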
    </sec>
    <sec id="sec-3">
      <title>4. Conclusion</title>
      <p>This paper introduces Landmark Explanation, a tool that makes a post-hoc perturbation-based
explainer able to deal with ML and DL models applied to EM datasets. The approach has been
evaluated coupled with the LIME explainer on a simple EM model based on logistic regression.
The results show that the explanations generated by Landmark Explanation outperform
the ones generated by the competing approaches.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ebraheem</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. R.</given-names>
            <surname>Joty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Distributed representations of tuples for entity resolution</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>11</volume>
          (
          <year>2018</year>
          )
          <fpage>1454</fpage>
          -
          <lpage>1467</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>S.</given-names>
            <surname>Mudgal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Krishnan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Deep</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Arcaute</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Raghavendra</surname>
          </string-name>
          ,
          <article-title>Deep learning for entity matching: A design space exploration</article-title>
          , in: SIGMOD Conference, ACM,
          <year>2018</year>
          , pp.
          <fpage>19</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.-C.</given-names>
            <surname>Tan</surname>
          </string-name>
          ,
          <article-title>Deep entity matching with pre-trained language models</article-title>
          ,
          <source>Proc. VLDB Endow</source>
          .
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>50</fpage>
          -
          <lpage>60</lpage>
          . URL: https://doi.org/10.14778/3421424.3421431. doi:10.14778/3421424.3421431.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pevarello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vincini</surname>
          </string-name>
          ,
          <article-title>Automated machine learning for entity matching tasks</article-title>
          , in: EDBT, OpenProceedings.org,
          <year>2021</year>
          , pp.
          <fpage>325</fpage>
          -
          <lpage>330</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L.</given-names>
            <surname>Gagliardelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Simonini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bergamaschi</surname>
          </string-name>
          ,
          <article-title>BigDedup: A Big Data Integration Toolkit for Duplicate Detection in Industrial Scenarios</article-title>
          , in: TE, volume
          <volume>7</volume>
          of Advances in Transdisciplinary Engineering, IOS Press,
          <year>2018</year>
          , pp.
          <fpage>1015</fpage>
          -
          <lpage>1023</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Cappuzzo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Papotti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <article-title>Creating embeddings of heterogeneous relational datasets for data integration tasks</article-title>
          , in: SIGMOD Conference, ACM,
          <year>2020</year>
          , pp.
          <fpage>1335</fpage>
          -
          <lpage>1349</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>U.</given-names>
            <surname>Brunner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Stockinger</surname>
          </string-name>
          ,
          <article-title>Entity matching with transformer architectures - A step forward in data integration</article-title>
          , in: EDBT, OpenProceedings.org,
          <year>2020</year>
          , pp.
          <fpage>463</fpage>
          -
          <lpage>473</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F. D.</given-names>
            <surname>Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Ferro</surname>
          </string-name>
          ,
          <article-title>Evaluating the integration of datasets</article-title>
          ,
          <source>in: Proceedings of the 37th ACM/SIGAPP Symposium on Applied Computing</source>
          , SAC '22, Association for Computing Machinery, New York, NY, USA,
          <year>2022</year>
          , p.
          <fpage>347</fpage>
          -
          <lpage>356</lpage>
          . URL: https://doi.org/10.1145/3477314.3507688. doi:10.1145/3477314.3507688.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Du</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Hu</surname>
          </string-name>
          ,
          <article-title>Techniques for interpretable machine learning</article-title>
          ,
          <source>Commun. ACM</source>
          <volume>63</volume>
          (
          <year>2020</year>
          )
          <fpage>68</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>"Why should I trust you?": Explaining the predictions of any classifier</article-title>
          ,
          <source>in: Proceedings of the 22nd ACM SIGKDD</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghorbani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. Y.</given-names>
            <surname>Zou</surname>
          </string-name>
          ,
          <article-title>Data shapley: Equitable valuation of data for machine learning</article-title>
          ,
          <source>in: ICML</source>
          , volume
          <volume>97</volume>
          of
          <source>Proceedings of Machine Learning Research, PMLR</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>2242</fpage>
          -
          <lpage>2251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Anchors: High-precision model-agnostic explanations</article-title>
          ,
          <source>in: AAAI, AAAI Press</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1527</fpage>
          -
          <lpage>1535</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ebaid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. G.</given-names>
            <surname>Aref</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Elmagarmid</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <article-title>Explainer: Entity resolution explanations</article-title>
          ,
          <source>in: 2019 IEEE 35th Int. Conf. on Data Engineering (ICDE)</source>
          , IEEE,
          <year>2019</year>
          , pp.
          <fpage>2000</fpage>
          -
          <lpage>2003</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>S.</given-names>
            <surname>Thirumuruganathan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ouzzani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Tang</surname>
          </string-name>
          ,
          <article-title>Explaining entity resolution predictions: Where are we and what needs to be done?</article-title>
          ,
          <source>in: Proceedings of the Workshop on Human-In-the-Loop Data Analytics</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Cicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Interpreting deep learning models for entity resolution: an experience report using LIME</article-title>
          ,
          <source>in: aiDM@SIGMOD</source>
          , ACM,
          <year>2019</year>
          , pp.
          <fpage>8:1</fpage>
          -
          <lpage>8:4</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>X.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Haas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Meliou</surname>
          </string-name>
          ,
          <article-title>Explaining data integration</article-title>
          ,
          <source>Data Engineering</source>
          (
          <year>2018</year>
          )
          <fpage>47</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baraldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Del Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <article-title>Landmark explanation: An explainer for entity matching models</article-title>
          ,
          <source>in: CIKM, ACM</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4680</fpage>
          -
          <lpage>4684</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>A.</given-names>
            <surname>Baraldi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Del Buono</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Paganelli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Guerra</surname>
          </string-name>
          ,
          <article-title>Using landmarks for explaining entity matching models</article-title>
          ,
          <source>in: EDBT, OpenProceedings.org</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>451</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>