<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Automatic Entity Labeling through Explanation Techniques</article-title>
        <subtitle>(Discussion Paper)</subtitle>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Silvana Castano</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfio Ferrara</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Donatella Firmani</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jerin George Mathew</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Montanelli</string-name>
        </contrib>
        <aff>Università di Roma Sapienza</aff>
      </contrib-group>
      <abstract>
        <p>Entity resolution (ER) aims at matching records that refer to the same real-world entity, e.g., the same product sold by different websites. Recent solutions to this problem have reached unprecedented accuracy. Nonetheless, due to intrinsic limitations of automatic testing methods, it is known among researchers and practitioners that a significant manual effort is still required in production environments for verification and cleaning of ER results. In order to facilitate such activity, we are developing the E2L methodology (Entity to Labels) for automatic computation of human-readable labels of identified entities. Given a selection of entities for which the user wants to compute labels, E2L first extracts relevant features by training a classifier on the ER results, then it leverages the notion of black-box model explanation to select the most important terms for the classifier, and finally it uses those terms to compute labels. In this paper we report our first experiences with E2L. Preliminary results on a real-world application scenario show that E2L labels can provide an accurate description of entities and a natural way for humans to assess the trustworthiness of ER results at a glance.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Entity Resolution (ER) is the task of finding records in a collection that refer to the same
real-world entity. Recent works have investigated the application of machine learning (ML) and deep
learning (DL) techniques, demonstrating impressive prediction accuracy [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Nonetheless, in
production environments, humans are still required to manually inspect the entities identified
by the ER process, in order to assess their trustworthiness. This can be a tedious activity,
especially when large datasets are considered, with entities consisting of hundreds of records.
For this reason, tools for supporting the manual inspection of ER results and speeding up the
search for possibly mismatched records are in strong demand. Within this space, we focus on
the problem of computing human-readable textual labels of identified entities, such as those in
Table 1, which represent a natural way to support human comprehension of what is inside each
clustered entity.
      </p>
      <p>[Table 1 (b). Entity labels: Canon EOS 1100D; Sony A7.]</p>
      <p>
        Existing solutions for analogous tasks (see Section 4) typically require some form of
human intervention, such as providing external knowledge (e.g., vocabularies) or a selection of
sample labels for training. Fully automated solutions are instead based on token frequencies
(e.g., TF-IDF), which may perform poorly on datasets with a skewed entity size distribution. Our
main intuition is to exploit (i) recent methods to process natural language such as [
        <xref ref-type="bibr" rid="ref2 ref3">2, 3</xref>
        ] to
discover meaningful patterns in the association between records and entities with no human
effort, and (ii) recent explainability techniques such as [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] to reveal such patterns and make them
human-readable, by selecting the salient information.
      </p>
      <p>
        In this paper, we formalize these intuitions by presenting the E2L (Entity to Labels) approach
and report our experiences with a real-world application scenario [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] where ER results need to be
manually curated. Our current implementation, featuring two representative text classification
methods [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ] and one popular explanation method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] can achieve promising results and
highlight errors in the ER results. A repository with all our data and scripts is publicly available
for download at https://github.com/jermathew/E2L.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. The E2L methodology</title>
      <p>Let $R = \{r_1, \dots, r_n\}$ be a collection of record descriptions referring to a set of entities $\hat{E} =
\{e_1, \dots, e_m\}$ with $n &gt; m$. Each record $r \in R$ is related to an entity $e \in \hat{E}$, also referred to as a
cluster of records. Given two records $r_1 \in R$, $r_2 \in R$, we refer to them as matching records if
they are associated with the same entity.</p>
      <p>
        Our methodology, which we call E2L (Entity to Labels), comprises the sequence of modules in
Figure 1, as described below.
      </p>
      <p>
        1. Classification model. Given a set of entities $E \subseteq \hat{E}$ (a selection of the identified entities) that ought to be manually checked by
the final user, E2L trains a classifier to learn a function $M : R \to E$, such that $M(r) = e$ denotes
that the record $r$ of the collection $R$ is associated with the entity $e$. In order to build the training set, we use
standard text processing (e.g., stop-word removal) and tokenization techniques to represent
a record $r \in R$ as a sequence of tokens $T(r) = [t_1, \dots, t_s]$. The resulting tokens can be either
single terms (e.g., Canon) or noun chunks. A noun chunk provides a singleton representation
of a composite noun (e.g., digital camera, USA warranty). Note that this step requires
no human effort, as the associations between records and entities required for training are taken
directly from the input ER results.
      </p>
      <p>
        2. Candidate labels. Each element $t \in \bigcup_{r \in R \text{ s.t. } M(r) = e} T(r)$ represents a candidate label for
the entity $e$. Given an entity $e$, this module computes a real-valued relevance score $w_t$ for each
candidate label $t$ by leveraging a black-box explanation technique over the model $M$. Intuitively,
consider a token $t \in T(r)$ and let $\hat{r}$ correspond to the record $r$ without $t$. The candidate labels
module assigns a higher relevance score to tokens that more consistently yield $M(\hat{r}) \neq M(r)$,
for all $r$ s.t. $M(r) = e$. Specifically, we use LIME [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] as our black-box explanation technique. In
order to compute relevance scores for a given record $r$ and a model $M$, LIME creates a new set
of records $N_r$ by randomly removing tokens from $r$. In $N_r$, records are represented as binary
vectors where each dimension corresponds to a different token. Then, given a class $e \in E$, each
$r' \in N_r$ is labeled according to whether $M(r') = e$ or not. Finally, LIME fits a linear model on
$N_r$. The weights of the linear model represent how much each token contributed to $M(r)$. Given
an entity $e$, the output of this module is a sorted list of candidate labels and associated relevance
scores $L_e = [(l_1, w_1), (l_2, w_2), \dots]$, with $w_i \geq w_{i+1}$.
      </p>
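      <p>The perturbation idea behind the candidate labels module can be illustrated with a minimal sketch. It is not the LIME implementation: it omits LIME's proximity weighting and regularization and simply fits an ordinary least-squares model on random token removals. The function name token_relevance, the callable predict, and the toy classifier are hypothetical names introduced only for illustration.</p>
      <preformat>
```python
import numpy as np


def token_relevance(tokens, predict, target, num_samples=500, seed=0):
    """Estimate per-token relevance for class `target` by perturbation.

    tokens  : list of tokens of one record
    predict : hypothetical callable mapping a token list to a predicted class
    """
    rng = np.random.default_rng(seed)
    n = len(tokens)
    # Each row is a binary mask: 1 keeps the token, 0 removes it.
    masks = rng.integers(0, 2, size=(num_samples, n))
    masks[0, :] = 1  # include the unperturbed record
    # Label each perturbed record by whether the classifier still predicts `target`.
    y = np.array([
        1.0 if predict([t for t, keep in zip(tokens, row) if keep]) == target else 0.0
        for row in masks
    ])
    # Fit an ordinary least-squares linear model with an intercept column.
    X = np.hstack([masks.astype(float), np.ones((num_samples, 1))])
    weights, *_ = np.linalg.lstsq(X, y, rcond=None)
    scored = list(zip(tokens, weights[:n]))
    # Sort tokens by non-increasing relevance.
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


def toy_predict(tokens):
    # Toy stand-in for the trained classifier (an assumption for this sketch).
    return "canon_eos_1100d" if "1100d" in tokens else "other"


ranked = token_relevance(
    ["canon", "eos", "1100d", "digital camera"], toy_predict, "canon_eos_1100d"
)
```
      </preformat>
      <p>With the toy classifier, the prediction depends only on the token 1100d, so that token receives the largest weight while the other weights stay close to zero. In practice one would likely rely on an existing LIME implementation rather than this simplified fit.</p>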
      <p>
        We now describe the candidate labels module in more detail. Let $R_e \subseteq R$ be the set of
records that the classifier $M$ associates with a given entity $e$, that is, $R_e = \{r \in R \mid M(r) = e\}$. For each
record $r \in R_e$ we submit its tokens $T(r)$ to the black-box explanation function in order to get
their relevance scores $W_r = \{\langle t, w_t \rangle : t \in T(r), w_t \in \mathbb{R}\}$. Tokens with positive relevance are
then sorted by non-increasing relevance value and selected until their cumulative relevance
is greater than or equal to a user-specified fraction $\theta \in [0, 1]$ of the total. As a result, a selection
$W'_r \subseteq W_r$ of tokens is obtained for the record $r$. The set of candidate labels $L_e$ consists of the
union of the selected tokens $W'_r$ for each $r \in R_e$, and, for each label $l$, the label relevance $L_e[l]$
is the sum of the relevance scores $W'_r[l]$ over all the records $r \in R_e$. We repeat these steps
for each entity $e \in E$.
      </p>
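      <p>The selection and aggregation steps described above can be sketched as follows. This is a minimal illustration under stated assumptions: the "total" is taken to be the sum of the positive relevance scores of a record, and the names select_tokens, candidate_labels, and theta are hypothetical.</p>
      <preformat>
```python
from collections import defaultdict


def select_tokens(scores, theta):
    """Keep the top positive-relevance tokens of one record until their
    cumulative relevance reaches a fraction `theta` of the positive total."""
    positive = sorted(
        ((t, w) for t, w in scores.items() if w > 0),
        key=lambda pair: pair[1],
        reverse=True,
    )
    total = sum(w for _, w in positive)
    selected, cumulative = {}, 0.0
    for token, weight in positive:
        if cumulative >= theta * total:
            break
        selected[token] = weight
        cumulative += weight
    return selected


def candidate_labels(per_record_scores, theta):
    """Sum the selected relevance scores over all records of one entity and
    return the candidate labels sorted by non-increasing relevance."""
    relevance = defaultdict(float)
    for scores in per_record_scores:
        for token, weight in select_tokens(scores, theta).items():
            relevance[token] += weight
    return sorted(relevance.items(), key=lambda pair: pair[1], reverse=True)


# Toy relevance scores for two records of the same entity (made-up values).
records = [
    {"canon": 0.5, "1100d": 0.4, "black": 0.1, "the": -0.2},
    {"1100d": 0.6, "canon": 0.3, "kit": 0.1},
]
labels = candidate_labels(records, theta=0.8)
```
      </preformat>
      <p>In this toy example, each record contributes only its two strongest tokens, and the token 1100d ends up with the highest aggregated relevance, followed by canon.</p>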
      <p>Running the aforementioned steps can be infeasible if (i) the set $R_e$ of records associated with an entity $e$ contains a massive number
of records or (ii) records in $R_e$ consist of thousands of tokens. In both cases, the black-box
explanation function could take a significant amount of time to process $R_e$. In order to address
both points, we include in E2L a record sampling step and a token sampling step, described
below, to be optionally executed before the black-box model explanation is computed.</p>
      <p>(i) During the record sampling step, we aim at picking a subset $R'_e \subseteq R_e$ of the records $R_e$ of an entity $e$ such that the tokens
in $R'_e$ cover most of the relevant tokens in $R_e$. In order to do so, we run the k-means clustering
algorithm with parameter $k$ on a vector representation<sup>1</sup> of the input records $R_e$ and then, for
each cluster, we select the record closest to its centroid based on the $\ell_2$-norm. As a result, we obtain
$k$ vectors from which we retrieve the corresponding records, which collectively make up $R'_e$.
The value $k$, corresponding to the sample size, is set such that as the number of records $|R_e|$
grows, the fraction of sampled records decreases via linear interpolation.</p>
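      <p>A possible reading of the record sampling step is sketched below. The interpolation anchor points, the plain NumPy k-means loop, and the names sample_size, kmeans, and sample_records are assumptions made for illustration; the text does not specify these details.</p>
      <preformat>
```python
import numpy as np


def sample_size(n_records, lo=(50, 1.0), hi=(1000, 0.1)):
    """Linearly interpolate the sampled fraction between two anchor points.
    The anchors (50 records -> 100%, 1000 records -> 10%) are illustrative
    assumptions, not values taken from the paper."""
    fraction = float(np.interp(n_records, [lo[0], hi[0]], [lo[1], hi[1]]))
    return max(1, int(round(fraction * n_records)))


def kmeans(X, k, iters=20, seed=0):
    """Plain Lloyd's algorithm with random initial centroids."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        assign = dist.argmin(axis=1)
        for j in range(k):
            members = X[assign == j]
            if len(members) > 0:
                centroids[j] = members.mean(axis=0)
    return centroids


def sample_records(X, k):
    """Return the indices of the records closest (L2 norm) to each centroid."""
    centroids = kmeans(X, k)
    dist = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return sorted(set(int(i) for i in dist.argmin(axis=0)))


# Toy record vectors forming two well-separated groups.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
picked = sample_records(X, k=2)
```
      </preformat>
      <p>On the toy vectors, one representative is picked from each of the two groups. Any vector representation of the records (for instance TF-IDF vectors) could be passed as X.</p>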
      <p><sup>1</sup> The vector representation can be chosen arbitrarily, e.g., a TF-IDF vector representing a record or the
mean word embedding of its constituent tokens.</p>
      <p>(ii) During the token sampling step, given a record $r$ with tokens $T(r)$, we aim at picking a selection
$T'(r) \subseteq T(r)$ of its most representative tokens. To that end, we sort the
tokens $T(r)$ by their Term Frequency (TF) in decreasing order, prioritizing noun chunks
over singleton text tokens. Afterwards, we select the top $q$ tokens as those to be included in the
sample for the record $r$. Analogously to the record sampling step, the value $q$ is set via linear
interpolation so that as the number of tokens in $T(r)$ grows, the fraction of selected tokens
decreases.</p>
      <p>3. Label composition. Candidate labels and associated relevance scores are finally processed
to return to the user a label for each entity. Given a user parameter $K$, we return as label the
composition (e.g., concatenation) of the top $K$ labels in the sorted list of candidate labels $L_e$.</p>
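      <p>Token sampling and label composition can be sketched in a few lines. Two assumptions are made here: noun chunks are recognized as multi-word tokens, and the noun-chunk priority is applied as a tie-breaker among tokens with equal term frequency (the text leaves the exact priority rule open). The names sample_tokens and compose_label are hypothetical.</p>
      <preformat>
```python
from collections import Counter


def sample_tokens(tokens, q):
    """Return the top-q distinct tokens by decreasing term frequency.
    Among tokens with equal frequency, noun chunks (assumed here to be
    multi-word tokens) come first; remaining ties are broken alphabetically."""
    tf = Counter(tokens)
    ranked = sorted(tf, key=lambda t: (-tf[t], " " not in t, t))
    return ranked[:q]


def compose_label(candidates, top_k):
    """Concatenate the top_k candidate labels of a relevance-sorted list of
    (label, relevance) pairs."""
    return " ".join(label for label, _ in candidates[:top_k])


sampled = sample_tokens(
    ["canon", "eos", "digital camera", "canon", "digital camera", "kit"], q=3
)
label = compose_label([("canon", 1.2), ("eos", 0.9), ("1100d", 0.7), ("kit", 0.1)], top_k=3)
```
      </preformat>
      <p>Here sampled is the list digital camera, canon, eos, and label is the string "canon eos 1100d".</p>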
    </sec>
    <sec id="sec-3">
      <title>3. Experiences with E2L</title>
      <p>
        The E2L approach is evaluated on the camera dataset of the Alaska Benchmark, an end-to-end
benchmark tailored for a variety of tasks related to Data Integration, including ER [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], which
has recently been used for the 2020 SIGMOD Programming Contest<sup>2</sup> and for the two editions
of the DI2KG challenge<sup>3</sup>. The dataset comprises (i) a set of camera descriptions collected from
different web sources, and (ii) a manually curated ground truth consisting of camera names (i.e.,
brand name and model name) for each description, such that multiple descriptions can refer
to the same camera. In the evaluation, we take into account the 20 entities with the highest
number of records, ranging from 53 to 184 records per entity. The resulting dataset consists of
2171 records. We use the page_title attribute from each description to compose a dataset
(hereinafter called Alaska dataset) as a list of &lt;page_title&gt;,&lt;model_name&gt; pairs, where
&lt;model_name&gt; represents the correct label expected for each group of descriptions referring
to the same camera. The longest page_title field in the dataset contains 42 words, while the
shortest one contains 3 words.
      </p>
      <p>Our experiments were performed on a server environment using an Intel Xeon E5-2966 v4
CPU, 512 GB of RAM, and 4 NVIDIA Tesla P100-SXM2 GPUs. The operating system is Ubuntu
17.10.</p>
      <p>
        Classification model. We exploited two models, an LSTM-based neural network [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and a
pretrained DistilBERT model [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], and we generated two versions of E2L, namely E2L-Glove and
E2L-Bert, respectively. The LSTM-based network consists of a pretrained embedding layer based on GloVe
[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] followed by a bidirectional LSTM (Bi-LSTM) layer whose memory dimension is 100. The
output of the last time step of the Bi-LSTM is then fed to a fully connected layer of size 64 using
ReLU as the activation function. Finally, the resulting output is passed to a fully connected layer
of size 20, where softmax is used as the activation function. As for the second model, we leveraged
the Transformers library<sup>4</sup> to set up a pretrained DistilBERT model for a multiclass classification
task. This model comprises two parts: the body, consisting of a pretrained DistilBERT model,
and a classification head on top of the body, whose last layer is a fully connected layer
of size 20 with softmax as the activation function.
      </p>
      <p>
        <sup>2</sup> http://www.inf.uniroma3.it/db/sigmod2020contest
        <sup>3</sup> http://di2kg.inf.uniroma3.it
        <sup>4</sup> https://github.com/huggingface/transformers
      </p>
      <p>Baselines. As baselines for comparison against E2L, we exploit two different approaches for
entity labeling, named TFIDF and BART. The choice of TFIDF is motivated by the fact that it is
close to a standard solution for terminology retrieval and it provides good results on the entity
labeling task. The choice of BART is motivated by the idea of comparing E2L against a solution
for document summarization, on the assumption that summarizing entity descriptions is an
effective way to perform entity labeling. Both approaches start by joining the page_title
fields referring to the same camera name in the Alaska dataset. This way, we obtain a set $D$ of
20 pseudo-descriptions, one for each camera. These pseudo-descriptions are then tokenized by
exploiting the same procedure used in E2L.</p>
      <p>
        • For the TFIDF baseline, we compute TF-IDF on the set $D$ of pseudo-descriptions. Then, for each pseudo-description
$d \in D$, tokens are sorted by their TF-IDF weights in descending order.
• As for the BART baseline, we feed each $d \in D$ to a pretrained BART model, namely
BART.large.cnn [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. As a result, we obtain a summary of $d$, that is, a concise and
shorter version of $d$. Then, we tokenize and process the summary as in the E2L approach.
      </p>
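      <p>The TFIDF baseline can be illustrated with a small self-contained sketch. The exact TF-IDF variant is not specified in the text; the sketch assumes length-normalized term frequency multiplied by the logarithm of the inverse document frequency, and breaks ties alphabetically. The name tfidf_rank and the toy pseudo-descriptions are hypothetical.</p>
      <preformat>
```python
import math
from collections import Counter


def tfidf_rank(pseudo_descriptions):
    """Map each entity to its distinct tokens sorted by descending TF-IDF.

    pseudo_descriptions : dict mapping an entity name to its list of tokens
    """
    n_docs = len(pseudo_descriptions)
    # Document frequency: in how many pseudo-descriptions each token occurs.
    df = Counter()
    for tokens in pseudo_descriptions.values():
        df.update(set(tokens))
    ranking = {}
    for entity, tokens in pseudo_descriptions.items():
        tf = Counter(tokens)
        weights = {
            t: (count / len(tokens)) * math.log(n_docs / df[t])
            for t, count in tf.items()
        }
        # Descending weight, alphabetical order on ties.
        ranking[entity] = sorted(weights, key=lambda t: (-weights[t], t))
    return ranking


docs = {
    "canon_eos_1100d": ["canon", "eos", "1100d", "canon", "1100d", "camera"],
    "sony_a7": ["sony", "a7", "camera", "sony", "a7"],
}
ranking = tfidf_rank(docs)
```
      </preformat>
      <p>In the toy example, the token camera occurs in both pseudo-descriptions, so its weight is zero and it is ranked last for each entity.</p>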
      <p>For the BART baseline, tokens are sorted according to their position in the summary.</p>
      <p>3.1. Experimental comparison</p>
      <p>Let $L^a_e = [(l_1, w_1), (l_2, w_2), \dots]$ be a list of candidate labels for the entity $e$ produced by the
approach $a$, either one of the E2L versions or one of the baselines, sorted by their relevance score
$w_i$ from the most relevant to the least relevant. For each list $L^a_e$, we know the gold label $g_e$ (i.e.,
the correct camera name) and we aim to evaluate the capability of E2L to build $g_e$ by combining
the candidate labels in $L^a_e$. Moreover, we aim to assess how many of the labels in $L^a_e$ we need
to employ to obtain exactly the gold label $g_e$. The effectiveness of an entity labeling solution
can be measured by observing how many candidate labels are required to obtain the gold label.
The lower the number of needed candidates (taken in descending order of relevance score),
the higher the effectiveness. Accordingly, the quality of each approach is measured as
follows. First, we create the set $G$ of the tokens in the gold label $g_e$, by extracting single terms
(i.e., separated by spaces). Then, we do the same for the most relevant candidate label $l_1$, by
defining $S_1$ as the set of tokens of $l_1$. Given $S_1$, we define $C_1 = S_1$ and we evaluate precision
($\mathrm{Pr}_1$) and recall ($\mathrm{Rc}_1$) of $a$ at candidate 1 as:
$$\mathrm{Pr}_1 = \frac{|G \cap C_1|}{|C_1|}; \qquad \mathrm{Rc}_1 = \frac{|G \cap C_1|}{|G|}$$</p>
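      <p>This process is repeated for each of the $K$ top candidate labels produced by $a$. At each step
$i &gt; 1$, we define the set $C_i$ of tokens collected up to step $i$ as:</p>
      <p>$$C_i = C_{i-1} \cup S_i,$$ where $S_i$ is the set of tokens of the $i$-th candidate label.</p>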
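      <p>The incremental evaluation can be sketched as follows: tokens of the candidate labels are accumulated in relevance order, precision and recall against the gold label are computed at each step, and the loop stops as soon as the gold label is fully covered. Lowercasing and whitespace splitting are simplifying assumptions, and the function name precision_at_full_coverage is hypothetical.</p>
      <preformat>
```python
def precision_at_full_coverage(gold_label, candidates):
    """Return (step, precision) at the first step where recall reaches 1,
    or None if the gold label is never fully covered.

    gold_label : string, e.g. the correct camera name
    candidates : candidate labels sorted from most to least relevant
    """
    gold = set(gold_label.lower().split())
    collected = set()
    for step, candidate in enumerate(candidates, start=1):
        # Accumulate the tokens of the candidate labels seen so far.
        collected |= set(candidate.lower().split())
        if not collected:
            continue
        hits = len(gold.intersection(collected))
        precision = hits / len(collected)
        if hits == len(gold):  # recall equals 1
            return step, precision
    return None


clean = precision_at_full_coverage("Canon EOS 1100D", ["canon", "eos 1100d", "kit"])
noisy = precision_at_full_coverage("Canon EOS 1100D", ["canon", "camera", "eos", "1100d"])
missing = precision_at_full_coverage("Canon EOS 1100D", ["sony"])
```
      </preformat>
      <p>In the first call the gold label is covered after two candidates with precision 1.0; in the second, one noisy token is collected along the way, so full coverage is reached at step 4 with precision 0.75; in the third, the gold label is never covered.</p>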
      <p>[Table 2. Rows: TFIDF, BART, E2L-Glove, E2L-Bert.]</p>
      <p>The F1-measure ($\mathrm{F1}_i$) at step $i$ is the harmonic mean of the precision $\mathrm{Pr}_i$ and the recall $\mathrm{Rc}_i$ at step $i$. By exploiting these measures
of precision and recall at $i$, we can easily check when the gold label $g_e$ has been completely
obtained (i.e., the value of $i$ where we have $\mathrm{Rc}_i = 1$) and how many wrong tokens we have
collected during the process (i.e., $\mathrm{Pr}_i$). Thus, we measure the overall quality of E2L and the
baselines through the notion of precision at full coverage ($P^*$), that is, the precision at the step $i$ where recall reaches 1:
$$P^* = \mathrm{Pr}_i : \mathrm{Rc}_i = 1$$</p>
      <p>In Table 2, we report the values of precision at full coverage ($P^*$) for all the approaches,
together with the number and fraction of entities that are correctly retrieved, i.e., those for which precision is
equal to 1 at full coverage ($P^* = 1$), which means that the gold label has been not only completely retrieved
but also retrieved without introducing any noisy token, that is, with no errors. The experimental
results show that the use of black-box explanation techniques in E2L makes it possible to extract relevant
terminology for composing the correct label of entities as a final stage in an ER process. Indeed,
while statistical techniques seem to be effective for retrieving relevant terminology, they also appear
to be more prone to introducing noisy terms in the candidate labels. On the other hand,
data summaries, especially for text, tend to produce longer descriptions that are not concise
enough to be taken as a good entity label. By contrast, the terms found by E2L appear to be a
good compromise, in that they are more specifically related to the entities at hand, but also short
enough to be useful for the task of labeling entities.</p>
      <p>ER errors. Limitations of statistical techniques such as TFIDF are even clearer when there
are errors in the input ER results. Consider for instance entity merge errors, where different
real-world entities are mis-clustered as one entity. Table 3 reports preliminary results on a
selection of clusters from our Alaska dataset. Specifically, we considered
clusters of different sizes, merged them, and computed labels with TFIDF and E2L-Bert. In the
table, we show the labels with $K$, the number of top candidate labels, equal to the size of the gold label for each of the considered
merged clusters. In the presence of merge errors, statistical techniques like TFIDF fail to identify
relevant terminology for all the sub-clusters in the merged cluster, while E2L-Bert can return
the labels corresponding to the merged entities, thus supporting manual inspection of results
and error detection.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Related work</title>
      <p>Works related to the proposed E2L approach are about entity labeling as well as machine learning
interpretation.</p>
      <p>
        Entity labeling. A number of solutions have been proposed in the literature for entity labeling,
intended as the problem of finding a representative label for a set of records that refer to the same
real-world object. A common solution relies on an external knowledge
base that works as a reference vocabulary for selecting the most appropriate label to assign to
a given entity [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Entity labeling can be considered as a task of semantic data mining where
labels emerge from record descriptions and they are selected according to the results of text
processing techniques usually based on conventional information retrieval metrics (e.g., [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]).
Machine learning techniques are also employed for entity labeling [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Automatic solutions to
entity labeling can be integrated within human-in-the-loop workflows where domain experts
are involved to validate the results of automated solutions (e.g., [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]).
      </p>
      <p>
        Machine learning interpretation. In recent years there has been a surge of interest in the
novel field of interpretability (see [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]). Explanation techniques can be distinguished between
black-box and white-box. The former come with a model-agnostic interface while the latter rely
on the internal mechanisms of the model. In E2L, we adopt LIME [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] that is a widely-employed
black-box method. Other methods in the same category include SHAP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and Anchor [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>5. Future Work</title>
      <p>
        In this paper, we presented the E2L approach to entity labeling based on the use of techniques for
classification and model explanation. Our current implementation features two representative
text classification methods [
        <xref ref-type="bibr" rid="ref2 ref3">3, 2</xref>
        ] and one popular explanation method [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. As future
work, we plan to include a wider choice of text classification methods and more explanation
methods, such as SHAP [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] and Anchor [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        Furthermore, a limitation of the current approach is that it depends on the ability of a
supervised classifier to capture the entity properties. In principle, if the classifier is
underperforming, the extracted labels can be less satisfactory. A simple solution could consist in
training an ensemble of different models (as opposed to a single classifier) and selecting labels
by using a voting system. A more sophisticated solution could be to model the ER process
as a binary model indicating whether two records are matching and then apply directly the
explanation engine, analogously to [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. This can be non-trivial and it is left as future work.
Indeed, (i) computing pair-wise explanations exhaustively can be infeasible for large datasets
and (ii) different record pairs in the same entity can be matched for different reasons (e.g., some
camera pairs could share only the model name while others could share not only the model
name but also other technical specifications) and thus important tokens may vary significantly
among pairs.
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          , W.-C. Tan,
          <article-title>Deep entity matching with pre-trained language models</article-title>
          ,
          <source>arXiv:2004.00584</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Sanh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Debut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chaumond</surname>
          </string-name>
          , T. Wolf,
          <article-title>Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter</article-title>
          ,
          <source>arXiv:1910.01108</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Pennington</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Socher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Glove:
          <article-title>Global vectors for word representation</article-title>
          ,
          <source>in: EMNLP</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          , “
          <article-title>Why Should I Trust you?” Explaining the Predictions of Any Classifier</article-title>
          , in: KDD,
          <year>2016</year>
          , pp.
          <fpage>1135</fpage>
          -
          <lpage>1144</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Crescenzi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>De Angelis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mazzei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Piai</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Alaska: A flexible benchmark for data integration tasks</article-title>
          ,
          <source>arXiv:2101.11259</source>
          (
          <year>2021</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          ,
          <article-title>Long short-term memory</article-title>
          ,
          <source>Neural computation 9</source>
          (
          <year>1997</year>
          )
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Lewis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Goyal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghazvininejad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mohamed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Levy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Stoyanov</surname>
          </string-name>
          , L. Zettlemoyer, Bart:
          <article-title>Denoising sequence-to-sequence pre-training for natural language generation, translation, and comprehension</article-title>
          ,
          <source>arXiv:1910.13461</source>
          (
          <year>2019</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>D.</given-names>
            <surname>Dou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          , H. Liu,
          <article-title>Semantic Data Mining: A Survey of Ontology-based Approaches</article-title>
          , in: ICSC,
          <year>2015</year>
          , pp.
          <fpage>244</fpage>
          -
          <lpage>251</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>X.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <article-title>On Conceptual Labeling of a Bag of Words</article-title>
          ,
          <source>in: Int. Joint Conference on Artificial Intelligence</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>F.</given-names>
            <surname>N. C. de Araújo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V. P.</given-names>
            <surname>Machado</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. H. M.</given-names>
            <surname>Soares</surname>
          </string-name>
          ,
          <string-name>
            <surname>R. de M.S. Veras</surname>
          </string-name>
          ,
          <article-title>Automatic Cluster Labeling Based on Phylogram Analysis</article-title>, <source>in: IJCNN</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>8</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. R.</given-names>
            <surname>Karger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Oh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Shah</surname>
          </string-name>
          ,
          <article-title>Efficient Crowdsourcing for Multi-class Labeling</article-title>
          , in: SIGMETRICS,
          <year>2013</year>
          , pp.
          <fpage>81</fpage>
          -
          <lpage>92</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Z. C.</given-names>
            <surname>Lipton</surname>
          </string-name>
          ,
          <article-title>The Mythos of Model Interpretability</article-title>, <source>ACM Queue 16</source>
          (
          <year>2018</year>
          )
          <fpage>31</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <article-title>A Unified Approach to Interpreting Model Predictions</article-title>
          ,
          <source>in: Advances in Neural Information Processing Systems</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>4765</fpage>
          -
          <lpage>4774</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M. T.</given-names>
            <surname>Ribeiro</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Guestrin</surname>
          </string-name>
          ,
          <article-title>Anchors: High-precision Model-agnostic Explanations</article-title>
          ,
          <source>in: Proc. of the 32nd AAAI Conf. on Artificial Intelligence</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>V. D.</given-names>
            <surname>Cicco</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Firmani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Koudas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Merialdo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <article-title>Interpreting deep learning models for entity resolution: an experience report using LIME</article-title>
          , in: aiDM@SIGMOD,
          <year>2019</year>
          , pp.
          <volume>8</volume>
          :
          <fpage>1</fpage>
          -
          <issue>8</issue>
          :
          <fpage>4</fpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>