<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>MWPD2020: Semantic Web Challenge on Mining the Web of HTML-embedded Product Data</article-title>
      </title-group>
      <contrib-group>
<contrib contrib-type="author">
          <string-name>Ziqi Zhang</string-name>
          <email>ziqi.zhang@sheffield.ac.uk</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bizer</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Peeters</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Primpeli</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universität Mannheim</institution>
          ,
          <addr-line>Schloss, 68131 Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Sheffield</institution>
          ,
          <addr-line>Broomhall, Sheffield S10 2TG</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>This paper gives an overview of the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data (MWPD2020) which has been conducted as part of the International Semantic Web Conference (ISWC2020). The challenge consists of two tasks: product matching and product classification. In the first task, participants need to identify offers for the same product originating from different websites. The goal of the second task is to categorize offers from different websites into the GS1 GPC product hierarchy. Six teams from the USA, China, Japan, and Germany participated in the challenge. The winning system in Task 1, PMap, achieved an F1 score of 86.05 using an ensemble of transformer-based language models. Task 2 was won by team Rhinobird achieving a weighted average F1 score of 88.62 using a BERT-based ensemble which considers the dependencies among different category levels.</p>
      </abstract>
      <kwd-group>
        <kwd>entity matching</kwd>
        <kwd>hierarchical classification</kwd>
        <kwd>e-commerce</kwd>
        <kwd>schema.org</kwd>
        <kwd>microdata</kwd>
        <kwd>benchmarking</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Recent years have seen significant growth of semantic annotations on the Web, using markup languages such as Microdata together with the schema.org vocabulary. A particular domain that is witnessing the boom of semantic annotations is e-commerce, where online shops are increasingly embedding schema.org annotations into HTML pages describing products in order to enable search engines to easily identify product offers and potentially drive traffic to the respective websites. Statistics from the Web Data Commons (WDC) project<sup>3</sup> show that, as of November 2018, 37% of web pages, or 30% of websites, contain semantic annotations, amounting to over 30 billion facts. Among these, nearly 20% are</p>
    </sec>
    <sec id="sec-2">
      <title>Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).</title>
      <sec id="sec-2-1">
        <title>3 http://webdatacommons.org/structureddata/</title>
        <p>
          related to products. Such structured product data on the Web have created opportunities for new services, such as product search and integration platforms, recommender systems [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], as well as emerging research fields such as product knowledge graphs [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ].
        </p>
        <p>
          Many websites have started to semantically annotate product identifiers within their pages. This enables the identification of offers for the same product on different websites. The resulting clusters of product descriptions can be used as weak supervision for training product matchers, which in turn can be applied to identify products on websites that do not provide product identifiers [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. However, there are also challenges associated with the annotations. For example, less than 10% of the offers are annotated with a product category [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], while the used categorization systems are website-specific and highly inconsistent across different websites [
          <xref ref-type="bibr" rid="ref22">22</xref>
          ].
        </p>
        <p>The potential as well as the challenges resulting from the widespread availability of semantically annotated product data on the Web motivated the Semantic Web Challenge on Mining the Web of HTML-embedded Product Data (MWPD2020), as well as the specific tasks of the challenge: product matching and product classification. In the first task, participants need to identify offers for the same product originating from different websites. The goal of the second task is to categorize offers from different websites into the GS1 GPC product hierarchy. For both tasks, we have assembled training, validation, and test sets consisting of semantically annotated product data from a wide variety of different websites.</p>
        <p>The event attracted a total of six participating teams, including research institutions as well as commercial entities, from the USA, China, Japan, and Germany. The winning team for the product matching task represents the National Institute of Informatics, Japan; while the winning team for the product classification task represents the Tongji University of China and Tencent, China.</p>
        <p>The remainder of this paper is structured as follows. Sections 2 and 3 explain the two tasks, including their objectives, datasets, and evaluation metrics; Section 4 presents the results of the challenge and gives an overview of the participating systems; and Section 5 concludes the paper with some lessons learned from the challenge and a comparison of MWPD2020 to related benchmark events. More information about MWPD2020 can be found on the challenge's website, which also provides all datasets for public download<sup>4</sup>.</p>
        <sec id="sec-2-1-1">
          <title>Task 1 - Product Matching</title>
          <p>E-commerce websites frequently annotate product identifiers, product titles, product descriptions, brands, and product prices within their pages using schema.org terms. In addition, offers are often accompanied by specification tables, i.e. HTML tables that contain product details in the form of key/value pairs. Given the syntactic, structural and semantic heterogeneity among the offers, it is challenging to identify which offers refer to the same product, a problem known as</p>
        </sec>
      </sec>
      <sec id="sec-2-2">
        <title>4 https://ir-ischool-uos.github.io/mwpd/</title>
        <p>product matching. In this task, product matching is handled as a binary classification problem: given two product offers, the participating systems need to decide whether the offers describe the same product (matching) or not (non-matching).</p>
        <p><bold>Datasets.</bold> The participants of Task 1 were given a large corpus of product offers which are grouped into clusters referring to the same product. This corpus could be used by the participants to assemble training sets of different width and depth. In order to ease starting to work on the task, we also provide a readily assembled example training and validation set. The test set that was used to evaluate the participating systems was kept secret during the submission period of the challenge and was released afterwards.</p>
        <p>
          <bold>Product Data Corpus.</bold> The WDC Product Data Corpus [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ] was released in 2018<sup>5</sup> by the Web Data Commons project and is the largest publicly available product data corpus. It consists of 26 million product offers originating from 70 thousand different e-shops. Exploiting the weak supervision found on the Web in the form of product identifiers, such as GTINs or MPNs, the product offers are grouped into 16 million clusters. The clusters can be used to derive training sets containing matching and non-matching pairs of offers. The grouping of offers into clusters is subject to some degree of noise, which is approximately 7%. The following attributes are used for describing the product offers in the corpus and can be used for training: title; description; brand; price; specTableContent, which contains the content of the specification tables found on the website of the product offer; keyValuePairs, which are the heuristically extracted key/value pairs from the specification tables; and category, which is one of the 25 categories the offer was assigned to. Additionally, two identifier attributes are assigned to every product offer: the id, which is the unique identifier of the offer, and the cluster id, which is the identifier of the cluster to which the offer belongs.
        </p>
        <p>
          <bold>Example Training Set.</bold> Generating interesting matching and non-matching pairs of offers which can be used for training powerful matching models is a non-trivial and resource-intensive task. Therefore, we offer an example training set derived from the product corpus, with the goal of additionally supporting the participants of this task. Being a direct subset of the product corpus, the example training set is subject to some inherent noise. The example training set contains 68K pairs of matching and non-matching offers from 772 distinct products (clusters of offers). We offer the example training set in JSON format. Every JSON object in the training set describes a pair of offers (left offer - right offer) using the offer attributes together with their corresponding matching label. Figure 1 shows an example of a non-matching product offer pair in the example training set.
        </p>
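        <p>The derivation of labeled pairs from the identifier-based clusters can be sketched as follows; this is a minimal illustration rather than the exact WDC pair-generation procedure, and the offer records and field names are invented for the example.</p>
        <preformat>
```python
import itertools
import random

# Hypothetical offers: each has a unique id and the cluster_id of the
# product it describes (offers in the same cluster refer to the same product).
offers = [
    {"id": 1, "cluster_id": "c1", "title": "Acme XPS 13 Laptop"},
    {"id": 2, "cluster_id": "c1", "title": "XPS13 by Acme, 8GB RAM"},
    {"id": 3, "cluster_id": "c2", "title": "Acme XPS 15 Laptop"},
    {"id": 4, "cluster_id": "c2", "title": "Acme XPS 15, 16GB"},
    {"id": 5, "cluster_id": "c3", "title": "Logi M330 Mouse"},
]

def make_pairs(offers, neg_per_pos=1, seed=42):
    """Build matching pairs within clusters and sample non-matching
    pairs across clusters, mirroring the pair format of the task."""
    rng = random.Random(seed)
    positives = [
        {"left": a, "right": b, "label": 1}
        for a, b in itertools.combinations(offers, 2)
        if a["cluster_id"] == b["cluster_id"]
    ]
    candidates = [
        (a, b)
        for a, b in itertools.combinations(offers, 2)
        if a["cluster_id"] != b["cluster_id"]
    ]
    negatives = [
        {"left": a, "right": b, "label": 0}
        for a, b in rng.sample(candidates,
                               min(len(candidates),
                                   neg_per_pos * len(positives)))
    ]
    return positives + negatives

pairs = make_pairs(offers)
```
        </preformat>
        <p>With one negative sampled per positive, the five offers above yield two matching and two non-matching pairs.</p>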
      </sec>
      <sec id="sec-2-3">
        <title>5 http://webdatacommons.org/largescaleproductcorpus/v2/</title>
        <p><bold>Validation Set.</bold> We provide a validation set consisting of 1,100 offer pairs from the Computers and Accessories category as the ground truth for this task. The validation set has the same structure as the example training set. The ratio of matching to non-matching pairs is 3:8. The offers of the validation set are derived from 745 distinct products (clusters). Table 1 presents, for the training and validation sets, the average attribute density, the average length in characters of the attribute values, as well as the standard deviation of the value length.</p>
        <p>With the aim of constructing the validation set using a good mixture of hard and easy matching and non-matching pairs of offers distributed over different products, we applied the following heuristic: First, 150 clusters of the category Computers and Accessories are randomly selected. Considering the clustering scheme, we construct all possible matching and non-matching pairs. For each offer pair we randomly pick the Jaccard similarity over the titles, the descriptions, or the average of both as a similarity metric and calculate its similarity score. To select matching pairs, we pick within each cluster the pair with the lowest similarity score and one randomly chosen pair and add them to the validation set. To select negative pairs, we take for each offer two to three pairs with high similarity and three pairs at random. All matching and non-matching pairs of the validation set are manually verified. Therefore, unlike the provided example training set, the validation set does not contain any noisy offer pairs.</p>
        <p><bold>Test Set.</bold> The hidden test set, which is used for evaluating and ranking the matching systems of the participants of this task, consists of 1,500 offer pairs from the category Computers and Accessories. The offer pairs in the test set are carefully selected in order to cover different types of matching challenges. 1,100 pairs are randomly selected corner cases, meaning that they are similar non-matching pairs and dissimilar matching pairs. For the remaining 400 pairs, we define a categorization scheme consisting of four specific types of matching challenges. The distribution of offer pairs per type of challenge remained unknown to the participants in the first round and was revealed to them at the beginning of the second round to allow them to tune their systems to the specific challenges. The distribution of offer pairs per type of matching challenge is shown in Table 2.</p>
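        <p>The Jaccard similarity over titles used in the selection heuristic above can be sketched as follows; lower-cased whitespace tokenization is an assumption for illustration, as the exact tokenization is not specified.</p>
        <preformat>
```python
def jaccard(a: str, b: str) -> float:
    """Jaccard similarity over lower-cased whitespace tokens:
    intersection size divided by union size of the token sets."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta.intersection(tb)) / len(ta.union(tb))

# Hypothetical titles: 3 shared tokens out of 6 distinct tokens.
score = jaccard("Acme XPS 13 Laptop", "Acme XPS 13 Notebook 8GB")
```
        </preformat>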
        <p>Table 2 distinguishes the following types of matching challenges (the counts per type are not recoverable here): (SN-DM) similar non-matches, dissimilar matches; (NP-HS) new products with high similarity to known products; (NP-LS) new products with low similarity to known products; (KP-TY) known products with introduced typos; (KP-DR) known products with dropped tokens.</p>
        <p>With the term new product we refer to products which are contained in the WDC Product Data Corpus but not in the provided example training set. Known products are products from clusters that have training data in the provided training set. The similarity for choosing similar non-matching and dissimilar matching pairs is measured using the Jaccard similarity metric on the titles of the offers.</p>
        <p>
          <bold>Evaluation Metrics and Baseline.</bold> For the evaluation of the product matching task we use standard precision, recall and F1 calculated on the positive class. As a baseline, we use the Deepmatcher [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] framework, more specifically the RNN module using default parameters, apart from the positive/negative ratio, which we set to the actual distribution found in the training set. This model is trained on the training set provided for the challenge for 15 epochs, using the attributes title, description, brand and specTableContent. We preprocess the attributes by removing some symbols and schema.org-related terms and finally lowercasing them. Since the model relies on pre-trained word- or character-based embeddings, we use the fastText embeddings pre-trained on the English Wikipedia<sup>6</sup>.
        </p>
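        <p>Precision, recall and F1 on the positive (matching) class follow the standard definitions; a minimal sketch (labels 1 = matching, 0 = non-matching):</p>
        <preformat>
```python
def prf1_positive(y_true, y_pred):
    """Precision, recall and F1 computed on the positive class (label 1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return precision, recall, f1

p, r, f = prf1_positive([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```
        </preformat>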
        <sec id="sec-2-3-1">
          <title>Task 2 - Product Classification</title>
          <p>The same products are often sold on different websites, which generally organise their products into certain categorisation systems. However, such product categorisations differ significantly across websites, even if they sell similar product ranges. This makes it difficult for product information integration services to collect and organise product offers on the Web. The product classification task deals with assigning pre-defined product category labels from a universal catalogue to product instances (e.g., an iPhone X is a `SmartPhone' and also `Electronics').</p>
          <p>
            <bold>Datasets: classification labels.</bold> In this task, the GS1 Global Product Classification standard (GPC)<sup>7</sup> is used to classify product instances. The GPC standard classifies products into a hierarchy of multiple levels based on their essential properties as well as their relationships to other products. It offers a universal standard for organising the product catalogues of any business. In this task, the top three levels of GPC are used to classify each product. Level 1 contains 40 classes such as `Automotive' and `Clothing'. Level 2 further divides level 1 classes into more than 100 classes such as `Automotive Accessories and Maintenance' and `Swimwear'. Level 3 then further divides level 2 classes into over 700 classes such as `Automotive Antifreeze' and `Beachwear/Cover Ups'.
          </p>
          <p>
            <bold>Gold standard.</bold> The creation of the gold standard is based on the earlier product classification dataset released by the Web Data Commons project [
            <xref ref-type="bibr" rid="ref16">16</xref>
            ]. The original dataset contained 8,361 product instances randomly sampled from 702 vendors' websites. These were manually classified into the above-mentioned three levels of classification by human annotators.
          </p>
          <p>In MWPD2020, we further extended the original GS by adding over 7,000 product instances. However, due to the complexity of the GPC hierarchy, annotating a random sample by checking against every class in the three classification levels is a very time-consuming process. Therefore, we followed a `controlled process' detailed below.
1. Create a Solr<sup>8</sup> index of product instances by parsing the Product Data Corpus. The index contains five fields corresponding to attributes of each product: an id field to uniquely identify a product index; a name field recording</p>
        </sec>
      </sec>
      <sec id="sec-2-4">
        <title>6 https://fasttext.cc/docs/en/pretrained-vectors.html</title>
      </sec>
      <sec id="sec-2-5">
        <title>7 https://www.gs1.org/standards/gpc</title>
      </sec>
      <sec id="sec-2-6">
        <title>8 https://lucene.apache.org/solr/</title>
        <p>the name of the product; a description field recording the long description of the product; a category text field recording the product category information as provided by the source web page; and a provenance field recording the source URL from which the structured data are extracted. All these attributes are extracted from the RDF quads where available in the dataset.
2. Given an existing product instance (i.e., the reference product) in the original</p>
        <p>GS, search for its name in the description field of the above index;
3. Select up to 50 results (i.e., the target products) with a different name from the reference product;
4. Rank the results by the Levenshtein distance between the reference product's name and the names of the target products;
5. Select the top 10 ranked results;
6. A human annotator manually evaluates the ranked results, and selects n products that s/he deems to belong to the same level-3 class as the reference product;
7. The selected n products are assigned the same level-3 class, as well as the corresponding level-2 and level-1 classes, by traversing the GPC hierarchy.</p>
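        <p>Steps 4 and 5 above rank candidates by edit distance to the reference product name; a minimal sketch using the classic dynamic-programming Levenshtein distance (not necessarily the implementation used by the annotation tooling):</p>
        <preformat>
```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance, row by row."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def top_candidates(reference_name, target_names, k=10):
    """Rank target product names by distance to the reference name
    and keep the top-k closest (steps 4 and 5)."""
    ranked = sorted(target_names,
                    key=lambda n: levenshtein(reference_name, n))
    return ranked[:k]
```
        </preformat>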
        <p>In step 6 above, the human annotators are presented with the following information when assessing each target product: the reference product's name, description, level-1, 2, 3 classes, provenance, and website-specific category information or breadcrumb if available; and the target product's name, description and provenance. They are instructed to exercise their own discretion to decide an optimal n, balancing the diversity of the already selected target products in terms of their name, vendor as identified by their provenance, and level-3 classes.</p>
        <p>The main reasons for allowing this flexibility are two-fold. First, in the original GS, there existed certain `difficult' and often minority classes (e.g., 93070100 Seeds/Spores) for which steps 2 and 3 hardly returned many positive matches. Second, there also existed certain `dominant' classes that represented a very large fraction of the original GS (e.g., 67000000 Clothing was over 40%) and for which it was also `easier' to find matches by steps 2 and 3. This implies that our controlled process runs a risk of further accentuating the already unbalanced nature of the original GS. Thus, by exercising their discretion based on the above principle, our goal was to control the balance in the distribution of classes in the final dataset. In practice, n ranged between 0 (no suitable target) and 6 (typically for `difficult' classes).</p>
        <p>The annotation was conducted by two computer science researchers, and Inter-Annotator Agreement was studied on 100 product instances which they both annotated. A Cohen's Kappa of 97% was obtained. The final dataset contains 16,119 instances and is stored in JSON. In addition to the five product attributes described before, each instance is assigned three labels, corresponding to the three levels of classification. Further, the description is truncated to a maximum of 5,000 characters. A screenshot of an example instance is shown in Figure 2. The dataset is split into training (10,012), validation (3,000), and test (3,107) sets, with statistics shown in Figures 3 and 4. As in Task 1, only the training and validation sets were revealed to the participants before the submission of their final system output, which was created on the test set. Although the dataset is very unbalanced, with several large classes dominating at all three levels, it is worth noting that it follows a distribution consistent with the original GS.</p>
        <p><bold>Additional resources.</bold> During the process of creating the gold standard, additional resources<sup>9</sup> were created with the aim of supporting participants' system development. These include: (1) a `product data textual corpus' that contains the descriptions of all product instances from the Solr index above, to which a light-weight cleaning process was applied to keep only descriptions of at least 5 tokens (separated by whitespace characters) and 20 characters; this corpus has over 1.9 billion tokens; (2) word embedding models (both continuous bag-of-words (CBOW) and skip-gram) trained on the above textual corpus, by applying the Gensim<sup>10</sup> (version 3.4.0) implementation of Word2Vec.</p>
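        <p>The light-weight cleaning filter for the textual corpus can be sketched as follows, using the thresholds stated above (at least 5 whitespace-separated tokens and 20 characters):</p>
        <preformat>
```python
def keep_description(text: str,
                     min_tokens: int = 5,
                     min_chars: int = 20) -> bool:
    """Keep a description only if it has enough whitespace-separated
    tokens and enough characters, as in the corpus cleaning step."""
    tokens = text.split()
    return len(tokens) >= min_tokens and len(text) >= min_chars
```
        </preformat>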
        <p>
          <bold>Evaluation Metrics and Baseline.</bold> For each classification level, the standard precision, recall and F1 are used, and a weighted-average macro-F1 (WAF1) is calculated over all classes. The average of the WAF1 scores of the three levels is then calculated and used to rank the participating systems. As a baseline, we use a configuration based on the one used in the Rakuten Data Challenge<sup>11</sup> [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. Specifically: it uses the same fastText algorithm and parameters as in the Rakuten Data Challenge; it uses only product names as input text; and all text is lowercased and lemmatised using NLTK version 3.4.5.
        </p>
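        <p>The ranking score, i.e. the mean over the three levels of the weighted-average F1, can be sketched generically as follows (per-class F1 weighted by gold-label support); this illustrates the stated metric and is not the official evaluation script:</p>
        <preformat>
```python
from collections import Counter

def weighted_average_f1(y_true, y_pred):
    """Weighted-average F1: per-class F1 weighted by the class's
    support (frequency) in the gold labels."""
    support = Counter(y_true)
    total = len(y_true)
    waf1 = 0.0
    for cls, count in support.items():
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        pred = sum(1 for p in y_pred if p == cls)
        prec = tp / pred if pred else 0.0
        rec = tp / count
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        waf1 += (count / total) * f1
    return waf1

def ranking_score(levels_true, levels_pred):
    """Average the WAF1 over the three classification levels."""
    scores = [weighted_average_f1(t, p)
              for t, p in zip(levels_true, levels_pred)]
    return sum(scores) / len(scores)
```
        </preformat>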
        <sec id="sec-2-6-1">
          <title>Results</title>
          <p>The competition was organised in two rounds. In order to improve their systems, the teams were shown the leaderboard after Round 1 and were informed about the F1 scores that their systems achieved on the specific types of matching challenges (see Table 2). A total of six teams representing different industry organisations and research and academic institutions participated in Round 1. Six teams participated in Task 1, while five participated in Task 2. Two teams continued in Round 2 for Task 1, while only one team chose to continue in Round 2 for Task 2. Five teams submitted a paper describing their system. A brief overview of the results for both tasks is given below. Section 4.3 provides additional details about each team and their system.</p>
          <p><bold>Results Task 1 - Product Matching</bold></p>
        </sec>
      </sec>
      <sec id="sec-2-7">
        <title>9 Download from: https://bit.ly/36d0NYd</title>
        <p>10 https://radimrehurek.com/gensim/ 11 Download from: https://github.com/ir-ischool-uos/mwpd/tree/master/prodcls</p>
        <p>
          Table 3 shows the results for Task 1, with team PMap achieving the best overall F1 score and Rhinobird and ISCAS-ICIP very close behind. An overview of each team's methods along selected dimensions which resulted in the final submission can be seen in Table 4. All of the teams who submitted system papers employed fine-tuning of transformer-based pre-trained language models for the task of product matching in one form or another, often combined with some form of ensembling. Team ISCAS-ICIP was the only team employing an ensembling approach across different deep matching models. The other teams who employed ensembling limited themselves to fine-tuned transformer-based models, e.g. BERT [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and RoBERTa [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-8">
        <title>Table 3. Results for Task 1 (precision, recall, F1) for PMap, Rhinobird (Rounds 1 and 2), ISCAS-ICIP (Rounds 1 and 2), ASVinSpace, Megagon, Team ISI, and the Deepmatcher baseline; the scores are not recoverable from the extraction</title>
      </sec>
      <sec id="sec-2-17">
        <title>Overview of the Task 1 Systems</title>
        <p>Most teams applied some form of standard pre-processing to the data before training. Team ISCAS-ICIP also employs a feature extraction approach based on fixed vocabularies and regular expressions to extract multiple features from the textual descriptions. For this they use the provided WDC Large-Scale Corpus for Product Matching. They further use it to expand the provided training set with more product pairs from the relevant product category. Teams Rhinobird and ASVinSpace also tried augmenting the training data with pre-built training sets of other categories from the product corpus. Most teams tried different combinations of features and training sets during experimentation. Team Rhinobird also tried optimizing for focal loss in addition to cross-entropy, as well as employing a self-ensembling technique over multiple training epochs of the same model. Rhinobird and ISCAS-ICIP further implemented a post-processing step, where they used heuristic rules to correct some of the predicted labels, e.g. always predicting non-match if the brands of the two offers do not match.</p>
        <p>All teams were provided with information about their performance on the specific matching challenges in the test set (see Section 2) after Round 1. These scores could be used to improve specific aspects of the systems for Round 2. The results of all teams on the specific types of matching challenges are found in Table 5.</p>
        <p>Table 4. Overview of the systems submitted for Task 1, along the dimensions pre-processing, used attributes, matching model, matching decision, post-processing, and external resources. PMap: removes symbols and non-alphabet characters; uses the title attribute; fine-tunes BERT-large, RoBERTa-large and RoBERTa-base; decides via an ensemble of transformers. Rhinobird: removes stopwords and lower-cases; uses title and description; fine-tunes BERT-base; decides via an ensemble of self-ensembling transformers with different loss functions; applies heuristic rules to correct predictions; uses training data for four categories from the WDC corpus. ISCAS-ICIP: removes stopwords and alphanumeric characters and lower-cases; uses title, price, and extracted brand and model attributes; combines MPM, HierMatcher and Ditto; decides via an ensemble of different model types; applies heuristic rules to correct predictions; uses the WDC corpus and additional vocabularies for attribute extraction and training data. ASVinSpace: uses title, description and specTableContent; fine-tunes DistilRoBERTa-base; decides via a single model; uses training data for four categories from the WDC corpus.</p>
        <p>Team ISCAS-ICIP was able to improve on their performance for all challenges in Round 2, significantly improving on new products, which were not contained in the provided training set. For Round 2 they exchanged one of the matching models in their ensemble with a transformer-based model. This may suggest that this kind of model is better suited for handling new products than the previously used one (see below). Their overall result improved by 3% F1 going from Round 1 to Round 2. Team Rhinobird managed to improve significantly for products containing typos or dropped words while losing some performance across the other classes. They changed one of the three BERT models in their ensemble to one trained with a different loss function for Round 2. Their overall result improves by only 0.5% F1 from Round 1 to Round 2, trading 2% precision for 4% recall. Overall, no team consistently beats the others across all challenges, which is not surprising, as they all apply similar approaches and the overall results of the top teams vary only within 1% F1.</p>
        <p><bold>Results Task 2 - Product Classification</bold></p>
      </sec>
      <sec id="sec-2-27">
        <title>Table 5. F1 per type of matching challenge (SN-DM, NP-HS, NP-LS, KP-TY, KP-DR) for PMap, Rhinobird (Rounds 1 and 2), ISCAS-ICIP (Rounds 1 and 2), ASVinSpace, Megagon, and Team ISI; the scores are not recoverable from the extraction</title>
        <p>The approaches submitted for Task 2 range from fine-tuning of existing pre-trained language models (e.g., DICE) to very complex structures that combine 17 different models through ensembling (e.g., Rhinobird).</p>
        <p>
          In terms of the text input used for supervised learning, all teams used product name and description, except ASVinSpace, which also used the URL. However, it is unclear what the effect of the URL is, due to the lack of an ablation test. Interestingly, no teams used the category text provided as-is by the source vendor websites, even though such content has proven to be useful for such tasks [
          <xref ref-type="bibr" rid="ref16 ref22">16, 22</xref>
          ].
        </p>
        <p>In terms of using external resources (excluding the use of pre-trained language models) to support the learning, Team ASVinSpace used a novel approach that extends the training set by harvesting data from Wikidata. None of the teams used the pre-trained embedding models or the product description corpus. However, Table 6 demonstrates that the pre-trained embedding models can be effective for further enhancing the learning.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Table 6. Results for Task 2 for Rhinobird (Rounds 1 and 2), ISI, ASVinSpace, Megagon, and DICE; the scores are not recoverable from the extraction</title>
    </sec>
    <sec id="sec-7">
      <title>Participating Teams</title>
      <p>In the following, we summarize the approaches that were used by the different teams. The complete details about the methods are given in the system papers written by the teams themselves, which are contained in the MWPD2020 proceedings.</p>
      <p>
        Team Rhinobird represents the Tongji University of China and Tencent,
China, and participated in both Task 1 and 2 in both rounds. For task 1, they
rely on the BERT model while experimenting with di erent loss functions and
ensembling steps. More speci cally, after removing stopwords and lower-casing
the data, they ne-tune multiple BERT models while experimenting with di
erent training sets, features as well as focal loss [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] as a variation of the standard
cross-entropy loss function. In addition to the provided training set, they
experiment with a larger training set containing product pairs for four product
categories. They try using only the title attribute as well as the concatenation
of title and description as input features. Furthermore, Team Rhinobird
experiments with a method of self-ensembling across multiple training epochs of the
same model, namely stochastic weight averaging (SWA) [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Finally, subsets
of all the previously mentioned models are ensembled by averaging their
prediction probabilities and subsequently selecting the best performing ensemble.
Team Rhinobird also applies some simple post-processing rules to correct the
predictions of the models; more specifically, all test pairs that do not belong to the
same product category are set to be non-matches. Their submission for both
rounds consists of an ensemble of three fine-tuned BERT models trained
with different choices for the previously mentioned parameters.
      </p>
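      <p>The averaging and rule-based post-processing described above can be sketched as follows; the probabilities and category labels are invented for illustration and are not challenge data.</p>
      <preformat>
```python
# Sketch of Rhinobird-style ensembling: average the match probabilities of
# several fine-tuned models, threshold the average, then force pairs from
# different product categories to non-match. All values are illustrative.

def ensemble_predict(prob_lists, categories_a, categories_b, threshold=0.5):
    """prob_lists: one list of match probabilities per model, aligned by pair."""
    n_models = len(prob_lists)
    n_pairs = len(prob_lists[0])
    labels = []
    for i in range(n_pairs):
        avg = sum(probs[i] for probs in prob_lists) / n_models
        match = avg >= threshold
        # Post-processing rule: pairs from different product categories
        # can never be matches.
        if categories_a[i] != categories_b[i]:
            match = False
        labels.append(1 if match else 0)
    return labels

# Three hypothetical models scoring three offer pairs:
probs = [[0.9, 0.4, 0.8], [0.7, 0.6, 0.9], [0.8, 0.2, 0.7]]
cats_a = ["Computers", "Shoes", "Cameras"]
cats_b = ["Computers", "Shoes", "Shoes"]
print(ensemble_predict(probs, cats_a, cats_b))  # [1, 0, 0]
```
      </preformat>
      <p>The third pair is predicted as a match by the ensemble but corrected to a non-match because the two offers belong to different categories.</p>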
      <p>For Task 2, Rhinobird used a BERT-based ensemble model that explicitly
considers the dependencies among different category levels. These hierarchical
dependency features are encoded using a dynamic masked matrix derived
from the hierarchical category structure. The masked matrix acts as a filter that
dynamically discards the child categories irrelevant to the current parent category.
The final ensemble model combines 17 different BERT models to make the final
decisions. They used both product names and descriptions as input.</p>
      <p>Team PMap represents the National Institute of Advanced Industrial
Science and Technology, Japan, and participated in Task 1 in Round 1 only. They
rely on pre-trained transformer-based language models; more specifically,
they fine-tune BERT-base, BERT-large, DistilBERT-base, RoBERTa-base and
RoBERTa-large, subsequently ensembling the results of some of these models
to arrive at the final matching decision. Before fine-tuning, they apply simple
preprocessing, e.g. removing symbols and non-alphabet characters using a
regular expression. Team PMap uses the datasets provided during the challenge
without further additions. They furthermore use only the title attribute as
input feature. After fine-tuning each model, based on the results, they select
the BERT-large, RoBERTa-large and RoBERTa-base models for ensembling to
reach the final matching decision.</p>
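      <p>A minimal sketch of such preprocessing and input construction, assuming a simple keep-letters-and-digits regular expression and the usual [CLS]/[SEP] sequence-pair template; both are assumptions for illustration, not details taken from the PMap system paper.</p>
      <preformat>
```python
import re

# Sketch of regex-based title cleaning followed by serialization of the two
# cleaned titles into a single BERT-style sequence pair. The exact regex and
# the [CLS]/[SEP] template are illustrative assumptions.

def clean(title):
    # keep letters, digits and spaces; collapse runs of whitespace
    kept = re.sub(r"[^A-Za-z0-9 ]+", " ", title)
    return re.sub(r"\s+", " ", kept).strip().lower()

def serialize_pair(title_a, title_b):
    return "[CLS] " + clean(title_a) + " [SEP] " + clean(title_b) + " [SEP]"

print(serialize_pair("Canon EOS-80D (Body)", "canon eos 80d dslr body!!"))
```
      </preformat>
      <p>In practice, a tokenizer would build this sequence pair itself; the sketch only makes the shape of the model input explicit.</p>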
      <p>Team ASVinSpace represents Leipzig University, Germany, and the
German Aerospace Center (DLR). They participated in both Task 1 and Task 2 in
Round 1. For Task 1, they employed pre-trained transformer-based language
models, namely BERT, RoBERTa and their distilled versions [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Different
feature combinations are tried, with the input to the model consisting of the
concatenation of the used feature strings. The standard model is augmented with a
single dense layer and an output layer on top of the pooled output of the [CLS] token
and subsequently fine-tuned. ASVinSpace try solving the task in two ways: once
minimizing cross-entropy loss on an output layer of size two, and once
framed as a regression problem with a single output, minimizing the
mean squared error. In addition to the training set provided for the challenge,
the team further experiments with additional training data from other product
categories from the same data corpus. To handle the class imbalance inherent
to the data, the team randomly drops negative examples during each training
epoch to normalize the class distribution. The final submitted result is achieved
by a DistilRoBERTa model using the title, description and specTableContent
attributes, fine-tuned on data from four different product categories.
      </p>
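      <p>The per-epoch negative-dropping strategy for handling class imbalance can be sketched as follows, with synthetic match/non-match pairs standing in for the challenge data:</p>
      <preformat>
```python
import random

# Sketch of the class-balancing idea: at each training epoch, randomly drop
# negative pairs so the classes are roughly balanced. Labels: 1 = match,
# 0 = non-match. The data below is synthetic.

def balance_epoch(examples, rng):
    positives = [e for e in examples if e[1] == 1]
    negatives = [e for e in examples if e[1] == 0]
    # keep only as many negatives as positives, sampled fresh each epoch
    sampled = rng.sample(negatives, min(len(positives), len(negatives)))
    epoch_data = positives + sampled
    rng.shuffle(epoch_data)
    return epoch_data

rng = random.Random(0)
data = ([("pos%d" % i, 1) for i in range(3)]
        + [("neg%d" % i, 0) for i in range(3, 12)])
epoch = balance_epoch(data, rng)
print(len(epoch))  # 6 examples: 3 positives plus 3 freshly sampled negatives
```
      </preformat>
      <p>Because the negatives are re-sampled each epoch, the model still sees most of the negative examples over the course of training while each individual epoch remains balanced.</p>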
      <p>
        For Task 2, ASVinSpace used a CNN model adapted from [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. It used
a transformer-based language model [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] as input to the CNN layers instead
of static word embedding models. The CNN model has three output layers,
each corresponding to one of the classification levels, thus allowing the model
to capture inter-dependencies between the different classification tasks. In addition,
ASVinSpace also proposed to use external resources. For example, the names of
examples from the training set are used to retrieve relevant entities from
Wikidata. Then, the corresponding descriptions and the GPC standard are used
to disambiguate the retrieved entities in order to select only the ones that are
highly similar (using a cosine similarity metric based on TF-IDF weighted feature
vectors) to the classification examples. These 'expanded' entities are manually
annotated to create additional training data. As text input, they
used the concatenation of product names, descriptions, and URLs.
      </p>
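      <p>The TF-IDF-based disambiguation step can be illustrated with a small sketch; the toy documents, and the simple bag-of-words TF-IDF weighting itself, are illustrative assumptions rather than the team's exact implementation.</p>
      <preformat>
```python
import math
from collections import Counter

# Sketch of a TF-IDF cosine-similarity filter: keep only those retrieved
# entity descriptions that are close to a classification example. The toy
# documents below are invented for illustration.

def tfidf_vectors(docs):
    tokenized = [doc.lower().split() for doc in docs]
    n = len(tokenized)
    # document frequency of each term
    df = Counter(term for doc in tokenized for term in set(doc))
    vecs = []
    for doc in tokenized:
        tf = Counter(doc)
        vecs.append({t: tf[t] * math.log(n / df[t]) for t in tf})
    return vecs

def cosine(u, v):
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

vecs = tfidf_vectors([
    "apple iphone 11 smartphone",   # classification example
    "iphone 11 apple phone",        # candidate entity description (related)
    "wooden dining table",          # candidate entity description (unrelated)
])
print(cosine(vecs[0], vecs[1]))     # clearly positive similarity
print(cosine(vecs[0], vecs[2]))     # 0.0, would be filtered out
```
      </preformat>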
      <p>
        Team ISCAS-ICIP represents the Chinese Academy of Sciences and
participated in Task 1 in Rounds 1 and 2. Their approach is based on three steps:
pre-processing, entity matching and post-processing. During pre-processing, the
team removes stopwords and non-alphanumeric characters and lower-cases the examples.
Furthermore, they apply a feature extraction approach based on vocabularies
built using the provided data corpus, as well as regex patterns, to
extract values for the attributes brand and model. For the entity matching stage,
the team applies four different models overall, whose results are integrated into
the final prediction using a voting mechanism. The models are MPM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
Seq2SeqMatcher [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], HierMatcher [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and Ditto [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Finally, the post-processing
module uses rules to correct predictions under certain circumstances, e.g. always
assigning the label "match" if two products have an exact match on the title
attribute, or always assigning "non-match" if the brands of two products differ. The
team also augments the provided training and validation sets, doubling their
size using a sampling approach similar to the one used for building the
provided training sets [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. For their first-round submission, ISCAS-ICIP integrated
the results of the MPM, Seq2SeqMatcher and HierMatcher models, while for the
second-round submission they omit Seq2SeqMatcher and replace it with Ditto,
which is based on pre-trained transformer-based language models, leading to
improved performance (around 3% F1) on the evaluation test set.
      </p>
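      <p>The voting and rule-based post-processing described above can be sketched as follows; the model predictions, titles and brand values are invented for illustration:</p>
      <preformat>
```python
# Sketch of ISCAS-ICIP's integration step: majority voting over the
# predictions of several matchers, followed by rule-based post-processing.
# All inputs below are illustrative, not challenge data.

def vote_and_postprocess(model_preds, titles_a, titles_b, brands_a, brands_b):
    """model_preds: one list of 0/1 predictions per model, aligned by pair."""
    n_models = len(model_preds)
    final = []
    for i in range(len(titles_a)):
        votes = sum(preds[i] for preds in model_preds)
        label = 1 if votes * 2 > n_models else 0  # strict majority
        # Rule 1: an exact title match is always a match.
        if titles_a[i] == titles_b[i]:
            label = 1
        # Rule 2: differing (known) brands are never a match.
        elif brands_a[i] and brands_b[i] and brands_a[i] != brands_b[i]:
            label = 0
        final.append(label)
    return final

preds = [[1, 1, 0], [0, 1, 1], [0, 1, 1]]   # three hypothetical matchers
titles_a = ["acme x1 laptop", "acme x2", "b-tech z9"]
titles_b = ["acme x1 laptop", "acme x2 notebook", "b-tech z9 pro"]
brands_a = ["acme", "acme", "b-tech"]
brands_b = ["zeta", "acme", "c-corp"]
print(vote_and_postprocess(preds, titles_a, titles_b, brands_a, brands_b))
```
      </preformat>
      <p>In the first pair the exact title match overrides both the vote and the brand rule; in the third, differing brands override a positive majority vote.</p>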
      <p>Team DICE represents Paderborn University, Germany, and
participated in Task 2 in Round 1 only. The team used a simple adaptation of the
BERT language model, adding a fully-connected (dense) layer on top and
using a sigmoid activation function as a replacement for the original softmax
for classification. They used both product names and descriptions as input.</p>
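      <p>A minimal sketch of such a classification head, with hand-picked stand-in weights instead of a fine-tuned BERT encoder and plain Python instead of a deep learning framework, purely for illustration:</p>
      <preformat>
```python
import math

# Sketch of a DICE-style head: a fully-connected layer over a pooled encoder
# output, with a sigmoid squashing each class score independently (instead
# of a softmax over all classes). Weights and the pooled vector are
# illustrative stand-ins for a real BERT [CLS] embedding.

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def classify(pooled, weights, biases):
    # one score per category; the highest sigmoid activation wins
    scores = [sigmoid(sum(w * x for w, x in zip(row, pooled)) + b)
              for row, b in zip(weights, biases)]
    return max(range(len(scores)), key=lambda i: scores[i])

pooled = [0.2, -0.4, 0.7]
weights = [[0.1, 0.3, -0.2], [0.5, -0.1, 0.4], [-0.3, 0.2, 0.1]]
biases = [0.0, 0.1, -0.2]
print(classify(pooled, weights, biases))  # category index 1
```
      </preformat>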
      <p>Team Megagon represents megagon.ai and Team ISI represents the
University of Southern California, USA. They participated in both tasks in Round
1, but did not submit a system paper.</p>
      <sec id="sec-7-1">
        <title>Conclusion</title>
        <p>
          The systems that were successful in the challenge all employed pre-trained
transformer-based language models, which underlines the potential of these models for
Web data integration tasks [
          <xref ref-type="bibr" rid="ref12 ref19 ref4">4, 19, 12</xref>
          ]. Especially the good results of systems
using RoBERTa show the benefits of transferring knowledge that has been learned
from less structured web content from diverse sources (RoBERTa was pre-trained
using different subsets of the CommonCrawl along with several text corpora) to integration tasks
involving structured web content, such as the matching and categorization tasks
addressed in the challenge.
        </p>
        <p>
          Several other benchmark competitions on product matching and product
classification have been conducted in recent years: The SIGIR 2018 eCom
Rakuten Data Challenge [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] focused on product classification, where individual
products are classified into a hierarchy of over 3,000 categories in a
company-specific catalogue (i.e., the Rakuten product catalogue). Compared to the Rakuten
challenge, which only involved product descriptions from a single source, our
classification task involves more heterogeneous product descriptions from many
websites. The 2019 and 2020 workshops on Challenges and Experiences from
Data Integration to Knowledge Graphs (DI2KG2019) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] (http://di2kg.inf.uniroma3.it/2019/) and DI2KG2020
(http://di2kg.inf.uniroma3.it/2020/)
focus on knowledge graph creation from product specifications which were
extracted from the Web. The workshops feature three shared tasks: entity
resolution, schema matching and attribute matching. Products are described in the
DI2KG dataset using distinct attributes such as screen size, display type, or
refresh rate. Compared to the DI2KG entity resolution task, our matching task
involves dealing with less structured textual product data.
        </p>
        <p>
          Based on the findings from this event, we identify several remaining gaps in
current research: First, despite the dominance of transformer-based language
models, there remains a significant degree of variety in terms of how such models
can be adapted and/or combined for data integration tasks. There is also a lack
of systematic study of how these architectures compare under a uniform
experimental setting. Second, there is a lack of exploration into what kinds of external
resources can be used to support such tasks and how they can be used to do
so. For example, our product data textual corpus could be used to fine-tune a
language model following an approach such as [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ], which showed further gains in
terms of domain-specific tasks [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ]. Finally, in terms of mining product
information on the Web from a more general point of view, recent research [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] focused
on harvesting and cleaning structured product data on the Web. However, there
is a lack of studies on how such data could be used to enable self-supervised
learning in downstream tasks [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. We encourage future research to investigate
these directions.
        </p>
        <p>Acknowledgements: This event is partially sponsored by Peak Indicators, UK
(https://www.peakindicators.com/).</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bai</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ge</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lu</surname>
          </string-name>
          , H.:
          <article-title>Joint interaction with context operation for collaborative filtering</article-title>
          .
          <source>Pattern Recognition</source>
          <volume>88</volume>
          ,
          <issue>729</issue>
          –
          <fpage>738</fpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1016/j.patcog.2018.12.003
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Beltagy</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lo</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>SciBERT: A pretrained language model for scientific text</article-title>
          . arXiv preprint arXiv:1903.10676
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Primpeli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peeters</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Using the semantic web as a source of training data</article-title>
          .
          <source>Datenbank-Spektrum</source>
          <volume>19</volume>
          (
          <issue>2</issue>
          ),
          <volume>127</volume>
          –
          <fpage>135</fpage>
          (
          <year>2019</year>
          ). https://doi.org/10.1007/s13222-019-00313-y
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lees</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>TURL: Table Understanding through Representation Learning</article-title>
          . arXiv:2006.14806 [cs] (
          <year>Jun 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          : BERT:
          <article-title>Pre-training of Deep Bidirectional Transformers for Language Understanding</article-title>
          .
          <source>In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies</source>
          . pp.
          <volume>4171</volume>
          –
          <issue>4186</issue>
          (Jun
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          :
          <article-title>Challenges and innovations in building a product knowledge graph</article-title>
          .
          <source>In: Proceedings of the 24th ACM SIGKDD International Conference on Knowledge Discovery &amp; Data Mining</source>
          . p.
          <fpage>2869</fpage>
          .
          <article-title>Association for Computing Machinery (</article-title>
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Firmani</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Crescenzi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Angelis</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazzei</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Merialdo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Proceedings of the 1st international workshop on challenges and experiences from data integration to knowledge graphs</article-title>
          .
          <source>CEUR Workshop Proceedings</source>
          , vol.
          <volume>2512</volume>
          .
          <string-name>
            <surname>CEUR-WS.org</surname>
          </string-name>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Hierarchical Matching Network for Heterogeneous Entity Resolution</article-title>
          .
          <source>In: Proceedings of the Twenty-Ninth International Joint Conference on Artificial Intelligence</source>
          . pp.
          <volume>3665</volume>
          –
          <issue>3671</issue>
          (Jul
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
          </string-name>
          , W., et al.:
          <article-title>End-to-End Multi-Perspective Matching for Entity Resolution</article-title>
          .
          <source>In: Proceedings of the Twenty-Eighth International Joint Conference on Artificial Intelligence</source>
          . pp.
          <volume>4961</volume>
          –
          <issue>4967</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Izmailov</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Podoprikhin</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garipov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vetrov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Wilson,
          <string-name>
            <surname>A.G.</surname>
          </string-name>
          :
          <article-title>Averaging Weights Leads to Wider Optima and Better Generalization</article-title>
          . arXiv:1803.05407 [cs, stat] (
          <year>Feb 2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Kim</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Convolutional neural networks for sentence classification</article-title>
          .
          <source>In: Proc. of the Conference on Empirical Methods in Natural Language Processing (EMNLP)</source>
          . pp.
          <volume>1746</volume>
          –
          <fpage>1751</fpage>
          .
          <article-title>Association for Computational Linguistics (ACL) (</article-title>
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suhara</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <article-title>Deep Entity Matching with PreTrained Language Models</article-title>
          . arXiv:2004.00584 [cs] (
          <year>Apr 2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>T.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Girshick</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dollar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Focal Loss for Dense Object Detection</article-title>
          .
          <source>In: Proceedings of the IEEE International Conference on Computer Vision</source>
          . pp.
          <volume>2980</volume>
          –
          <issue>2988</issue>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Das</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Overview of the sigir 2018 ecom rakuten data challenge</article-title>
          .
          <source>In: eCOM at The 41st International ACM SIGIR Conference on Research and Development in Information Retrieval. CEUR-WS.org</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ott</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Goyal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Du</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joshi</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          , et al.:
          <article-title>RoBERTa: A Robustly Optimized BERT Pretraining Approach</article-title>
          . arXiv:1907.11692
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Meusel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Primpeli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meilicke</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paulheim</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Exploiting microdata annotations to consistently categorize product offers at web scale</article-title>
          .
          <source>In: International Conference on Electronic Commerce and Web Technologies</source>
          . pp.
          <volume>83</volume>
          –
          <fpage>99</fpage>
          . Springer International Publishing (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mudgal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rekatsinas</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Doan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Park</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          , et al.:
          <article-title>Deep Learning for Entity Matching: A Design Space Exploration</article-title>
          .
          <source>In: Proceedings of the 2018 International Conference on Management of Data</source>
          . pp.
          <volume>19</volume>
          –
          <issue>34</issue>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Nie</surname>
            , H., Han,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          , et al.:
          <article-title>Deep Sequence-to-Sequence Entity Matching for Heterogeneous Entity Resolution</article-title>
          .
          <source>In: Proceedings of the 28th ACM International Conference on Information and Knowledge Management</source>
          . pp.
          <volume>629</volume>
          –
          <issue>638</issue>
          (Nov
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Peeters</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Glavas</surname>
          </string-name>
          , G.:
          <article-title>Intermediate Training of BERT for Product Matching</article-title>
          . In: DI2KG Workshop @ VLDB (
          <year>2020</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Primpeli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peeters</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>The WDC Training Dataset and Gold Standard for Large-Scale Product Matching</article-title>
          . In: Workshop on e-Commerce and
          <article-title>NLP (ECNLP2019</article-title>
          ),
          <source>Companion Proceedings of WWW</source>
          . pp.
          <volume>381</volume>
          –
          <issue>386</issue>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sanh</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Debut</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaumond</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wolf</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter</article-title>
          . In:
          <source>NeurIPS EMC2 Workshop</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paramita</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Product classification using microdata annotations</article-title>
          . In:
          <string-name>
            <surname>Ghidini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hartig</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maleshkova</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Svátek</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cruz</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Song</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefrançois</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gandon</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (eds.)
          <source>The Semantic Web – ISWC 2019</source>
          . pp.
          <fpage>716</fpage>
          –
          <lpage>732</lpage>
          . Springer International Publishing (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>