<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Schema.org Table Annotation Benchmark</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Keti Korini</string-name>
          <email>kkorini@uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ralph Peeters</string-name>
          <email>ralph.peeters@uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bizer</string-name>
          <email>christian.bizer@uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <addr-line>Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <kwd-group>
        <kwd>Table Annotation</kwd>
        <kwd>Column Type Annotation</kwd>
        <kwd>Column Property Annotation</kwd>
        <kwd>Schema.org</kwd>
      </kwd-group>
      <pub-date>
        <year>2022</year>
      </pub-date>
      <abstract>
        <p>Understanding the semantics of table elements is a prerequisite for many data integration and data discovery tasks. Table annotation is the task of labeling table elements with terms from a given vocabulary. This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing the performance of table annotation systems. SOTAB covers the column type annotation (CTA) and column property annotation (CPA) tasks. SOTAB provides ∼50,000 annotated tables for each of the tasks containing Schema.org data from different websites. The tables cover 17 different types of entities such as movie, event, local business, recipe, job posting, or product. The tables stem from the WDC Schema.org Table Corpus, which was created by extracting Schema.org annotations from the Common Crawl. Consequently, the labels used for annotating columns in SOTAB are part of the Schema.org vocabulary. The benchmark covers 91 types for CTA and 176 properties for CPA distributed across textual, numerical and date/time columns. The tables are split into fixed training, validation and test sets. The test sets are further divided into subsets focusing on specific challenges, such as columns with missing values or different value formats, in order to allow a more fine-grained comparison of annotation systems. The evaluation of SOTAB using Doduo and TURL shows that the benchmark is difficult to solve for current state-of-the-art systems.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Tables containing structured data are widely used on the Web. Understanding the semantics of
tables is useful for a variety of data integration and data discovery tasks such as knowledge
base augmentation [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] or dataset search [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Table annotation is the task of annotating a table
with terms from a given vocabulary, knowledge graph, or database schema. Table annotation
includes tasks such as Column Type Annotation (CTA) and Column Property Annotation (CPA).
CTA is the annotation of table columns with the type of the entities contained in a column.
CPA refers to the annotation of pairs of table columns with labels that indicate the relationship
between the main column of the table and another column. Figure 1 shows an example of a
table describing hotels with CTA labels shown above the table and CPA labels below.
      </p>
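      <p>For illustration, the hotel example of Figure 1 can be encoded as follows; a minimal Python sketch in which the concrete labels are illustrative assumptions, not values taken from the benchmark files:</p>

```python
# The hotel table of Figure 1, encoded with its annotations: CTA assigns a
# label to each column, while CPA labels the relation between the main
# (first) column and each other column. The labels below are illustrative
# assumptions, not values from the benchmark files.

hotel_table = {
    "columns": 3,
    "rows": [["Grand Hotel", "1 Main St, Mannheim", "+49 621 0000000"]],
}
cta = {0: "Hotel/name", 1: "PostalAddress", 2: "telephone"}  # per-column types
cpa = {(0, 1): "address", (0, 2): "telephone"}               # main-column pairs
```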
      <p>
        This paper presents the WDC Schema.org Table Annotation Benchmark (SOTAB) for comparing
the performance of table annotation systems on the CTA and CPA tasks. The CTA dataset
consists of 59,548 tables covering 17 Schema.org [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] types, in which 162,351 columns have been
annotated using 91 Schema.org types and properties. The CPA dataset consists of 48,379 tables
where 174,998 column pairs are annotated using 176 Schema.org properties. The tables used
in the benchmark originate from the WDC Schema.org Table Corpus which was created by
extracting Schema.org annotations from the December 2020 version of the Common Crawl.
Each table in the corpus contains all entities of a specific Schema.org type that are provided
by a specific host, for example all movies annotated on imdb.com. The columns of the table
are the attributes that are used by the host for describing the entities. Overall, the SOTAB
tables contain data gathered from 74,215 different hosts, which makes the benchmark data quite
heterogeneous.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Related Work</title>
      <p>
        This section provides an overview of the existing benchmarks for evaluating table annotation
systems and compares them to SOTAB in Table 1. The ToughTables, HardTables, BioDiv and
GitTables datasets are used by the Semantic Web Challenge on Tabular Data to Knowledge Graph
Matching (SemTab) which is a benchmark competition for table annotation systems that takes
place every year as part of the International Semantic Web Conference [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The ToughTables
(2T) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] dataset’s tables are divided into easily solvable tables and tough tables that are harder
to predict, and are annotated for CTA. The HardTables [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] dataset was generated by querying DBpedia with
SPARQL queries to create tables that resemble tables found on the Web. The GitTables [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] corpus
consists of 1M tables from GitHub annotated with DBpedia and Schema.org [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] classes and
properties. A subset of this corpus was annotated for CTA. The BioDiv [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] dataset’s tables
belong to the biodiversity domain and contain numerical data and abbreviations in their column
values. Further datasets for benchmarking table annotation systems include: T2Dv2 [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] which
consists of Web tables annotated with DBpedia properties; WikiTables-TURL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] which consists
of Wikipedia tables annotated using terms from Freebase [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]; and RedTab [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] which offers
tables from the music and literature domains that are manually annotated for CPA.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. Creation of the SOTAB Benchmark</title>
      <p>This section describes the selection of the SOTAB tables from the WDC Schema.org Table
Corpus, the assignment of labels to the tables, as well as the selection of challenging
columns into specific subsets of the test set.</p>
      <p>The WDC Schema.org Table Corpus was created using Schema.org data that was extracted
from the December 2020 version of the Web Data Commons Microdata and JSON-LD corpus.
The Schema.org Table Corpus consists of 4.2 million relational tables covering 43 Schema.org
types. Each table in the corpus contains the descriptions of all entities of a specific Schema.org
type that were extracted for a specific host, e.g. all movie records from imdb.com or product
records from ebay.com. All extracted entities of one Schema.org type are collected per host and
subsequently passed through a pipeline of processing and cleansing steps. For more details
about the Schema.org Table Corpus, we refer the reader to the project website (http://webdatacommons.org/structureddata/schemaorgtables/).</p>
      <p>
        Table Selection. We begin building SOTAB with a language identification phase on the
tables from the Schema.org Table Corpus. The language identification phase aims to filter out
non-English rows. For this purpose, we use the fastText language identification model [
        <xref ref-type="bibr" rid="ref13 ref14">13, 14</xref>
        ]
and keep rows from tables where the model is at least 50% confident that the language is English.
Finally, we remove all tables with fewer than 10 remaining rows or fewer than 3 columns.
      </p>
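      <p>The filtering step above can be sketched as follows; a minimal Python sketch in which the fastText model is replaced by a stand-in predictor so the example is self-contained (the real pipeline loads the fastText language identification model instead):</p>

```python
# Sketch of the SOTAB language-filtering step (Section 3). The real pipeline
# uses the fastText language identification model; here a stand-in predictor
# with the same (language, confidence) interface keeps the example runnable.

def predict_language(text):
    # Stand-in for fastText's prediction; NOT the real model.
    ascii_ratio = sum(ch.isascii() for ch in text) / max(len(text), 1)
    return ("en", ascii_ratio) if ascii_ratio > 0.5 else ("other", 1.0 - ascii_ratio)

def filter_table(rows, min_rows=10, min_cols=3, min_confidence=0.5):
    """Keep rows predicted English with at least 50% confidence; drop the
    whole table if fewer than min_rows rows or min_cols columns remain."""
    kept = []
    for row in rows:
        lang, confidence = predict_language(" ".join(str(v) for v in row))
        if lang == "en" and confidence >= min_confidence:
            kept.append(row)
    if len(kept) >= min_rows and len(kept[0]) >= min_cols:
        return kept
    return None  # table discarded
```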
      <p>Label Generation. As the WDC Schema.org Table Corpus uses Schema.org properties as
column headers, these properties can be directly used as labels for the CPA task. We derive
the CTA label for a column from its CPA label using the Schema.org vocabulary definition,
which specifies the types that are allowed as property values. In cases where the vocabulary
definition allows multiple types, we manually select the most appropriate type. Lastly,
we add some CTA annotations that are not included in the Schema.org vocabulary, such as
IdentifierAT, MusicArtistAT and Museum/name. This is done with the purpose of including more
fine-grained labels instead of, for example, simply name, so that an annotation system needs to
better understand the semantics to select the correct annotation. After assigning the labels,
another filtering step is performed based on label frequency: we only keep the columns whose
CPA and CTA labels are used at least 50 times.</p>
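      <p>The label-generation and frequency-filtering logic can be sketched as follows; the small property-to-type table is a hand-picked assumption for illustration, not the full Schema.org vocabulary definition:</p>

```python
# Sketch of SOTAB label generation (Section 3): the Schema.org property in a
# column header is the CPA label; the CTA label is derived from the property's
# expected type. SCHEMA_RANGES is a tiny hand-picked excerpt (an assumption).

from collections import Counter

SCHEMA_RANGES = {
    "datePublished": "Date",       # property -> manually chosen expected type
    "director": "Person",
    "address": "PostalAddress",
}

def derive_labels(columns, min_frequency=50):
    """columns: list of (table_id, header_property) pairs. Returns
    (table_id, cpa_label, cta_label) triples, keeping only labels
    that occur at least min_frequency times."""
    counts = Counter(prop for _, prop in columns)
    return [(tid, prop, SCHEMA_RANGES.get(prop))
            for tid, prop in columns
            if counts[prop] >= min_frequency]
```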
      <p>Test Sets for Specific Challenges. In addition to the full test sets for both tasks, we provide
subsets of the test sets that measure how well systems can handle specific annotation
challenges. We provide test sets for the following challenges: (i) Missing Values: columns
with a value density between 10 and 70 percent, which are thus harder to
predict; (ii) Format Heterogeneity: columns whose values are represented using different value
formats, such as date or weight columns; and (iii) Corner Cases: columns that are difficult to
annotate because their values are very similar to the values of other columns. Examples of corner
cases include startDate versus endDate and currenciesAccepted versus priceCurrency.</p>
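      <p>A minimal sketch of how the missing-values subset could be selected, assuming value density is the share of non-empty cells in a column (the counting of None and empty strings as missing is an assumption):</p>

```python
# Sketch of selecting columns for the missing-values (MV) test subset: a
# column qualifies when its value density lies between 10 and 70 percent.
# Treating None and empty strings as missing is an assumption.

def value_density(column):
    non_empty = sum(1 for v in column if v not in (None, ""))
    return non_empty / len(column)

def is_missing_values_case(column, low=0.10, high=0.70):
    return low <= value_density(column) <= high
```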
    </sec>
    <sec id="sec-5">
      <title>4. Profiling the SOTAB Benchmark</title>
      <p>The selection and labeling phase results in 46,790 tables annotated for the CTA task and 48,379
tables annotated for the CPA task, which cover 17 Schema.org classes such as Person, Event
or JobPosting. The selected table columns include three data types: textual values, numerical
values and DateTime values. All CPA tables include a main (subject) column in the first column
position, which is used in pairs with other columns for the CPA task. 91 CTA labels, such as
Organization, Date and Offer, are used to annotate 162,351 columns in the CTA tables, and 176
CPA labels, such as price, datePublished and productID, are used to annotate 174,998 main
column/column pairs in the CPA tables. Detailed statistics about the number of tables per
Schema.org class as well as the number of columns per label are provided on the SOTAB project
page (http://webdatacommons.org/structureddata/sotab/). We split the CTA
and CPA tables with a ratio of 80:10:10 into fixed training, validation and test sets by using
multi-label stratification to include examples of all labels in every set. We further split the
training set into a smaller subset using the same method and provide this Small training set
with the goal of comparing how methods perform when trained on fewer examples. Furthermore,
we provide subsets of columns in the test set for specific annotation challenges. These subsets
are created by grouping the columns in the test set by the challenges mentioned in Section 3:
test columns that have missing values (Test MV), test columns that are corner cases (Test CC),
test columns that include values in different formats (Test FH) and test columns that are chosen
randomly (Test RC). Statistics of all splits are given in Table 2.</p>
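      <p>The benchmark's split uses multi-label stratification; as a simplified single-label sketch (an assumption standing in for the multi-label method), the following divides the items for each label with an 80:10:10 ratio so every label appears in all three sets:</p>

```python
# Simplified single-label stratified 80:10:10 split. The real benchmark uses
# multi-label stratification; this sketch only illustrates the idea of
# splitting per label so every label is represented in every set.

from collections import defaultdict

def stratified_split(items, ratios=(0.8, 0.1, 0.1)):
    """items: list of (table_id, label). Returns (train, valid, test)."""
    by_label = defaultdict(list)
    for item in items:
        by_label[item[1]].append(item)
    train, valid, test = [], [], []
    for label_items in by_label.values():
        n = len(label_items)
        n_train = int(n * ratios[0])
        n_valid = int(n * ratios[1])
        train.extend(label_items[:n_train])
        valid.extend(label_items[n_train:n_train + n_valid])
        test.extend(label_items[n_train + n_valid:])
    return train, valid, test
```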
    </sec>
    <sec id="sec-6">
      <title>5. Benchmark Evaluation</title>
      <p>
        We evaluate the difficulty of SOTAB using three supervised methods. The first is a simple
Random Forest classifier using TF-IDF-weighted words as features. The second is TURL [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ],
which is pre-trained on a large corpus of Wikipedia tables using a combination of the Masked
Language Model objective of BERT and a Masked Entity Recovery objective introduced by the
authors. The last is Doduo [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] which uses multi-task learning for the simultaneous fine-tuning
of BERT for the CTA and CPA tasks. We fine-tune TURL for 50 epochs and Doduo for 30 epochs
using a learning rate of 5e-5. We report the micro-F1 score on the test sets for CTA and CPA in
Table 3. In both the CTA and CPA tasks, there is a significant difference in the F1 scores among the
methods, with the Transformer-based methods achieving at least 17 percentage points more than the
Random Forest. This can be an indicator that incorporating table context helps the model make better
decisions for both tasks. The corner-case columns (CC) and the columns with missing values (MV)
are the subsets on which the methods make the most prediction errors in both tasks. Finally,
training with the smaller set leads to 3-8 percentage points lower scores on both tasks. The F1
scores show that SOTAB is challenging for all methods.
      </p>
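      <p>The Random Forest baseline can be approximated as follows; a minimal scikit-learn sketch with toy column strings standing in for SOTAB data, where concatenating column values into one string per column is an assumption about the feature construction:</p>

```python
# Sketch of the TF-IDF + Random Forest baseline from Section 5: each column's
# values are joined into one string, vectorized with TF-IDF, and classified;
# micro-F1 is the reported metric. The toy data below is illustrative only.

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

columns = ["2021-05-01 2020-12-24 2019-01-30", "10.99 5.49 3.00",
           "2018-07-04 2022-02-02 2020-10-10", "7.25 99.90 12.00"]
labels = ["Date", "price", "Date", "price"]

clf = make_pipeline(TfidfVectorizer(), RandomForestClassifier(random_state=0))
clf.fit(columns, labels)
pred = clf.predict(columns)
print(f1_score(labels, pred, average="micro"))
```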
    </sec>
    <sec id="sec-7">
      <title>6. Conclusion and Availability</title>
      <p>This paper introduced the WDC SOTAB benchmark. The aim of SOTAB is to complement
the set of publicly available table annotation benchmarks with a CTA and CPA benchmark
covering various entity types of general interest, e.g. products, local businesses and job postings,
and to provide training data from many independent data sources for these types in order to
reflect the full heterogeneity of the values that are used to describe entities. The WDC SOTAB
benchmark is available for public download on the project page. The code that was used for the
creation of the benchmark is provided on GitHub (https://github.com/wbsg-uni-mannheim/wdc-sotab).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Oulabi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>Profiling the Potential of Web Tables for Augmenting Cross-domain Knowledge Bases</article-title>
          ,
          <source>in: Proceedings of the 25th International Conference on World Wide Web</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>261</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Koesten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Konstantinidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-D.</given-names>
            <surname>Ibáñez</surname>
          </string-name>
          , et al.,
          <article-title>Dataset search: A survey</article-title>
          ,
          <source>The VLDB Journal</source>
          <volume>29</volume>
          (
          <year>2020</year>
          )
          <fpage>251</fpage>
          -
          <lpage>272</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>R. V.</given-names>
            <surname>Guha</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Brickley</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Macbeth</surname>
          </string-name>
          , Schema.org:
          <article-title>Evolution of structured data on the web</article-title>
          ,
          <source>Communications of the ACM</source>
          <volume>59</volume>
          (
          <year>2016</year>
          )
          <fpage>44</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          , et al.,
          <article-title>Results of SemTab 2021</article-title>
          ,
          <source>in: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching</source>
          , volume
          <volume>3103</volume>
          , CEUR-WS
          ,
          <year>2022</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>V.</given-names>
            <surname>Cutrona</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Palmonari</surname>
          </string-name>
          , Tough Tables:
          <article-title>Carefully Evaluating Entity Linking for Tabular Data</article-title>
          ,
          <source>in: Proceedings of the 19th International Semantic Web Conference</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>328</fpage>
          -
          <lpage>343</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>E.</given-names>
            <surname>Jiménez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Hassanzadeh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Efthymiou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          ,
          <article-title>SemTab 2019: Resources to Benchmark Tabular Data to Knowledge Graph Matching Systems</article-title>
          , in: The Semantic Web, Springer International Publishing,
          <year>2020</year>
          , pp.
          <fpage>514</fpage>
          -
          <lpage>530</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>M.</given-names>
            <surname>Hulsebos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Ç.</given-names>
            <surname>Demiralp</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <article-title>GitTables: A Large-Scale Corpus of Relational Tables</article-title>
          ,
          <source>arXiv preprint arXiv:2106.07258</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>N.</given-names>
            <surname>Abdelmageed</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>König-Ries</surname>
          </string-name>
          ,
          <article-title>BiodivTab: A Table Annotation Benchmark based on Biodiversity Research Data, in: Proceedings of the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching</article-title>
          , volume
          <volume>3103</volume>
          ,
          <string-name>
            <surname>CEUR-WS</surname>
          </string-name>
          ,
          <year>2021</year>
          , pp.
          <fpage>13</fpage>
          -
          <lpage>18</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <article-title>Matching Web Tables to DBpedia - A Feature Utility Study</article-title>
          ,
          <source>in: Proceedings of the 20th International Conference on Extending Database Technology</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>210</fpage>
          -
          <lpage>221</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Sun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lees</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Yu</surname>
          </string-name>
          ,
          <article-title>TURL: Table understanding through representation learning</article-title>
          ,
          <source>Proceedings of the VLDB Endowment</source>
          <volume>14</volume>
          (
          <year>2020</year>
          )
          <fpage>307</fpage>
          -
          <lpage>319</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>K.</given-names>
            <surname>Bollacker</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Evans</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Paritosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Sturge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Taylor</surname>
          </string-name>
          , Freebase:
          <article-title>a collaboratively created graph database for structuring human knowledge</article-title>
          ,
          <source>in: Proceedings of the ACM SIGMOD International Conference on Management of Data</source>
          ,
          <year>2008</year>
          , pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>S.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. F.</given-names>
            <surname>Aji</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Singh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Christodoulopoulos</surname>
          </string-name>
          ,
          <article-title>A relation extraction dataset for knowledge extraction from web tables</article-title>
          ,
          <source>in: Proceedings of the 29th International Conference on Computational Linguistics</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>2319</fpage>
          -
          <lpage>2327</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Mikolov</surname>
          </string-name>
          ,
          <article-title>Bag of tricks for efficient text classification</article-title>
          ,
          <source>in: Proceedings of the 15th Conference of the European Chapter of the Association for Computational Linguistics: Volume</source>
          <volume>2</volume>
          ,
          <year>2017</year>
          , pp.
          <fpage>427</fpage>
          -
          <lpage>431</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Joulin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Grave</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Bojanowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Douze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Jégou</surname>
          </string-name>
          , et al.,
          <article-title>FastText.zip: Compressing text classification models</article-title>
          ,
          <source>arXiv preprint arXiv:1612.03651</source>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Suhara</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , Ç. Demiralp, et al.,
          <article-title>Annotating columns with pre-trained language models</article-title>
          ,
          <source>in: Proceedings of the 2022 International Conference on Management of Data</source>
          ,
          <year>2022</year>
          , pp.
          <fpage>1493</fpage>
          -
          <lpage>1503</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>