<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LinkingPark: An Integrated Approach for Semantic Table Interpretation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Shuang Chen</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alperen Karaoglu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Carina Negreanu</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tingting Ma</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jin-Ge Yao</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jack Williams</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andy Gordon</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Chin-Yew Lin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Harbin Institute of Technology</institution>
          ,
          <addr-line>Harbin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Microsoft Research Asia</institution>
          ,
          <addr-line>Beijing</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Microsoft Research Cambridge</institution>
          ,
          <addr-line>Cambridge</addr-line>
          ,
          <country country="UK">UK</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper, we present LinkingPark, our system for the Semantic Web Challenge on Tabular Data to Knowledge Graph Matching (SemTab 2020). LinkingPark is an integrated approach for semantic table interpretation. Our system includes a cascaded pipeline for candidate generation, an iterative coarse-to-fine entity disambiguation algorithm, a multi-pass property linking algorithm, and a type inference algorithm that tackles the loose ontology of Wikidata. Results on SemTab 2020 demonstrate the effectiveness of our approach.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>Approach</title>
      <p>[Figure: system overview (part). Labels: Input; Access MediaWiki API; Mention Spelling Corrector; Fine-grained Elastic Search; Property linker: Entity Property Linker, Lexical Property Linker (Perfect matcher, Fuzzy matcher).]</p>
      <p>[...] describe key attributes of an entity. $\mathcal{F}$ is the set of facts, which consists of
RDF triples $\langle s, p, o \rangle$, where $s$ denotes a subject (an entity $e \in \mathcal{E}$), $p \in \mathcal{P}$ is a
property (also known as predicate or relation) and $o$ denotes an object (an entity
$e$ or a data value, e.g. a number, time or string). The target knowledge base of
SemTab 2020 is Wikidata.<sup>5</sup></p>
      <p>The three matching tasks of SemTab 2020 can be described as:</p>
      <list list-type="bullet">
        <list-item><p>CEA (Cell Entity Annotation): link each entity mention string $t_{ij}$ in table $T$ to its referent entity in $\mathcal{E}$.</p></list-item>
        <list-item><p>CTA (Column Type Annotation): associate a table column $c_j$ with an entity type $t \in \mathcal{T}$. A column may be described by multiple types, and the most specific one is usually preferred.</p></list-item>
        <list-item><p>CPA (Columns Property Annotation): associate a pair of columns, $c_s$ and $c_t$, with a property $p \in \mathcal{P}$.</p></list-item>
      </list>
      <p>[Figure: system overview (part). Labels: Entity linker: Candidate generation, Entity disambiguation (Coarse Phase, Candidate Pruning, Fine Phase); outputs CEA, CPA, CTA; Type inference: By SupportCount(t), By AverageLevel(t), MainColumn? (yes: By Population(t); no: By InstanceRank(t)).]</p>
      <p><sup>5</sup> http://wikidata.org/</p>
      <p>[...] the entity disambiguation module to characterise the relatedness among different
rows. Finally, we design a heuristic multi-pass sieve method for type inference
based on the linked entities. Next, we describe each component in detail.</p>
      <p>
The entity linker is implemented with a typical approach that consists of two
sub-modules: candidate generation and entity disambiguation.</p>
      <p>Candidate generation. Given an entity mention $t_{ij}$, we generate its candidate
entities $E_{ij} = (e_{ij1}, \ldots, e_{ijk})$ through a cascaded pipeline with three
core steps:</p>
      <list list-type="bullet">
        <list-item><p>Accessing the Wikidata MediaWiki API: we start by querying the Wikidata MediaWiki API<sup>6</sup>, with the maximum number of candidates returned set to 50.</p></list-item>
        <list-item><p>Correcting spelling errors: the MediaWiki API does not handle spelling errors. Following the design principles of a typical spelling corrector<sup>7</sup>, we implement a tailored mention spelling corrector for better candidate retrieval. Specifically, the corrector enumerates all strings within edit distance one of the original mention string and retains those that are Wikidata entity titles as candidates. This step is not intended for mentions with multiple spelling errors, because the number of strings to check grows exponentially with the edit distance.</p></list-item>
        <list-item><p>Searching with fine-grained Elastic Search: in addition, we build a fine-grained Elastic Search index over all Wikidata entity titles. The index uses a weighted combination of a word-based BM25 score and a trigram-based BM25 score for fuzzy matching. This step improves the recall of candidate generation, but may also return more false positives than the first two steps.</p></list-item>
      </list>
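      <p>The spelling-correction step can be illustrated with a minimal sketch in the style of the spelling corrector referenced above. It is not the system's actual implementation: the alphabet, the lower-casing and the names edits1 and correct_mention are assumptions made for illustration, and the title set stands in for the full set of Wikidata entity titles.</p>
      <preformat>
```python
import string

# Assumption: mentions are lower-cased and use ASCII letters, digits and spaces.
ALPHABET = string.ascii_lowercase + string.digits + " "


def edits1(mention: str) -> set[str]:
    """Return every string within one edit (delete, transpose, replace, insert)."""
    splits = [(mention[:i], mention[i:]) for i in range(len(mention) + 1)]
    deletes = [left + right[1:] for left, right in splits if right]
    transposes = [
        left + right[1] + right[0] + right[2:]
        for left, right in splits
        if len(right) > 1
    ]
    replaces = [left + c + right[1:] for left, right in splits if right for c in ALPHABET]
    inserts = [left + c + right for left, right in splits for c in ALPHABET]
    return set(deletes + transposes + replaces + inserts)


def correct_mention(mention: str, titles: set[str]) -> set[str]:
    """Keep the one-edit variants of `mention` that are known entity titles."""
    lowered = mention.lower()
    if lowered in titles:
        return {lowered}
    return edits1(lowered).intersection(titles)


# Hypothetical title set standing in for all Wikidata entity titles.
titles = {"kingston", "montreal", "toronto"}
print(correct_mention("Kingstn", titles))
```
</preformat>
      <p>A mention with two or more errors yields no candidates here, which is where the Elastic Search fallback becomes useful.</p>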
      <p>Entity disambiguation. Given an entity mention $t_{ij}$ along with its candidate
list $E_{ij} = (e_{ij1}, \ldots, e_{ijk})$, the entity disambiguation stage aims to select the
correct entity $\hat{e}_{ij} \in E_{ij}$ from the candidate list based on contextual
information.</p>
      <p>Formally, given a table $T = \{\{t_{11}, \ldots, t_{1n}\}, \ldots, \{t_{m1}, \ldots, t_{mn}\}\}$, the
objective of entity disambiguation is to find the most compatible entity assignment
for each cell $t_{ij}$:</p>
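      <p>$$\operatorname*{argmax}_{(e_{11}, e_{12}, \ldots, e_{mn}) \in E_{11} \times E_{12} \times \cdots \times E_{mn}} g(e_{11}, e_{12}, \ldots, e_{mn} \mid T) \tag{1}$$
where $g(e_{11}, e_{12}, \ldots, e_{mn} \mid T)$ is the function measuring the compatibility score of the
entity assignments in table $T$.</p>
      <p><sup>6</sup> https://www.wikidata.org/w/api.php?action=help&amp;modules=wbsearchentities</p>
      <p><sup>7</sup> https://norvig.com/spell-correct.html</p>
      <p>Algorithm 1: coarse-to-fine disambiguation algorithm</p>
      <p>Input: Table $T$ with candidate lists $\{E_{ij}\}$ and parameters $\{\alpha, \beta, \gamma\}$</p>
      <p>Output: Entity assignments $\{\hat{e}_{ij}\}$</p>
      <preformat>
1: Initialize $e_{ij}^{0} = \operatorname{argmax}_{e \in E_{ij}} \gamma \cdot \mathrm{edit\_dist\_sim}(e, t_{ij}) + (1 - \gamma) \cdot ps(e \mid t_{ij})$
2: while $t$ &lt; max_iter and any of the entity assignments have changed do
3:   $s_{col} = \frac{1}{m-1} \sum_{k=1, k \neq i}^{m} \mathrm{coarse\_ent\_sim}(e, e_{kj}^{t-1})$
4:   $s_{row} = \frac{1}{n-1} \sum_{k=1, k \neq j}^{n} \max(f_{lexical}(e, t_{ik}), f_{entity}(\{e\}, E_{ik}))$ if $j = 0$, else $s_{row} = f_{entity}(E_{i0}, \{e\})$
5:   $s_{ij}(e) = \alpha \cdot s_{col} + \beta \cdot s_{row} + \gamma \cdot \mathrm{edit\_dist\_sim}(e, t_{ij}) + (1 - \alpha - \beta - \gamma) \cdot ps(e \mid t_{ij})$
6: end while
7: Prune candidates based on $s_{ij}(e)$
8: while $t$ &lt; max_iter and any of the entity assignments have changed do
9:   $s_{col} = \frac{1}{m-1} \sum_{k=1, k \neq i}^{m} \mathrm{fine\_ent\_sim}(e, e_{kj}^{t-1})$
10:  $s_{row} = \frac{1}{n-1} \sum_{k=1, k \neq j}^{n} \max(f_{lexical}(e, t_{ik}), f_{entity}(\{e\}, \hat{E}_{ik}))$ if $j = 0$, else $s_{row} = f_{entity}(\{e_{i0}^{t-1}\}, \{e\})$
11:  $s_{ij}(e) = \alpha \cdot s_{col} + \beta \cdot s_{row} + \gamma \cdot \mathrm{edit\_dist\_sim}(e, t_{ij}) + (1 - \alpha - \beta - \gamma) \cdot ps(e \mid t_{ij})$
12: end while
</preformat>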
      <p>Since exact inference of the above objective is NP-hard, we adopt the
framework of the Iterative Classification Algorithm (ICA) [1] for approximate
inference. ICA is an iterative local search method which greedily re-assigns each
cell to the entity that maximises the probability conditioned on the current
entity assignments of the other cells. The disambiguation model is designed to
characterise (1) type consistency along each column of entities, and (2) property
relatedness within each row of attribute values.
In other words, entities mentioned in the same column should have compatible
types, while entities or values mentioned in the same row (hence describing
the same entity) should be related via relational facts and satisfy lexical
constraints. Specifically, our model includes a coarse-grained phase, which
filters out type-incompatible candidates, and a fine-grained phase, which
selects the best candidate by considering more fine-grained property values. The
pseudo-code of the disambiguation procedure is shown in Algorithm 1, which
can be described as the following four steps:</p>
      <p>1. Initialization (line 1): Let $e_{ij}^{t}$ be the entity assignment of cell $t_{ij}$ at
iteration $t$. Initially, the entity assignments for all cells are set independently by
maximising a local score for each cell. The score is a weighted
combination of the string similarity between the cell text and the title of the
candidate entity ($\mathrm{edit\_dist\_sim}(e, t_{ij})$<sup>8</sup>) and a prior score $ps(e \mid t_{ij})$. The prior
score is calculated as $ps(e \mid t_{ij}) = \frac{1}{rank_e}$, where $rank_e$ is the ranking
index (starting at 1) of the entity $e$ in its candidate list $E_{ij}$.</p>
      <p><sup>8</sup> Implemented using the Levenshtein.ratio function in Python</p>
      <p>2. Coarse-grained phase (lines 2-6): During the coarse phase, the candidate
entity's score $s_{ij}(e)$ is a weighted combination of the column support score $s_{col}$,
the row support score $s_{row}$, the string similarity $\mathrm{edit\_dist\_sim}(e, t_{ij})$ and the prior score
$ps(e \mid t_{ij})$. The column score $s_{col}$ is calculated by averaging the entity
similarity between the current candidate entity and each of the remaining cells'
entity assignments in the same column at the previous iteration ($e_{kj}^{t-1}$).
Specifically, we represent each entity as a sparse feature vector where each property
and each value of the instance of (P31) / subclass of (P279) properties serves
as one feature dimension. Our basic assumption is that the properties of an
entity are also a proxy for its type, besides the explicit type annotations in
the KB. The $\mathrm{coarse\_ent\_sim}(\cdot, \cdot)$ function is implemented as the
cosine similarity of these sparse feature vectors. The features
are not equally important. We adopt a dynamic method to generate feature
weights by considering how widely a feature is shared along the column and how
discriminative it is for disambiguating the current cell. We use a scheme
similar to TF-IDF weighting: the term fraction of a feature $f$ in a column $j$,
denoted by $\mathrm{TF}_j(f)$, is defined as
      </p>
      <p>$$\mathrm{TF}_j(f) = \frac{|\{e_{ij}^{t-1} \mid f \in e_{ij}^{t-1},\ 1 \le i \le m\}|}{m}, \tag{2}$$
which is the fraction of entity assignments in the column at the previous iteration that contain
this feature. To avoid the noise of irrelevant features, we set $\mathrm{TF}_j(f) = 0$ if
it is lower than 0.5. The Inverse Document Frequency (IDF) of a feature $f$
over one cell $t_{ij}$ is defined as</p>
      <p>$$\mathrm{IDF}_{ij}(f) = \log \frac{|E_{ij}| + 1}{|\{e \mid f \in e,\ e \in E_{ij}\}| + 1} + 1, \tag{3}$$
which essentially treats each candidate as a document and measures the IDF over
the candidate list. Here we adopt a smoothed version of IDF to avoid divisions by zero and zero
weights. Finally, the weight of a feature $f$ over a cell $t_{ij}$, denoted by $f_{ij}$, is defined as
$$f_{ij} = \mathrm{TF}_j(f) \cdot \mathrm{IDF}_{ij}(f). \tag{4}$$
Similar TF-IDF formulations have been used successfully by previous SemTab
participants (e.g., the Tabularisi system [7] at SemTab 2019 uses one to calculate its
ranking score). We adapt this formulation to the ICA framework to
calculate pairwise entity similarities, using a smoothed version of IDF
and pruning features with low support to mitigate the noise.</p>
      <p>The row score $s_{row}$ is calculated by extracting property features at both the
lexical and the entity level. It characterises the property relatedness
between the current candidate entity and the remaining cells in the same row.
Specifically, if a cell lies in the main column of the table, we
calculate the support score from each remaining cell in the same row.
Otherwise, we only consider the support score from the cell in the main column.
Given the property distribution from the property linker, the support score
($f_{entity}(\cdot, \cdot)$ or $f_{lexical}(\cdot, \cdot)$) is calculated by first retrieving the possible
properties between the current candidate entity and the remaining cells, and then
taking the largest confidence in the corresponding property distribution.</p>
      <p>3. Pruning (line 7): We reduce the search space at the current stage before
more fine-grained processing. For each cell, we look at the candidates sorted
by their final scores. If the difference between the final scores of the top two
candidates is above a threshold min_diff, we keep only the top-1 candidate.
Otherwise, we keep the top-$K$ candidates plus any candidate whose final
score is above a certain threshold (min_abs).</p>
      <p>4. Fine-grained phase (lines 8-12): For some highly ambiguous cases, we need
to compare the specific values of a certain property instead of looking only at
the presence of the property fields. For example, for a column of Canadian
cities such as ["Kingston", "Montreal"], the system can tell that these
are cities after the coarse-grained step, but there exist multiple cities named
"Kingston". We still have to choose between Kingston in Jamaica
and Kingston in Canada. In such cases, we have to further consider the
specific values of certain key properties, such as Country = Canada. In this
fine-grained phase, we extend the sparse features for calculating entity
similarity from all properties to all property values.
      </p>
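      <p>The feature weighting of the coarse-grained phase can be sketched as follows. This is an illustrative reading of the TF, smoothed IDF and cosine-similarity definitions above rather than the system's code: the function names, the feature strings and the toy column are hypothetical, and we assume that both entity vectors carry the TF-IDF weights of the cell being disambiguated.</p>
      <preformat>
```python
import math
from collections import Counter


def term_fraction(column_features, min_support=0.5):
    """Share of the column's current assignments that carry each feature.

    Features whose share falls below `min_support` get weight 0.
    """
    m = len(column_features)
    counts = Counter(f for feats in column_features for f in set(feats))
    return {f: (c / m if c / m >= min_support else 0.0) for f, c in counts.items()}


def smoothed_idf(feature, candidate_features):
    """Smoothed IDF, treating every candidate of the cell as one document."""
    df = sum(1 for feats in candidate_features if feature in feats)
    return math.log((len(candidate_features) + 1) / (df + 1)) + 1.0


def weight_vector(features, tf, candidate_features):
    """Sparse vector with weight TF * IDF per feature; zero weights are dropped."""
    weights = {f: tf.get(f, 0.0) * smoothed_idf(f, candidate_features) for f in features}
    return {f: w for f, w in weights.items() if w > 0.0}


def cosine(u, v):
    """Cosine similarity of two sparse vectors stored as dicts."""
    dot = sum(w * v.get(f, 0.0) for f, w in u.items())
    norm = math.sqrt(sum(w * w for w in u.values())) * math.sqrt(sum(w * w for w in v.values()))
    return dot / norm if norm else 0.0


# Toy column: previous-iteration assignments of the cells (hypothetical features).
column = [{"P31=city", "P17"}, {"P31=city", "P17"}, {"P31=city", "P569"}]
# Candidates of the cell being disambiguated (hypothetical identifiers).
candidates = {"Q_city": {"P31=city", "P17"}, "Q_person": {"P31=human", "P569"}}

tf = term_fraction(column)
docs = list(candidates.values())
reference = weight_vector(column[0], tf, docs)
scores = {q: cosine(weight_vector(f, tf, docs), reference) for q, f in candidates.items()}
print(scores)
```
</preformat>
      <p>In this toy column the feature P569 is carried by only one of three assignments, so its weight is cut to zero and the person candidate receives no column support.</p>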
      <sec id="sec-2-1">
        <title>Property linker</title>
        <p>For the property linking algorithm, we use the approach presented in the
technical report [3]. For every relational column, we start from the strings in the
cells and try to generate candidates as described in the previous section for the
coarse-grained phase. When the search does not return satisfactory results (for
example, none of the strings in the column can be matched to an entity), the column
usually holds a numerical property (numbers or dates), and we
treat it as a special column.</p>
        <p>For columns where we can identify KB entities, we try to find direct matches,
or matches within a given edit distance, with the property values of the entities
in the main column. For numerical properties, we try to find direct matches
up to unit conversion. Once we have a set of matches, each row votes to find
a first most-likely property. If the vote does not reach a certain threshold, or the
difference between the top choices is too small, we use a second refinement phase
that is more computationally expensive. For numerical properties, we have
precomputed a set of characteristic statistics per type (for example, human heights
have a certain range, mean and standard deviation). For each type that
can suitably describe the main column, we check which of the precomputed
statistics best match the numerical column that we could not identify.
For the SemTab dataset, we found that just looking at ranges suffices.</p>
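        <p>The row-voting step can be sketched as below. The thresholds min_share and min_margin and the function name vote_property are hypothetical, since the exact threshold values are not stated here; returning None stands for handing the column over to the more expensive refinement phase.</p>
        <preformat>
```python
from collections import Counter


def vote_property(row_matches, min_share=0.5, min_margin=0.1):
    """Pick the property most rows vote for, or None if the vote is inconclusive.

    row_matches holds one set of matching property ids per table row.
    min_share and min_margin are illustrative thresholds, not the system's values.
    """
    votes = Counter(p for matches in row_matches for p in matches)
    if not votes:
        return None
    ranked = votes.most_common()
    n_rows = len(row_matches)
    top_prop, top_votes = ranked[0]
    runner_up = ranked[1][1] if len(ranked) > 1 else 0
    enough_support = top_votes / n_rows >= min_share
    clear_winner = (top_votes - runner_up) / n_rows >= min_margin
    return top_prop if enough_support and clear_winner else None


# Three of four rows match P17 (country); one also matches P131.
rows = [{"P17"}, {"P17", "P131"}, {"P17"}, set()]
print(vote_property(rows))
```
</preformat>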
        <p>A common issue we encounter with Wikidata is that entities do not have
complete information, i.e. some properties may be missing. For columns where
we can identify KB entities, we extend the ranking score by considering the
properties of similar entities. If several rows voted for a given property and
a given row does not have that property, we want to know whether the property is
missing or not applicable. We extend the binary score (whether a given property is present for
a given entity) to a score in $(0, 1)$ that takes into account how many similar
entities do or do not contain the relevant property. We define the most similar
entities of a given entity as the set of nearest neighbours (in cosine distance)
that share the same type with the given entity in the BigGraph space [4].</p>
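        <p>One possible reading of this soft score is sketched below. The exact formula is not given in the text, so the add-one smoothing, the score of 1.0 for a property that is present, the toy two-dimensional embeddings and all names are assumptions made for illustration only.</p>
        <preformat>
```python
import math


def cosine_similarity(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def similar_entities(entity, embeddings, types, k=3):
    """The k nearest neighbours of `entity` among entities of the same type."""
    pool = [e for e in embeddings if e != entity and types[e] == types[entity]]
    pool.sort(key=lambda e: cosine_similarity(embeddings[entity], embeddings[e]), reverse=True)
    return pool[:k]


def soft_property_score(entity, prop, properties, neighbours):
    """1.0 if the property is present (assumption); else a smoothed share in (0, 1)."""
    if prop in properties[entity]:
        return 1.0
    having = sum(1 for n in neighbours if prop in properties[n])
    # Add-one smoothing keeps the score strictly between 0 and 1.
    return (having + 1) / (len(neighbours) + 2)


# Hypothetical entities with toy embeddings, types and property sets.
embeddings = {"Q1": [1.0, 0.0], "Q2": [0.9, 0.1], "Q3": [0.8, 0.3], "Q4": [0.0, 1.0], "Q5": [1.0, 0.05]}
types = {"Q1": "city", "Q2": "city", "Q3": "city", "Q4": "city", "Q5": "river"}
properties = {"Q1": {"P17"}, "Q2": {"P17", "P1082"}, "Q3": {"P17", "P1082"}, "Q4": {"P17"}, "Q5": set()}

neighbours = similar_entities("Q1", embeddings, types, k=2)
print(neighbours, soft_property_score("Q1", "P1082", properties, neighbours))
```
</preformat>
        <p>Here Q1 lacks P1082 but both of its nearest same-type neighbours have it, so the property is more likely missing than not applicable and the row still contributes a high score.</p>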
      </sec>
      <sec id="sec-2-2">
        <title>Type inference</title>
        <p>Our type inference algorithm is a heuristic multi-pass sieve method that depends entirely
on the entity linking results. To predict the type of column $j$, we
first acquire the entity linking results $E_j = \{\hat{e}_{ij} \mid 1 \le i \le m\}$ from the entity
linker. Then we retrieve the entity types $T(e)$ for each entity $e \in E_j$, where we
define $T(e)$ as the set of all types satisfying the SPARQL expression ?entity
wdt:P31/wdt:P279?/wdt:P279? ?types., treating the values of instance of
(P31) and subclass of (P279) as the types of each entity. The goal is then to
find the most common types shared by most of the entities. To do so, we define
the first criterion, SupportCount(t):</p>
        <p>$$\mathrm{SupportCount}(t) = |\{e \mid e \in E_j,\ t \in T(e)\}|. \tag{5}$$
We select the type with the maximum SupportCount(t), but multiple types may
have the same count. In that case, we want to prioritise the most specific one.
We design a second criterion, AverageLevel(t), based on the type ontology
to characterise the specificity of a type $t$:</p>
        <p>$$\mathrm{AverageLevel}(t) = \mathrm{AVG}(\{h \mid e \text{ is an instance of } t \text{ via a path of length } h,\ e \in E_j\}). \tag{6}$$
Since a lower distance to the entity nodes indicates a more specific
type, we select the type with the minimum AverageLevel(t) to break the above
ties. However, this criterion does not guarantee uniqueness either. In practice, we found
that the following design works well on the SemTab data for further tie-breaking. For the
main column, we select the type with the minimum Population(t) on Wikidata,
where</p>
        <p>$$\mathrm{Population}(t) = |\{e \mid t \in T(e),\ e \in \mathcal{E}\}|. \tag{7}$$
For relation columns, we select the type with the minimum InstanceRank(t), where</p>
        <p>$$\mathrm{InstanceRank}(t) = \mathrm{AVG}(\{r \mid e \text{ is an instance of } t \text{ at rank } r,\ e \in E_j\}), \tag{8}$$
and the rank is the position of the type $t$ within the statement group of the
instance of property.</p>
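        <p>The multi-pass sieve can be sketched as follows. The input layout (one dictionary per linked entity, mapping each type to its path length and instance-of rank), the final alphabetical tie-break and all names are assumptions made for this illustration, not the system's data model.</p>
        <preformat>
```python
from statistics import mean


def sieve(candidates, score):
    """Keep only the candidates with the minimal score."""
    best = min(score(t) for t in candidates)
    return [t for t in candidates if score(t) == best]


def infer_column_type(entity_types, is_main_column, population):
    """Apply the passes in order: SupportCount, AverageLevel, then Population
    (main column) or InstanceRank (other columns).

    entity_types: one dict per linked entity, type -> (path_length, rank).
    population: type -> number of Wikidata entities having that type.
    """
    observations = {}
    for types in entity_types:
        for t, (level, rank) in types.items():
            observations.setdefault(t, []).append((level, rank))
    if not observations:
        return None
    # Pass 1: maximum SupportCount (negated so that sieve can minimise).
    candidates = sieve(list(observations), lambda t: -len(observations[t]))
    # Pass 2: minimum AverageLevel, i.e. the most specific type.
    candidates = sieve(candidates, lambda t: mean(lv for lv, _ in observations[t]))
    # Pass 3: column-dependent tie-break.
    if is_main_column:
        candidates = sieve(candidates, lambda t: population.get(t, float("inf")))
    else:
        candidates = sieve(candidates, lambda t: mean(r for _, r in observations[t]))
    # Assumption: any remaining tie is broken alphabetically for determinism.
    return min(candidates)


# Hypothetical column of three linked entities.
column = [
    {"city": (1, 1), "human settlement": (2, 1), "capital": (1, 2)},
    {"city": (1, 2), "human settlement": (2, 2), "capital": (1, 1)},
    {"city": (1, 1), "human settlement": (2, 1), "capital": (1, 2)},
]
population = {"city": 50000, "capital": 300, "human settlement": 900000}
print(infer_column_type(column, True, population), infer_column_type(column, False, population))
```
</preformat>
        <p>All three types have the same support in this toy column, so the second pass removes the less specific type and the third pass decides between the remaining two depending on the column role.</p>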
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Setup and Results</title>
      <p>Accessing the online SPARQL endpoint is very slow given the large amount of
data, so we use an offline Wikidata dump (20200525). Our experimental pipeline
starts by calling the MediaWiki API, which usually takes 2-3 days for each Round.
After generating the entity candidates, we cache the results and extract the
relevant subset of the Wikidata dump. Our multi-threaded Python pipeline takes
at most 20-30 minutes for each Round on an Intel(R) Xeon(R) CPU E7-4860 v2
(4 processors) machine. As we do not train the hyper-parameters, we empirically
set $\alpha$ to 0.20, $\beta$ to 0.50, $\gamma$ to 0.1, min_diff to 0.30, min_abs to
0.50 and $K$ to 2.</p>
      <sec id="sec-3-1">
        <title>Results</title>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Discussion</title>
      <sec id="sec-4-1">
        <title>Synthetic data vs. Real data</title>
        <p>The evaluation datasets for the SemTab challenge use synthetic data that is
automatically generated from the knowledge base. Although various refinement
strategies have been adopted in the generation process to simulate real data,
we argue that there is still a significant gap:</p>
        <list list-type="bullet">
          <list-item><p>In real data, a table expresses the intent of its creator, while synthetic data is generated through a random combination of type-compatible entities.</p></list-item>
          <list-item><p>The spelling errors introduced are not necessarily representative of the errors that a table creator might produce.</p></list-item>
          <list-item><p>Real data might contain many more entities, types, and relations outside the specified knowledge base, making it more challenging than data synthesised from the knowledge base.</p></list-item>
        </list>
        <p><sup>9</sup> https://www.cs.ox.ac.uk/isg/challenges/sem-tab/2020/results.html</p>
        <p>The currently available datasets curated from real-world data are either
small in scale [5, 6] or very noisy, as the data is automatically extracted from
Wikipedia [2]. To make progress in this field, better datasets need to
be curated and carefully annotated to complement the synthetic SemTab data
produced in the current way.</p>
      </sec>
      <sec id="sec-4-2">
        <title>Type ontology in Wikidata</title>
        <p>The type ontology in Wikidata is noisy, as we can see from the example in Fig. 2.
Under an ontology with such complex sub-structures, it is hard to determine
the specificity of a certain type. In order to define the CTA task more clearly
and more fairly on the Wikidata ontology, further cleaning is required (either
manually or automatically) to reach a more reliable structure such as the one
curated for DBpedia.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Challenge design</title>
        <p>Finally, we suggest splitting the dataset into a development set and
a test set. The test set should be used for final evaluation, while the development
set should be released for model design and tuning. This way, participants can
improve their systems without having to make multiple submissions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusion</title>
      <p>In this paper, we present LinkingPark, our system for SemTab 2020. Our pipeline
with multiple components is an integrated approach for semantic table
interpretation. Results on SemTab 2020 demonstrate the effectiveness of our approach
for all three tasks. We hope that some parts of our solutions as well as the
observations and insights we gathered during the challenge will be beneficial for
future research efforts towards better understanding of tabular data.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bhagavatula</surname>
            ,
            <given-names>C.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Noraset</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Tabel: entity linking in web tables</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>425</fpage>
          –
          <lpage>441</lpage>
          . Springer (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Efthymiou</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassanzadeh</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rodriguez-Muro</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Christophides</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>Matching web tables with knowledge base entities: from entity lookups to entity embeddings</article-title>
          .
          <source>In: International Semantic Web Conference</source>
          . pp.
          <fpage>260</fpage>
          –
          <lpage>277</lpage>
          . Springer (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Karaoglu</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Negreanu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fabian</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gordon</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.Y.</given-names>
          </string-name>
          :
          <article-title>Wiki2row - the in's and out's or row suggestion with a large scale knowledge base</article-title>
          .
          <source>Tech. Rep. MSR-TR-2020-37</source>
          , Microsoft (
          <year>October 2020</year>
          ), https://www.microsoft.com/en-us/research/publication/wiki2row-theins-and-outs-or-row-suggestion-with-a-large-scale-knowledge-base/
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Lerer</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lacroix</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wehrstedt</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bose</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peysakhovich</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>PyTorch-BigGraph: A Large-scale Graph Embedding System</article-title>
          .
          <source>In: Proceedings of the 2nd SysML Conference</source>
          . Palo Alto, CA, USA (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Limaye</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sarawagi</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Annotating and searching web tables using entities, types and relationships</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>3</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>1338</fpage>
          –
          <lpage>1347</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Matching html tables to dbpedia</article-title>
          .
          <source>In: Proceedings of the 5th International Conference on Web Intelligence, Mining and Semantics</source>
          . pp.
          <fpage>1</fpage>
          –
          <lpage>6</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Thawani</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zafar</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Divvala</surname>
            ,
            <given-names>N.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Singh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qasemi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pujara</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Entity linking to knowledge graphs to infer column types and properties</article-title>
          . In: SemTab@ISWC (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>