<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Estimating Missing Temporal Meta-Information using Knowledge-Based-Trust</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yaser Oulabi</string-name>
          <email>yaser@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christian Bizer</string-name>
          <email>chris@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A large number of HTML tables on the Web contain relational data which can be used to augment knowledge bases such as DBpedia, YAGO, or Wikidata. A large part of this data is time-dependent, i.e., the correctness of a fact depends on a specific temporal scope. In order to use this data for knowledge base augmentation, we need temporal meta-information. Existing methods rely on timestamps within the table itself or its context as temporal meta-information. Yet, the relationship between these timestamps and the data within a table is often unclear. Additionally, timestamps are rather sparse, and there are many web tables for which no timestamps exist. Knowledge-Based-Trust (KBT) uses the overlap with ground-truth to estimate the trustworthiness of a dataset. This paper introduces Timed-KBT, which overcomes the dependence on sparse and possibly misinterpreted timestamps by propagating temporal meta-information from a knowledge base to web table data using KBT. It also derives a trust score that estimates the correctness of the data and the assigned temporal meta-information. We evaluate Timed-KBT on the use case of fusing data from a large corpus of web tables for filling missing facts in a knowledge base. Our evaluation shows that Timed-KBT yields an increase in F<sub>0.25</sub>-Measure of 19.01 % when compared to KBT and 9.44 % when compared to a method that relies solely on timestamps extracted from the table and its context.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Besides free text, information on the Web might be represented in the form of
HTML tables that contain relational data, referred to as web tables [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This
relational data is potentially very useful to extend or validate multi-domain
knowledge bases, such as DBpedia, YAGO, or Wikidata, which are employed
for an increasing number of applications, including natural language processing,
web search, and question answering [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ].
      </p>
      <p>
        Many web tables contain time-dependent data, in which a fact is only valid
given a certain temporal scope. There is large potential and growing interest
in utilizing time-dependent web data [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], e.g. for knowledge base augmentation.
Slot filling is an augmentation task, where missing facts in the knowledge base
are filled [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. To perform slot filling using web table data, we need data fusion
strategies, which determine the value that should be added to the knowledge
base given the set of alternative values found in the web tables [
        <xref ref-type="bibr" rid="ref13 ref18 ref19 ref5 ref9">5, 13, 18, 19, 9</xref>
        ].
For time-dependent data we need strategies that are time-aware [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], i.e., they
can understand the temporal scopes of data during the fusion process.
      </p>
      <p>
        Time-aware fusion strategies require temporal meta-information. We define
temporal meta-information as the overall presence of temporal scopes, which are
time annotations of certain facts or values. Existing works estimate temporal
meta-information by utilizing timestamps [
        <xref ref-type="bibr" rid="ref21 ref9">21, 9</xref>
        ]. Timestamps include all
temporal expressions that exist in a table and its context. They can be extracted from
multiple locations, e.g. from page titles, text around tables, headers of columns
and cells of the table. Fusion strategies that rely solely on timestamps suffer
from two problems. First, the relationship between timestamps and the data in
the table is often unclear. More than one timestamp can usually be extracted
per table, and many of the extracted timestamps likely have no relevance to the
data in the table at all. Second, web tables suffer from timestamp sparsity [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
so that for many tables we are unable to extract any timestamps.
      </p>
      <p>Our task is therefore to estimate missing temporal meta-information using
other sources than the timestamps within a webpage.</p>
      <p>
        This paper introduces Timed-KBT, an approach that estimates missing
temporal meta-information using Knowledge-Based-Trust (KBT) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. KBT estimates
the correctness of data using its overlap with ground-truth, in our case the
knowledge base. It is based on the idea that non-overlapping data shares similar
quality with neighboring overlapping data. This shared quality possibly incorporates
multiple dimensions, e.g. data, extraction and matching quality. Timed-KBT is
based on the assumption that the temporal dimension, i.e. the temporal scope,
is one of the qualities shared by neighboring data. The idea is to use the
knowledge base to detect this temporal scope for overlapping values, and propagate
the scope to neighboring non-overlapping values. We further introduce and
evaluate an extension to Timed-KBT, whereby the scopes that can be propagated
are restricted to timestamps present in the table or its context.
      </p>
      <p>We evaluate Timed-KBT on the use case of data fusion using a large corpus
of web tables. As a knowledge base to be augmented we use a subset of Wikidata
that contains facts about countries, cities and athletes. We further extended the
subset with various time-dependent datasets. We find that Timed-KBT is able
to estimate missing temporal meta-information with enough quality to improve
data fusion results. By using timestamps as a restriction for Timed-KBT we are
furthermore able to derive a precision-oriented time-aware fusion method<sup>1</sup>.</p>
      <p>The next section provides a motivating example and describes the overall use
case at hand. Section 3 frames this research within related work. In Section 4 we
describe our fusion methodology and Timed-KBT itself, while our experimental
setup is described in Section 5. The results are discussed in Section 6. Section 7
is our conclusion.</p>
      <p><sup>1</sup> The methods presented in this paper are implemented as part of the publicly
available T2K Framework: http://dws.informatik.uni-mannheim.de/en/research/T2K.</p>
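      <p>[Figure 1: Example of data in a temporal knowledge base. The entity Germany has the non-time-dependent attribute "continent" (reference type) with the fact Europe, and the time-dependent attribute "population" (numeric type) with a series of facts whose temporal scopes are the years 2007 to 2010 (values shown include 82,266,372, 81,902,307 and 81,776,930).]</p>
      <p>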
We aim to augment a temporal knowledge base using web table data [
        <xref ref-type="bibr" rid="ref13 ref7">13, 7</xref>
        ].
Figure 1 shows an example of data in a temporal knowledge base. Unlike snapshot-based
knowledge bases, e.g. DBpedia, which try to reflect only the most recent
facts, temporal knowledge bases store time-dependent data as series of timed
facts. We define a timed fact as a fact that is annotated with a temporal scope.
The knowledge base tries to reflect all current and historic facts given a certain
triple. We define a triple as a combination of entity, attribute and fact or, given
a time-dependent attribute, series of facts. An entity is the subject, while an
attribute is a pre-defined property of a certain data type. In this work we deal
with two data types: reference types, where other entities are referenced, and numeric types.
      </p>
      <p>A slot refers to a missing fact in the knowledge base. In this work we aim
to use web table data for targeted slot filling, where we try to fill specific slots
within a series of facts of an existing time-dependent triple. This means that the
temporal scopes of the slots are previously known and provided as targets for
fusion strategies. Fusion strategies are tasked to fill these target slots by using
matched values. We define matched values as values extracted from web tables
and matched to the knowledge base as described in the following example.</p>
      <p>Figure 2 shows an example web table with three time-dependent columns: one
leader and two population columns. The leader column corresponds to the year
2017, the first population column to 2015, a date not found on the webpage, i.e.
lacking a timestamp, and the second to 1990. The figure shows that timestamps
on the webpage are not explicitly associated with data, not all temporal scopes
are described by timestamps and timestamps unrelated to the data exist. As
such, 1990 is not assumed to be the temporal scope of the 5th column.</p>
      <p>
        Rows in the table are matched to entities in the knowledge base, while
columns are matched to attributes [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Cells are therefore matched to triples,
which for time-dependent attributes corresponds to a series of timed facts. A
second step is required to match cells to specific temporal scopes. More specific
matching is not possible, due to the lack of explicit temporal scope annotations.
      </p>
      <p>As a result, for a given target slot of the attribute population with the target
year 1990, population numbers from the 4th and 5th columns are both taken as
candidate values for fusion. The task of fusion methods is then to assign matched
values to temporal scopes before using them to fill target slots.</p>
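      <table-wrap>
        <label>Figure 2</label>
        <caption>
          <p>Example web page with a web table. Page title: "Country Data 2017". Text above the table: "The following table provides information about those five countries, including capital, the national day, the current leader and current population. In comparison we provide population numbers from the year 1990". Page footer: "© 2014 – FactsFactsFacts.com".</p>
        </caption>
        <table>
          <thead>
            <tr><th>Country</th><th>Capital [15]</th><th>Current leader [98]</th><th>Population</th><th>Population 1990</th><th>National day</th></tr>
          </thead>
          <tbody>
            <tr><td>Germany</td><td>Berlin</td><td>Angela Merkel</td><td>81,41 M</td><td>79,43 M</td><td>3. October 1990</td></tr>
            <tr><td>France</td><td>Paris</td><td>Emmanuel Macron</td><td>66,81 M</td><td>58,51 M</td><td>14th July 1790</td></tr>
            <tr><td>United Kingdom</td><td>London</td><td>Theresa May</td><td>65,14 M</td><td>57,25 M</td><td></td></tr>
            <tr><td>Japan</td><td>Tokyo</td><td>Shinzō Abe</td><td>127 M</td><td>123,5 M</td><td>11th February 660 BCE</td></tr>
            <tr><td>United States</td><td>Washington, D.C.</td><td>Donald Trump</td><td>321,4 M</td><td>249,6 M</td><td>4th July 1776</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>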
Our work is related to three research areas: (1) utilization of ground-truth to
estimate the quality of web data, (2) methods for consolidating time-dependent
web data, and (3) time-aware fusion for web table data.</p>
      <p>
        Utilizing a ground-truth for fusion is an approach employed by many fusion
methods [
        <xref ref-type="bibr" rid="ref11 ref20">20, 11</xref>
        ], including for fusing web table data [
        <xref ref-type="bibr" rid="ref13 ref9">13, 9</xref>
        ]. We base Timed-KBT
on KBT, which was introduced by Dong et al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. In their research the authors
differentiate between factual and extraction errors, which is possible because they
use multiple extractors. In our work this is not possible, as we have only
one extraction pipeline, but we do consider the temporal dimension.
      </p>
      <p>
        For time-aware consolidation methods for web data, there is a comprehensive
exploration by Dong et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], where the authors describe the task’s requirements
and challenges. They also differentiate between the identification of timestamps
and their explicit mapping as temporal scopes, but they perceive both steps to
be part of the extraction process, while we perceive the mapping, i.e. Timed-KBT,
as part of the fusion process. For the identification of timestamps the
authors suggest HeidelTime [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], which is the method we also use in this work.
The authors introduce no approaches for assigning timestamps to actual values
or for generating missing temporal meta-information, a gap we address with
Timed-KBT.
      </p>
      <p>
        With regard to time-aware fusion, Dong et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] assume that temporal meta-information
is provided, whereas we combine the generation of temporal meta-information
and fusion in the Timed-KBT method. As fusion methods the
authors mention rule-based and learning-based data fusion. In both cases, methods
are specifically geared towards the reconstruction of entities with time-dependent
attributes, e.g. by user-specified constraints and preference rules [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], or by
inference models that specifically consider features of time-dependent data [
        <xref ref-type="bibr" rid="ref10 ref3">3, 10</xref>
        ].
In comparison, with Timed-KBT, we are able to identify temporal scopes for
individual values without comprehensively creating rules or modeling entities.
      </p>
      <p>
        Finally, we discuss two works that provide time-aware fusion for web
table data. The InfoGather+ system [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] augments an input table with attributes
from a large corpus of relational HTML tables using timestamps extracted from
column headers. It handles matching and fusion using a probabilistic graphical
model and introduces the idea of propagating timestamp information between
web tables. In our own previous research [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] we introduced a method that uses
timestamps and a ground-truth to learn weighted models of the relationships
between timestamp locations and attributes. We implement and test this method
in this paper as TT-Weighting. Both time-aware fusion approaches are limited
to the presence of extractable timestamps, while Timed-KBT can generate
temporal scopes for any web table data, even data without timestamps, if it overlaps
with data in a temporal knowledge base. Additionally, these approaches are not
able to estimate the correctness of timestamps assigned to the data, so that all
timestamps in web tables are assumed to be equally relevant.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Methodology</title>
      <p>
        We evaluate the quality of estimated temporal meta-information for the use
case of data fusion. For this, we implement and compare five fusion strategies:
two baseline strategies, Voting and KBT, a time-aware strategy from previous
research [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], TT-Weighting, and two Timed-KBT-based strategies, TKBT and
TKBT-Restricted. All strategies use one common fusion framework.
      </p>
      <sec id="sec-2-1">
        <title>Fusion Framework</title>
        <p>The underlying fusion methodology consists of four steps:</p>
        <p>Scoring: All fusion strategies provide scores for each matched candidate
value given the temporal scope of its target slot. The fusion strategies are
essentially scoring functions that influence the fusion process solely by scoring the
individual matched values. Scores lie in the range from 0.0 to 1.0.</p>
        <p>Filtering: Based on their scores, values are filtered using a learned threshold.
The filtering influences a possible precision/recall trade-off. Depending on the
quality of the fusion strategy’s score, filtering could lead to a favorable increase
in precision or an unfavorable decrease in recall. We deal with this trade-off by
optimizing thresholds for a weighted F<sub>β</sub>-Measure. We learn thresholds in steps of 0.05
from 0.0 to 1.0, separately per fusion strategy and class-attribute combination.</p>
        <p>Grouping: For a given target slot, we cluster all unfiltered equal values
into groups. Numeric-type values are considered equal if they have a normalized
similarity of at least 0.98, while reference-type values must refer to the same
entity. If multiple values from the same source, i.e. the same pay-level domain,
are present in one group, only the value with the highest score is kept. We sum
the scores of all values in a group to calculate a group score.</p>
        <p>Selection: For every target slot we select the group with the highest summed
score and extract from the group a value that is chosen as the resulting fused
fact of that target slot. For numeric values, where not all values in a group are
exactly equal, we use the median as the extracted value.</p>
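        <p>The four steps above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the T2K implementation: the tuple layout, the function names <monospace>fuse</monospace> and <monospace>similar</monospace>, the relative-difference similarity formula, the comparison against the first member of a group, and the rule that a value passes the filter when its score is at least the threshold are all hypothetical choices made for this example.</p>
        <preformat><![CDATA[
```python
from statistics import median


def similar(a, b, min_similarity=0.98):
    """Numeric equality test; this relative-difference formula is an assumption."""
    if a == b:
        return True
    return 1.0 - abs(a - b) / max(abs(a), abs(b)) >= min_similarity


def fuse(candidates, threshold):
    """Fuse the (value, score, source) candidates of one target slot."""
    # Filtering: keep values whose score reaches the learned threshold.
    kept = [c for c in candidates if c[1] >= threshold]
    # Grouping: visit values by descending score, so the first value seen
    # from a source within a group is that source's highest-scored one.
    groups = []
    for value, score, source in sorted(kept, key=lambda c: c[1], reverse=True):
        for group in groups:
            # Assumption: compare against the first member of the group.
            if similar(group[0][0], value):
                if all(member[2] != source for member in group):
                    group.append((value, score, source))
                break
        else:
            groups.append([(value, score, source)])
    if not groups:
        return None
    # Selection: the highest summed score wins; the median represents the group.
    best = max(groups, key=lambda g: sum(member[1] for member in g))
    return median(member[0] for member in best)


# Hypothetical candidates for one population slot, in millions.
candidates = [
    (81.41, 0.9, "a.example"),
    (81.40, 0.8, "b.example"),
    (79.43, 0.95, "c.example"),
    (81.41, 0.5, "a.example"),  # second value from the same source
    (60.00, 0.1, "d.example"),  # removed by the threshold
]
fused = fuse(candidates, threshold=0.3)
```
]]></preformat>
        <p>In this made-up example the value 79.43 has the highest individual score, but the two agreeing sources form a group with a larger summed score (0.9 + 0.8 against 0.95), so the fused fact is the median of 81.41 and 81.40.</p>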
      </sec>
      <sec id="sec-2-2">
        <title>Baseline Strategies</title>
        <p>
          Voting is a common baseline strategy [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], where all matched values are scored
1.0. Given a target slot, the group with the most sources is therefore chosen.
For example, values from the table in Figure 2, which contains only correct data, will share
the same score as values of a table with mostly incorrect data. As all scores
are equal, no thresholding is possible. Additionally, Voting is not time-aware, so
that the fused facts for all target slots within the same triple will be the same.
        </p>
        <p>
          KBT is a fusion strategy based on Knowledge-Based-Trust [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. It uses the
correctness of data that overlaps with the knowledge base to estimate a trust
score for the remaining data. It is based on the assumption that neighboring
values share similar correctness. As data within a single web table column has
equal extraction, normalization, matching and potentially factual quality, we
compute KBT scores per web table column, as shown in Equation 1. As
KBT is not time-aware, the fused facts of all target slots within one triple will be
the same. With KBT, the values in the table in Figure 2 will, for example, receive a higher
score than the values of a table with mostly incorrect overlapping data, while both population
columns will still be used as fusion candidates for any population target slot.
        </p>
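        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math><![CDATA[\mathrm{KBT}(\mathit{column}) = \frac{\#\,\text{values in column with correct overlap}}{\#\,\text{values in column with overlap}}]]></tex-math>
          </disp-formula>
        </p>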
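        <p>As a rough illustration of Equation 1, the following Python sketch computes a KBT score for one column. The dictionary layout, the function name <monospace>kbt</monospace>, the use of exact equality to decide correctness, and the score of 0.0 for a column without any overlap are assumptions made for this example; they are not taken from the original implementation.</p>
        <preformat><![CDATA[
```python
def kbt(column, kb):
    """KBT score of one web table column.

    column: dict mapping a matched entity to the value found in the column.
    kb:     dict mapping an entity to the set of values the knowledge base
            accepts for the matched attribute (hypothetical layout).
    """
    # Values overlap if the knowledge base knows the corresponding triple.
    overlap = [entity for entity in column if entity in kb]
    if not overlap:
        return 0.0  # assumption: no overlap means no evidence of trust
    # Overlapping values are correct if the knowledge base contains them.
    correct = sum(1 for entity in overlap if column[entity] in kb[entity])
    return correct / len(overlap)


# Hypothetical capital column: three overlapping values, two of them correct.
kb = {"Germany": {"Berlin"}, "France": {"Paris"}, "Japan": {"Tokyo"}}
column = {
    "Germany": "Berlin",
    "France": "Paris",
    "Japan": "Kyoto",
    "Atlantis": "Poseidonis",  # not in the knowledge base, so no overlap
}
score = kbt(column, kb)
```
]]></preformat>
        <p>The non-overlapping value for the made-up entity Atlantis does not influence the score; it simply inherits the column score of 2/3 as its fusion score.</p>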
      </sec>
      <sec id="sec-2-3">
        <title>Timed-KBT</title>
        <p>Timed-KBT assigns explicit temporal scopes to web table data by exploiting
its overlap with a temporal knowledge base. It is based on the assumption that
neighboring values, e.g. within one column, share a common temporal scope. The
idea is to use the knowledge base to detect this scope for overlapping values, and
propagate the scope to the neighboring non-overlapping values.</p>
        <p>To generate missing temporal scopes we first find the temporal scope <italic>t</italic> that
maximizes the KBT<sub>t</sub> score of a column. The KBT<sub>t</sub> score is computed by only
using values from the knowledge base that are annotated with the given temporal
scope <italic>t</italic>. We assign <italic>t</italic> to the web table column, while the KBT<sub>t</sub> score itself
is then used as the fusion score of the matched values. In the table in Figure 2,
Timed-KBT will, for example, be able to assign different scopes to the population columns,
as the temporal scope <italic>t</italic> that maximizes KBT<sub>t</sub> will likely differ per column.</p>
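        <p>
          <disp-formula>
            <label>(2)</label>
            <tex-math><![CDATA[\mathrm{KBT}_t(\mathit{column}) = \frac{\#\,\text{values in column with correct overlap given scope } t}{\#\,\text{values in column with overlap given scope } t}]]></tex-math>
          </disp-formula>
          <disp-formula>
            <label>(3)</label>
            <tex-math><![CDATA[t_{\mathit{column}} = \arg\max_{t \in T} \mathrm{KBT}_t(\mathit{column})]]></tex-math>
          </disp-formula>
          <disp-formula>
            <label>(4)</label>
            <tex-math><![CDATA[\text{Timed-KBT}(\mathit{column}) = \mathrm{KBT}_{t_{\mathit{column}}}(\mathit{column})]]></tex-math>
          </disp-formula>
        </p>
        <p>We implement two Timed-KBT-based fusion strategies. In the first, TKBT, <italic>T</italic>
is a set of temporal scopes derived from the knowledge base. In the second,
TKBT-Restricted, we restrict <italic>T</italic> to temporal scopes that exist as timestamps
extracted from the table and its context. For the table in Figure 2, this
means that under TKBT-Restricted the first population column
cannot be assigned a scope of 2015.</p>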
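        <p>The scope selection described above can be sketched as follows. The data layout, the function names <monospace>kbt_t</monospace> and <monospace>timed_kbt</monospace>, the exact-equality correctness check, the score of 0.0 for a scope without overlap, and the tie-breaking rule (earliest scope wins) are assumptions for illustration only. Passing all scopes of the knowledge base corresponds to TKBT, while passing only extracted timestamps corresponds to TKBT-Restricted.</p>
        <preformat><![CDATA[
```python
def kbt_t(column, kb, t):
    """Share of overlapping values that agree with the facts scoped to t."""
    # A value overlaps if the knowledge base has a fact for scope t.
    overlap = [entity for entity in column if t in kb.get(entity, {})]
    if not overlap:
        return 0.0  # assumption: no overlap for t means no trust
    correct = sum(1 for entity in overlap if kb[entity][t] == column[entity])
    return correct / len(overlap)


def timed_kbt(column, kb, scopes):
    """Return (t_column, score) for a non-empty set of candidate scopes."""
    # sorted() makes ties deterministic (earliest scope wins); an assumption.
    best = max(sorted(scopes), key=lambda t: kbt_t(column, kb, t))
    return best, kbt_t(column, kb, best)


# Hypothetical temporal knowledge base: entity -> {year: population in millions}.
kb = {
    "Germany": {1990: 79.43, 2015: 81.41},
    "France": {1990: 58.51, 2015: 66.81},
}
# First population column of a table; Japan has no overlap with kb.
column = {"Germany": 81.41, "France": 66.81, "Japan": 127.0}

all_scopes = {year for facts in kb.values() for year in facts}
tkbt = timed_kbt(column, kb, all_scopes)                 # TKBT
tkbt_restricted = timed_kbt(column, kb, {1990, 2017})    # TKBT-Restricted
```
]]></preformat>
        <p>With all knowledge base scopes available, the column is assigned 2015 with a perfect score. When the candidates are restricted to the timestamps 1990 and 2017, no candidate scope fits the data, so the column only receives a score of 0.0 and would be removed by any positive threshold.</p>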
        <p>Neighboring Scope Estimation: As we assign only one temporal scope
to a table column, its values are only used for the fusion of slots with that
assigned scope. Assuming that temporal scopes are years and that facts of certain
attributes do not completely change yearly, it would make sense to allow values
that were assigned one scope to be used to fuse facts for neighboring scopes.
Given, for example, Figure 2 and that the first population column was assigned
the scope 2015, its values can be used as candidates for slots with scope 2014,
with an adapted score computed as
          <disp-formula>
            <label>(5)</label>
            <tex-math><![CDATA[\mathit{estimatedScore} = \begin{cases} \mathit{neighboringScore} - \mathit{diff} \times \alpha & \text{if } \mathit{diff} \le \mathit{maxDiff} \\ 0 & \text{if } \mathit{diff} > \mathit{maxDiff} \end{cases}]]></tex-math>
          </disp-formula>
where neighboringScore equals the assigned score of the column, diff equals the
absolute difference between the assigned year and the target year, and maxDiff
equals the maximum difference allowed. This maximum difference is learned per
class-attribute combination from 0 to 10. We therefore define α to be 0.1 = 1/10.</p>
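        <p>A small Python sketch of the neighboring scope estimation in Equation 5 follows. It assumes that the penalty diff × α is subtracted from the column score, which is how we read the equation; the function and parameter names are hypothetical.</p>
        <preformat><![CDATA[
```python
def estimated_score(neighboring_score, assigned_year, target_year,
                    max_diff, alpha=0.1):
    """Score of a value when it is reused for a neighboring target year.

    Assumption: the penalty diff * alpha is subtracted from the column score.
    """
    diff = abs(assigned_year - target_year)
    if diff > max_diff:
        return 0.0  # target year is too far away from the assigned scope
    return neighboring_score - diff * alpha


# Column assigned the scope 2015 with score 0.9, maximum difference 3.
same_year = estimated_score(0.9, 2015, 2015, max_diff=3)
one_year_off = estimated_score(0.9, 2015, 2014, max_diff=3)
too_far = estimated_score(0.9, 2015, 2010, max_diff=3)
```
]]></preformat>
        <p>Under this reading, a column with a perfect score of 1.0 loses all of its weight at a distance of ten years, which matches the choice of α = 1/10 for a maximum difference of at most 10.</p>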
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experimental Setup</title>
      <p>In this section we describe the knowledge base, the web table data and how we
measure fusion performance.</p>
      <sec id="sec-3-1">
        <title>Knowledge Base</title>
        <p>
          As the knowledge base to be augmented in our evaluation we use a subset of
the temporal knowledge base Wikidata [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. The attributes in the subset were
chosen based on a profiling of the web table corpus to ensure a high overlap with
web table data. For some of the chosen time-dependent attributes, Wikidata
did not contain enough facts for a proper evaluation. We therefore extended the
subset with various datasets that cover time-dependent data<sup>2</sup>.
        </p>
        <p>Table 1 provides an overview of the classes, entities, attributes and facts in
the knowledge base. Classes are categories of entities and their corresponding
attributes. The table also shows by class from which sources datasets were used
to complement Wikidata. We acquired data from the sources either by manually
written crawlers and extractors, or through data dumps.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Web Table Corpus</title>
        <p><sup>2</sup> The resulting knowledge base is publicly available as the
Time-Dependent-Ground-Truth dataset: http://webdatacommons.org/timeddata/</p>
        <p><sup>3</sup> http://webdatacommons.org/webtables/#toc2</p>
        <p>
          For our experiments we use the Web Data Commons Web Table Corpus from
2015<sup>3</sup>, which was extracted from the July 2015 Common Crawl. The original
corpus contains 1.78 billion HTML pages, whereas the web table corpus consists
of 90 million relational HTML tables [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. We use the matching component of the
T2K Framework [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ] to match the corpus to the knowledge base. Columns in
the web tables are matched to attributes, while the rows are matched to entities.
Attributes of datatype reference and numeric are denoted by (R) and (N)
respectively. The column ‘Series’ lists the number of triples of an attribute for
which values from the web tables were matched. We use the term series because
a match for a time-dependent triple is seen as a candidate for the whole series of
timed facts of that triple. The following column shows how many sources exist
per series on average. The column ‘Overlap’ states for how many timed facts
in the knowledge base there was at least one candidate matched value equal
to the fact, i.e. a correct match. We have filtered from the corpus all sources
that were used to create the knowledge base (see Table 1). We additionally
excluded all triples with only one matched source from our experiments, because
we do not consider the fusion of values from a single source to be a proper fusion task.
        </p>
        <p>
          We extracted timestamps using HeidelTime<sup>4</sup> [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. Table 3 shows the
proportion of sources that have timestamps in certain locations. Columns ‘before’ and
‘after’ refer to timestamps found in the context before and after the table
respectively. Column ‘on page’ refers to timestamps found anywhere on the page,
while column ‘page title’ refers to those found in the page title. The following
three columns refer to timestamps extracted from table captions, column
headers and cells of the same row of a value. The final column gives the proportion
of sources for which a timestamp can be extracted from at least one location.
        </p>
        <p>Most timestamps are found in the context of the table, which could mean
that they have no explicit relation to the data in the table. Timestamps extracted
from cells of the same row could similarly describe an unrelated date attribute,
e.g. the ‘National day’ column in Figure 2. Timestamps in table captions and
column headers, which are likely to be the most relevant, are also the least
frequent. Presence also differs by class: for Country and City we find many in
the column header, while for NFL Athlete we find more in cells of the same row.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Evaluation</title>
        <p>
          To test our fusion methods we make use of the Local-Closed-World-Assumption
(LCWA), where we assume that facts present in the knowledge base are correct
and can be used to determine whether fused facts are correct. The LCWA has
been used and empirically examined by research with a similar task [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>
          We use the F<sub>β</sub>-Measure [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] as our performance metric. The F1-Measure has
also been used for a similar task [
          <xref ref-type="bibr" rid="ref13 ref9">9, 13</xref>
          ]. It has equal weights for both precision
and recall. For the task of slot filling we must ensure the correctness of filled facts,
so that we care primarily about precision. We therefore compute results for the
F<sub>β</sub>-Measure at β of 1.0 and 0.25, where the latter weights precision four times as high
as recall. The choice of β also affects the learned filtering thresholds described
in Section 4.1. We measure performance per class-attribute combination.
        </p>
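        <p>
          <disp-formula>
            <label>(6)</label>
            <tex-math><![CDATA[F_\beta = (1 + \beta^2) \times \frac{\mathit{Precision} \times \mathit{Recall}}{\beta^2 \times \mathit{Precision} + \mathit{Recall}}]]></tex-math>
          </disp-formula>
        </p>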
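        <p>The following Python sketch shows how the weighted F-Measure can be computed and how a filtering threshold could be selected on the 0.05 grid described in Section 4.1. The names <monospace>f_beta</monospace>, <monospace>learn_threshold</monospace> and <monospace>evaluate</monospace> are hypothetical, and the toy precision/recall curve is invented purely to make the example runnable; it does not reflect any measured results.</p>
        <preformat><![CDATA[
```python
def f_beta(precision, recall, beta):
    """Weighted F-Measure; beta below 1 favors precision over recall."""
    denominator = beta ** 2 * precision + recall
    if denominator == 0:
        return 0.0
    return (1 + beta ** 2) * precision * recall / denominator


def learn_threshold(evaluate, beta):
    """Pick the threshold in 0.05 steps from 0.0 to 1.0 that maximizes F-beta.

    evaluate(threshold) must return (precision, recall) on the learning set.
    """
    grid = [round(step * 0.05, 2) for step in range(21)]
    return max(grid, key=lambda th: f_beta(*evaluate(th), beta))


def evaluate(threshold):
    # Invented curve: stricter filtering raises precision and lowers recall.
    return min(1.0, 0.5 + threshold / 2), 1.0 - threshold


threshold_f1 = learn_threshold(evaluate, beta=1.0)
threshold_f025 = learn_threshold(evaluate, beta=0.25)
```
]]></preformat>
        <p>On this invented curve the threshold selected for β = 0.25 is higher than the one selected for β = 1.0, which illustrates how a precision-oriented metric pushes the filter towards stricter thresholds.</p>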
        <p>As the knowledge base is used for both learning and testing, we split the
data four times, each time placing approximately 25 % of the data in the testing
set, and the remainder in the learning set. To replicate the use-case of targeted
slot filling, where some missing slots within a series are to be filled, we split by
series of timed facts, so that some timed facts of a triple are in the testing set,
while the remaining are used for learning. To ensure that the temporal scopes
of removed facts are well distributed, we randomize how each series is split.</p>
        <p>Within this paper we define temporal scopes as years. Nonetheless, it is
possible that in web tables we will also find attributes which are more frequently
updated and would require more fine-grained temporal scopes.</p>
        <p><sup>4</sup> https://github.com/HeidelTime/heideltime</p>
        <p>In this section we present and discuss the overall results of the implemented
fusion strategies and discuss the effect of the neighborhood scope estimation.
Table 4 shows the average performance by fusion strategy. We can first of all see
that KBT outperforms Voting by a large margin for both F1 and F0.25.
Additionally, KBT has the highest recall for F0.25 and among the highest for F1, which
means that any strategy that outperforms KBT does so by increasing precision.</p>
        <p>
          For F1 the difference between KBT and TT-Weighting is minimal. For F0.25,
there is an increase in the F-Measure and a larger increase in precision from KBT
to TT-Weighting. This shows that the scores computed by TT-Weighting are
relevant to the fusion precision, but also that they are only effective when a drop in
recall is acceptable. The results could indicate that some timestamps are relevant
to the data in the table and that timestamp locations have certain relationships
with attributes, which is the main assumption behind TT-Weighting [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ].
        </p>
        <p>Both Timed-KBT-based approaches show an increase in performance when
compared to other methods for both F1 and F0.25. TKBT for F1 even has the
highest recall. Through this we can infer that a knowledge base can successfully
be used to generate temporal meta-information for web table data.</p>
        <p>TKBT-Restricted outperforms TKBT for F0.25. While its increase in precision
comes at the cost of recall, the decline happens at a favorable rate. TKBT is
unable to yield a higher precision for F0.25, e.g. by increasing the threshold,
without a performance drop, whereas the precision increase for TKBT-Restricted
is large enough to compensate for the drop in recall. This shows that
timestamps from the tables and their context can be relevant to the data and that
TKBT-Restricted is able to use them effectively.</p>
        <p>Strategies that use timestamps, i.e. TKBT-Restricted and TT-Weighting,
generally come at a large cost to recall. This could show that
timestamps in web tables are too sparse for a high-recall fusion strategy.</p>
        <p>From Table 5 we can see that incorporating neighborhood estimation into the
Timed-KBT-based strategies had a large positive effect on fusion performance.
The relative increase was more than 40 % for F1 and 20 % for F0.25 for both
strategies. A rather unexpected result is that neighborhood estimation increases
precision in addition to recall. The reason for the increase in precision is likely
that matched values of neighboring temporal scopes with a high score can
outweigh low-scoring, and probably incorrect, values assigned to the target scope.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>In this work we introduced Timed-KBT, an approach that exploits the
temporal meta-information in a knowledge base to generate missing temporal
meta-information for web table data. We test Timed-KBT using a large web table
corpus and an extended subset of Wikidata as a knowledge base for slot filling,
a knowledge base augmentation task that makes use of fusion methods.</p>
      <p>We find that Timed-KBT is able to assign useful explicit temporal scopes
to web table data. We also find that using scores estimated by Timed-KBT
for fusing time-dependent web table data yields a performance increase when
compared to other fusion methods. We then utilized timestamps extracted from
the web tables and their contexts as a restriction for candidate temporal scopes
used by Timed-KBT. This approach yields a higher performance with regard to
precision, and therefore a possibly more favorable performance for knowledge
base augmentation. We conclude that timestamps in the table and its context
are useful for a precision-oriented time-aware fusion strategy. Finally we show
that data with assigned explicit temporal scopes is highly useful for estimating
facts with neighboring temporal scopes.</p>
      <p>Overall we demonstrate that a temporal knowledge base can be used to
estimate missing temporal meta-information for web table data. We also show
that with Timed-KBT, we are able to perform knowledge base augmentation
from web table data for current and historic facts, instead of just for facts limited
to one point in time. Our findings enable the utilization of time-dependent web
data even when that data lacks temporal meta-information.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Alexe</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <article-title>Preference-aware integration of temporal data</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>8</volume>
          (
          <issue>4</issue>
          ),
          <fpage>365</fpage>
          -
          <lpage>376</lpage>
          (
          <year>Dec 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Cafarella</surname>
            ,
            <given-names>M.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>D.Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Uncovering the relational web</article-title>
          .
          <source>In: WebDB</source>
          . Citeseer
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berti-Equille</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Srivastava</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Truth discovery and copying detection in a dynamic world</article-title>
          .
          <source>Proc. VLDB Endow</source>
          .
          <volume>2</volume>
          (
          <issue>1</issue>
          ),
          <fpage>562</fpage>
          -
          <lpage>573</lpage>
          (
          <year>Aug 2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heitz</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horn</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>From data fusion to knowledge fusion</article-title>
          .
          <source>Proc. VLDB</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gabrilovich</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Murphy</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dang</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horn</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lugaresi</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Knowledge-based trust: Estimating the trustworthiness of web sources</article-title>
          .
          <source>Proc. VLDB</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dong</surname>
            ,
            <given-names>X.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kementsietsidis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.C.</given-names>
          </string-name>
          :
          <article-title>A time machine for information: Looking back to look forward</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>45</volume>
          (
          <issue>2</issue>
          ),
          <fpage>23</fpage>
          -
          <lpage>32</lpage>
          (
          <year>Sep 2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lehmberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meusel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>A large public corpus of web tables containing time and context metadata</article-title>
          .
          <source>In: Proceedings of the 25th International Conference Companion on World Wide Web</source>
          . pp.
          <fpage>75</fpage>
          -
          <lpage>76</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schütze</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Raghavan</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <source>Introduction to information retrieval</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Oulabi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meusel</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Fusing time-dependent web table data</article-title>
          .
          <source>In: Proceedings of the 19th International Workshop on Web and Databases</source>
          . pp.
          <fpage>3:1</fpage>
          -
          <lpage>3:7</lpage>
          . WebDB '16, ACM
          , New York, NY, USA (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Pal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rastogi</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Machanavajjhala</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bohannon</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Information integration over time in unreliable and uncertain environments</article-title>
          .
          <source>In: Proceedings of the 21st International Conference on World Wide Web</source>
          . pp.
          <fpage>789</fpage>
          -
          <lpage>798</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Pasternack</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roth</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Knowing what to believe (when you already know something)</article-title>
          .
          <source>In: Proceedings of the 23rd International Conference on Computational Linguistics</source>
          . pp.
          <fpage>877</fpage>
          -
          <lpage>885</lpage>
          . Association for Computational Linguistics (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Matching html tables to dbpedia</article-title>
          .
          <source>In: WIMS '15</source>
          . p.
          <fpage>10</fpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Ritze</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmberg</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oulabi</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Profiling the potential of web tables for augmenting cross-domain knowledge bases</article-title>
          .
          <source>In: Proceedings of the 25th International Conference on World Wide Web</source>
          . pp.
          <fpage>251</fpage>
          -
          <lpage>261</lpage>
          . WWW '16
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Strötgen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gertz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Heideltime: High quality rule-based extraction and normalization of temporal expressions</article-title>
          .
          <source>In: SemEval '10</source>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Strötgen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gertz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>A baseline temporal tagger for all languages</article-title>
          .
          <source>In: EMNLP 2015</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Surdeanu</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ji</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Overview of the english slot filling track at the tac2014 knowledge base population evaluation</article-title>
          .
          <source>In: TAC2014</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
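          17.
          <string-name>
            <surname>Vrandečić</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Krötzsch</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :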
          <article-title>Wikidata: A free collaborative knowledgebase</article-title>
          .
          <source>Commun. ACM</source>
          <volume>57</volume>
          (
          <issue>10</issue>
          ),
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          (
          <year>Sep 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Yakout</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ganjam</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chaudhuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Infogather: Entity augmentation and attribute discovery by holistic matching with web tables</article-title>
          .
          <source>In: ACM SIGMOD Conference</source>
          . SIGMOD '12
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>P.S.</given-names>
          </string-name>
          :
          <article-title>Truth discovery with multiple conflicting information providers on the web</article-title>
          .
          <source>IEEE TKDE'08</source>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Semi-supervised truth discovery</article-title>
          .
          <source>In: Proc. WWW '11</source>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chakrabarti</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>Infogather+: Semantic matching and annotation of numeric and time-varying attributes in web tables</article-title>
          .
          <source>In: Proc. of the 2013 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>145</fpage>
          -
          <lpage>156</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>