<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Evaluating Approaches for Supervised Semantic Labeling</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Nataliia Rümmele</string-name>
<email>nataliia.ruemmele@siemens.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yuriy Tyshetskiy</string-name>
<email>yuriy.tyshetskiy@data61.csiro.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alex Collins</string-name>
<email>alex.collins@data61.csiro.au</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data61</institution>
          ,
          <addr-line>CSIRO</addr-line>
          ,
          <country country="AU">Australia</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Siemens</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Relational data sources are still one of the most popular ways to store enterprise or Web data; however, the issue with the relational schema is the lack of a well-defined semantic description. A common ontology provides a way to represent the meaning of a relational schema and can facilitate the integration of heterogeneous data sources within a domain. Semantic labeling is achieved by mapping attributes from the data sources to the classes and properties in the ontology. We formulate this problem as a multi-class classification problem where previously labeled data sources are used to learn rules for labeling new data sources. The majority of existing approaches for semantic labeling have focused on data integration challenges such as naming conflicts and semantic heterogeneity. In addition, machine learning approaches typically have issues around class imbalance, lack of labeled instances and the relative importance of attributes. To address these issues, we develop a new machine learning model with engineered features as well as two deep learning models which do not require extensive feature engineering. We evaluate our new approaches against the state-of-the-art.</p>
      </abstract>
      <kwd-group>
        <kwd>data integration</kwd>
        <kwd>schema matching</kwd>
        <kwd>semantic labeling</kwd>
        <kwd>ontology</kwd>
        <kwd>relational schema</kwd>
        <kwd>bagging</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. INTRODUCTION</title>
      <p>
        An important problem in database research is determining
how to combine multiple data sources that are described by
different (heterogeneous) schemata [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. The outcome of such
a process is expected to be a uniform integrated view across
these data sources. Relational data sources are still one of
the most popular ways to store enterprise or Web data [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
However, the relational schema lacks a well-defined semantic
description. To define the semantics of data, we can
introduce an ontology [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]. Our goal is then to map attributes
from relational data sources to classes and properties in an
ontology. We refer to this problem as semantic labeling.
      </p>
      <p>Work accomplished at Data61, CSIRO.</p>
      <p>Permission to make digital or hard copies of part or all of this work for personal or
classroom use is granted without fee provided that copies are not made or distributed
for profit or commercial advantage and that copies bear this notice and the full citation
on the first page. Copyrights for third-party components of this work must be honored.
For all other uses, contact the owner/author(s).</p>
      <p>TheWebConf Workshop: Linked Data on the Web (LDOW) 2018, Lyon,
France.
© 2018 Copyright held by the owner/author(s).</p>
      <p>
        Semantic labeling plays an important role in data
integration [
        <xref ref-type="bibr" rid="ref14 ref6">6, 14</xref>
        ], augmenting existing knowledge bases [
        <xref ref-type="bibr" rid="ref17 ref18 ref23 ref9">9, 17,
18, 23</xref>
        ] or mapping relational sources to ontologies [
        <xref ref-type="bibr" rid="ref15 ref22">15, 22</xref>
        ].
Various approaches to automate semantic labeling have been
developed, including DSL [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and T2K [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Typically
automated semantic labeling techniques encounter several
problems. Firstly, there can be naming conflicts [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], including
those cases where users represent the same data in different
ways. Secondly, semantically different attributes might have
syntactically similar content, for example, birth date versus
date of death. Thirdly, there are a considerable number of
attributes which do not have any corresponding property in
the ontology, either by accident or on purpose. The majority
of existing systems focus on the first two problems, but do
not consider the third problem during evaluation [
        <xref ref-type="bibr" rid="ref14 ref18">18, 14</xref>
        ].
      </p>
      <p>
        To address the challenges of automated semantic labeling,
we formulate this task as a supervised classification problem.
A set of semantic labels known to the classifier is specified at
training time, e.g., from the provided domain ontology. We
also introduce a special class of attributes, called unknown.
The purpose of the unknown class is to capture attributes
which will not be mapped to the ontology. The training
data for the classifier will thus consist of source attributes
(name and content) and their semantic labels provided by
the user, including the unknown labels. Since manually
assigning labels to attributes is a costly operation, a lack of
training data is a common problem for semantic labeling
systems. Existing systems [
        <xref ref-type="bibr" rid="ref14 ref17 ref23">14, 17, 23</xref>
        ] use knowledge transfer
techniques to overcome this issue. Instead, we introduce a
sampling method similar to bagging for ensemble models [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>The bagging technique allows us to generate multiple
training instances from the user-labeled attributes, thus
overcoming the lack of labeled training data. It also allows us to
overcome the common issue of class imbalance, when some
semantic labels have more support than others among the
attributes. We can achieve this by re-balancing the training
data via preferential bagging from minority class attributes.</p>
      <p>The main contributions of this paper are:
1. We introduce a bagging approach to handle class
imbalance and the lack of training data by drawing
random subsamples from values of an attribute. This
approach can achieve meaningful diversity in the training
data and can increase the number of training instances
for under-represented semantic labels.
2. We address the issue of "unwanted" attributes, i.e.,
attributes which do not get mapped to any element in the
ontology. In cases where we have a sufficient amount of
training data, our models can achieve over 80% Mean
Reciprocal Rank (MRR) on two sets of data sources
from our benchmark.
3. We construct a classification model DINT with
hand-engineered semantic labeling features to implement the
above. In addition, we design two deep learning models
CNN and MLP which use very simple features, such as
normalized character frequencies and padded character
sequences extracted from raw values of data attributes.
4. We construct a benchmark with a common evaluation
strategy to compare different approaches for
supervised semantic labeling. Our benchmark includes such
models as DINT, CNN, MLP and the state-of-the-art
DSL [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], and 5 sets of data sources from different
domains. We show that each approach has its strengths
and shortcomings, and choosing a particular semantic
labeling system depends on the use case. We have
released the implementation of the benchmark under an
open source license (http://github.com/NICTA/serene-benchmark). This benchmark can be easily
extended to include other models and datasets, and
can be used to choose the most appropriate model for
a given use case.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. PROBLEM</title>
      <p>We illustrate the semantic labeling problem using a simple
domain ontology shown in Fig. 1. Assume we have three
data sources "personal-info", "businessInfo" and "Employees"
(see Tab. 1) whose attributes we choose to label according
to the example ontology (Fig. 1). We define a semantic label
as a tuple consisting of a domain class and its property. For
example, attribute name in the source "personal-info" (see
Tab. 1a) is labeled with (Person,name). Note that semantic
labels are fixed by the ontology.</p>
      <p>The task of semantic labeling is defined as automatically
assigning semantic labels to attributes in a data source. In
the case of supervised semantic labeling, we use existing
known semantic labels for data sources to improve the
performance when assigning semantic labels to new sources.</p>
      <sec id="sec-2-1">
        <p>For example, assume we are given sources "personal-info"
and "businessInfo" with the correct semantic labels; the
system should then automatically assign labels to attributes in
the source "Employees".</p>
        <p>To build such a system, we cannot just rely on the names
of the columns. For example, columns name in (1a), ceo
in (1c) and employee in (1b) all refer to the same
property (Person,name). Using just values of the columns is
also problematic. For example, in (1a) acronyms are used
for states, while in (1c) state names are fully written.
Furthermore, values can overlap for semantically heterogeneous
columns like for founded in (1c) and birthDate in (1a).</p>
        <p>
          We can also have attributes that are not mapped to any
property in the ontology. There might be two reasons for
their existence: (1) we are not interested in the content of
an attribute and want to discard it from any future analysis;
(2) we might have overlooked an attribute by not designing
the ontology accurately. We do not differentiate between
these two cases and mark all such attributes with the unknown
class, for example, founded in (1c). The presence of the
unknown class makes the task of semantic labeling more
complicated. Establishing approaches to efficiently handle such
attributes is crucial, since in many real-world scenarios
relational data sources (either HTML tables [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] or domain
specific data [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]) contain a considerable number of such
attributes.
        </p>
      <p>Machine learning techniques have proved effective in
building predictive models on noisy and messy data. Yet to
apply these techniques we need to represent source attributes
as feature vectors, with semantic labels (classes) attached to
these vectors. In Table 2 we show such representation for the
source Employees. We have explicitly shown only 4 possible
features, for simplicity. For example, mean string length is
the mean length of cell values for an attribute. However,
the actual feature vector can be arbitrarily long, and
the process of designing its components is known as feature
engineering. In the next section we will discuss the features
used in the semantic labeling system.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. APPROACHES</title>
      <p>In this section we describe the classifiers for the semantic
labeling problem that we use in our evaluation. We also discuss
approaches to the problem of unknown attributes and lack of
training data.</p>
      <p>Once we have a set of labeled data sources, we construct
feature vectors for all attributes in this set and mark them
as representatives of a class corresponding to their semantic
labels. The constructed set of (feature vector, class label)
pairs is then used to train a classifier. We consider several
approaches, which we divide into 3 major groups: DINT,
Deep Learning and the state-of-the-art DSL. Each approach
trains a multi-class classification model that produces, at
the prediction stage, a list of class probabilities for an
attribute in a new source. The class with the highest predicted
probability is then assigned to the attribute at the decision
stage.</p>
      <sec id="sec-3-1">
        <title>DINT</title>
        <p>
          In our first approach DINT (Data INTegrator) we
hand-engineer 26 features, which include characteristics such as
the number of whitespaces and other special characters,
statistics of values in the column (e.g., mean/max/min string
length and numeric statistics) and many more. The
complete list of features is available in the open source
benchmark repository. One of the important features
characterising the information content of an attribute is Shannon's
entropy of the attribute's concatenated rows. Shannon's
entropy (or information entropy [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]) of a string X is
defined as H(X) = −Σ_i p_i log2 p_i, where p_i is the probability
that the character whose index in the character vocabulary is i
appears in X, and the summation ranges over all
characters in the vocabulary. To evaluate p_i in Shannon's
entropy, we compute the normalized character frequency
distribution chardist of an attribute, as character counts in the
concatenated rows of the attribute, normalized by the total length
of the concatenated rows. The vocabulary of all characters
consists of the 100 printable characters (including "\n"). Finally,
we also add the 100-dimensional vector of p_i to the attribute
feature vector.
        </p>
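These two features can be sketched in Python as follows (a minimal illustration assuming the `string.printable` vocabulary; the function names are ours, not the benchmark's):

```python
import math
import string

# 100 printable characters, including "\n", as the character vocabulary.
VOCAB = string.printable

def chardist(rows):
    """Normalized character frequency distribution of an attribute:
    character counts in the concatenated rows, divided by total length."""
    text = "".join(str(v) for v in rows)
    total = len(text) or 1  # guard against empty attributes
    return [text.count(c) / total for c in VOCAB]

def shannon_entropy(rows):
    """H(X) = -sum_i p_i log2 p_i over the character vocabulary."""
    return -sum(p * math.log2(p) for p in chardist(rows) if p > 0)
```

An attribute whose rows repeat a single character has zero entropy, while more varied content scores higher.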
        <p>In addition to the above features, which can be directly
calculated from attribute values, we compute mean cosine
similarity of attribute character distribution with character
distributions of all class instances. This adds as many
additional scalar features to the full attribute feature vector as
there are classes in the training data. In our case we have
as many classes as there are semantic labels.</p>
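The mean cosine similarity feature might be computed as in this sketch (pure Python for clarity; `mean_class_similarity` is an illustrative name):

```python
import math

def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0

def mean_class_similarity(attr_dist, class_instances):
    """Mean cosine similarity of an attribute's character distribution
    with the character distributions of one class's training instances."""
    sims = [cosine(attr_dist, inst) for inst in class_instances]
    return sum(sims) / len(sims)
```

Computing this against every class yields one scalar feature per semantic label.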
        <p>
          One can expect that names of the attributes should also
contain useful information to determine their semantic types,
in addition to the information provided by attribute values.
To extract features from attribute names, we compute string
similarity metrics: minimum edit distance, two WordNet
based similarity measures such as JCN [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] and LIN [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ],
and k-nearest neighbors using Needleman-Wunsch distance [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
The minimum edit distance between two strings s1 and s2
is the minimum number of edit operations, such as
insertion, deletion, substitution, which are required to transform
one string into another [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. We compute the similarity
between attribute name and all class instances in the training
data. The number of thus extracted features depends on the
number of semantic labels in the training data.
        </p>
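The minimum edit distance admits the standard dynamic-programming computation; a compact sketch (illustrative, not the benchmark's implementation):

```python
def edit_distance(s1, s2):
    """Minimum number of insertions, deletions and substitutions
    needed to transform s1 into s2 (Levenshtein distance)."""
    prev = list(range(len(s2) + 1))
    for i, c1 in enumerate(s1, start=1):
        curr = [i]
        for j, c2 in enumerate(s2, start=1):
            cost = 0 if c1 == c2 else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]
```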
        <p>
          We choose to train a Random Forest [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ] (RF) on this set
of features. RF is quite robust on noisy data, works well
even with correlated features, and easily captures complex
nonlinear relationships between features and target.
Additionally, RF classifiers require little hyperparameter tuning,
and hence they usually work straight "out of the box", which
makes them a convenient yet versatile classifier to use.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>Deep Learning</title>
        <p>
          Deep learning has gained much popularity due to its
tremendous impact in such areas as speech recognition, object
recognition, and machine translation [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. One of the biggest
advantages of deep learning is the ability to process data in
its raw form and to discover the representation needed for
classification, assisting with the feature engineering step.
        </p>
        <p>Broadly speaking, deep learning is an overarching term
for artificial neural networks, where the word "deep" refers
to the depth of the network. At the basic level neural
networks are composed of perceptrons, or neural nodes. There
can be several layers of interconnected neural nodes; the
first layer is the input layer while the last one is the output
layer. The layers in between these two are called hidden.
Neural nodes in each layer take as input the output of the
nodes from the previous layer, perform some computation
with a nonlinear activation function (e.g., tanh or RELU)
and pass the result to the next layer. There are generally no
connections between nodes in the same layer. Overall, deep
learning models improve in their performance the more data
they are trained on. The exact architecture of deep learning
models, i.e., number of layers, number of nodes in each layer,
activation functions of neurons and the interconnectedness
between layers, all influence the performance of the trained
models.</p>
        <p>
          We choose two different architectures for our deep
learning classifiers: (i) Multi-Layer Perceptron (MLP ) [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] and
(ii) Convolutional Neural Network (CNN ) [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]. We have
experimented with different designs of the MLP and CNN
networks, varying the hyperparameters that control the
number of hidden layers, the number of nodes/filters per layer,
dropout probability, etc., and found that the designs,
described briefly below, work well for all the datasets in the
benchmark.
        </p>
        <p>The input layer of the MLP architecture takes the
101-dimensional feature vector of character frequencies p_i (chardist)
and Shannon entropy. Following the input layer, the MLP has
3 fully connected hidden layers with 100 nodes per layer,
with tanh activations. After the 1st hidden layer, we
introduce a stochastic dropout layer with dropout probability of
0.5, to prevent overfitting. Finally, the output layer of the MLP
(the actual classifier) is a softmax layer with the number of
nodes equal to the number of semantic types (including the
`unknown' type).</p>
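Under these hyperparameters, the MLP might be sketched in Keras roughly as follows (the builder function and its name are ours; the layer sizes follow the text):

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_mlp(num_classes):
    """3 hidden tanh layers of 100 nodes, dropout 0.5 after the first,
    softmax output; input is the 101-dim chardist + entropy vector."""
    return keras.Sequential([
        layers.Input(shape=(101,)),
        layers.Dense(100, activation="tanh"),
        layers.Dropout(0.5),
        layers.Dense(100, activation="tanh"),
        layers.Dense(100, activation="tanh"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```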
        <p>The CNN model takes as input the one-hot representation
of an attribute's concatenated rows in character space, then
embeds it into a dense 64-dimensional embedding, then passes this
embedded "image" of the attribute through two consecutive
1-d convolution layers with 100 filters per layer, followed by
a 1-d max-pooling layer, a flattening layer, a dropout layer
with dropout probability 0.5, then a fully connected layer
with 100 nodes, and finally a fully connected softmax output
layer (the classifier) with the number of nodes equal to the
number of semantic types (including the `unknown' type).</p>
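A rough Keras sketch of this architecture is below; the sequence length, vocabulary size and convolution kernel width are our assumptions (the text does not fix them), and we use an `Embedding` layer over integer character indices, which is equivalent to a one-hot input followed by a dense embedding:

```python
from tensorflow import keras
from tensorflow.keras import layers

def build_cnn(num_classes, seq_len=100, vocab_size=100):
    """Character embedding, two 1-d convolutions, max-pooling,
    flattening, dropout and a softmax classifier."""
    return keras.Sequential([
        layers.Input(shape=(seq_len,)),
        layers.Embedding(vocab_size, 64),       # dense 64-dim embedding
        layers.Conv1D(100, 3, activation="relu"),
        layers.Conv1D(100, 3, activation="relu"),
        layers.MaxPooling1D(),
        layers.Flatten(),
        layers.Dropout(0.5),
        layers.Dense(100, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
```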
        <p>
          Though we cannot be sure that our final choice of
architectures is optimal, it seems to be a good trade-off
between the complexity of the models, the computational
resources required for their training, and their overall performance in
the semantic labeling task. We have implemented both
models using the Keras library with a GPU-based TensorFlow
backend [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>DSL</p>
        <p>
          The Domain-independent Semantic Labeler (DSL) has been
proposed by Pham et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ], where 6 feature groups based on
similarity metrics are constructed. These metrics measure
how attribute names and values are similar to the
characteristics of other attributes. This means that given 5 attributes
in the training data (i.e., already labeled instances) with
distinct semantic labels, a new attribute will be compared to
representatives of each semantic label and 30 features will
be calculated in total. The considered similarity metrics
are: attribute name similarity, standard Jaccard similarity
for textual data and a modified version for numerical data,
TF-IDF cosine similarity, and distribution and histogram
similarity.
        </p>
        <p>Instead of building one multi-class classifier, the authors
train binary classifiers separately for each semantic label. A
binary classifier for a particular semantic label is a Logistic
Regression model trained on a set of similarity metrics with
representatives of this label. When predicting semantic
labels for a new attribute, they combine the predictions of each
classifier to produce the final vector of probabilities. One of
the distinctive properties of this approach is the ability to
transfer a classification model trained in one domain to
predicting semantic labels for attributes in another domain.
We denote this enhanced approach as DSL+.</p>
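The combination of per-label binary predictions into a single probability vector might look like this sketch (a simplified illustration; DSL's actual combination rule may differ):

```python
def combine_binary_scores(scores):
    """Normalize per-label binary classifier scores {label: score}
    into a probability vector over semantic labels."""
    total = sum(scores.values()) or 1.0
    return {label: s / total for label, s in scores.items()}

def predict_label(scores):
    """Pick the semantic label with the highest combined probability."""
    probs = combine_binary_scores(scores)
    return max(probs, key=probs.get)
```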
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3.1 Bagging</title>
      <p>
        To train a classifier for semantic labeling, we need data
sources to have many labeled attributes. However, the costly
operation of manually assigning labels to attributes, and the
relatively small number of columns compared to the data set size,
imply that a lack of training data is a common problem for
semantic labeling systems. Existing systems [
        <xref ref-type="bibr" rid="ref14 ref17 ref23">14, 17, 23</xref>
        ] use
knowledge transfer techniques to overcome this issue. We
introduce a method for increasing training sample size based
on a machine learning approach known as bagging [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
      <p>
        Breiman [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] introduced the concept of bootstrap
aggregating, also known as bagging, to construct ensembles of
models to improve prediction accuracy. The method consists
of training different classifiers on bootstrapped replicas of
the original dataset. Hence, diversity is obtained via the
resampling procedure through the use of different data subsets.
At the prediction stage each individual classifier scores
an unknown instance, and a majority or weighted vote is
used to infer the class.
      </p>
      <p>We modify the idea of bagging for our problem. It is
clear that the semantics of columns in the table \Employees"
(Table 1b) will not change whether we have 3 or 1000 rows.
So, we can create several training instances for an attribute,
where each instance (called a bag) will contain a random
sample (with replacement) of its content. This procedure
is governed by two parameters, numBags and bagSize: the
first parameter controls how many bags are generated per
attribute, while the latter indicates how many rows are
sampled per bag. In this way we address the issue
of noise by increasing the diversity of the training data, as well
as the issue of insufficient training data.</p>
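The procedure can be sketched as follows, with the two parameters named as in the text (the helper itself is illustrative):

```python
import random

def make_bags(values, num_bags, bag_size, seed=0):
    """Generate num_bags training instances for one attribute, each a
    random sample (with replacement) of bag_size rows of its content."""
    rng = random.Random(seed)
    return [[rng.choice(values) for _ in range(bag_size)]
            for _ in range(num_bags)]
```

Each bag inherits the attribute's semantic label, so a single labeled attribute yields numBags training instances.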
      <p>Another common problem encountered in a wide range
of data mining and machine learning initiatives is class
imbalance. Class imbalance occurs when the class instances
in a dataset are not equally represented. In such a situation,
building standard machine learning models will lead to poor
results, since they will favor classes with large populations
over classes with small populations. To address this
issue, we have tried several resampling strategies to equalize
the number of instances per class.</p>
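One such strategy, resampling every class to the mean instance count, can be sketched as follows (an illustrative helper; in the experiments below we refer to this strategy as ResampleToMean):

```python
import random

def resample_to_mean(instances_by_class, seed=0):
    """Up- or down-sample each class (with replacement) so that every
    class ends up with the mean number of instances."""
    rng = random.Random(seed)
    mean = round(sum(len(v) for v in instances_by_class.values())
                 / len(instances_by_class))
    return {label: [rng.choice(insts) for _ in range(mean)]
            for label, insts in instances_by_class.items()}
```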
    </sec>
    <sec id="sec-5">
      <title>3.2 Unknown class</title>
      <p>As mentioned previously, some attributes are not mapped
to any property in the ontology. To handle this issue, we
introduce one more class called unknown. For example,
attributes which get discarded from the integration process can
be marked as unknown. This way we can help the classifier
recognize such attributes in new sources. In addition, there
is another advantage of having the unknown class defined
explicitly. Consider a new attribute with an unseen
semantic label, that is, a label which is not present in the training
data. Instead of picking the closest match among the known
semantic labels, the classifier will mark it as unknown. The
user will then need to validate the attributes that are
classified as unknown. This will ensure that the unknown class
consists only of unwanted attributes. We do not introduce
another class to differentiate between unwanted attributes
and attributes with unseen semantic labels.</p>
      <p>[Figure 2: number of attributes per semantic label in each domain: weather, weapons, museum, soccer, city.]</p>
    </sec>
    <sec id="sec-6">
      <title>4. EXPERIMENTS</title>
      <p>We have run all our experiments on a Dell server with
252 GiB of memory, 2 CPUs with 4 cores each, 1 Titan GPU
and 1 GeForce 1080 Ti GPU. The deep learning models have
been optimized for GPUs using TensorFlow. The benchmark
for semantic labeling systems is implemented in Python and
is available under an open source license (http://github.com/NICTA/serene-benchmark).</p>
    </sec>
    <sec id="sec-7">
      <title>4.1 Datasets</title>
      <p>
        We use 5 different sets of data sources in our
evaluation, labeled as: museum, city, weather, soccer [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and
weapons [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. Each set of data sources has been
manually mapped to a domain with a specific set of semantic
labels. Descriptive statistics of each domain set are shown
in Table 3. As we can see, these sets differ substantially.
      </p>
      <sec id="sec-7-1">
        <p>This provides us with an opportunity to evaluate how different
approaches behave in various scenarios. The museum and
soccer domains are the only domains which have unknown
attributes. The city domain has many semantic labels and
attributes while the museum domain contains more data
sources. The number of instances per each semantic label
varies in these domains.</p>
        <p>To estimate class imbalance within each domain, we plot
the class distribution in Figure 2. The museum domain
has the highest imbalance among classes; the soccer and
weapons domains also have imbalanced classes, whereas the
weather and city domains have equally represented classes.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>4.2 Experimental setting</title>
      <p>
        We establish a common evaluation framework for the
approaches as described in Section 3. As a performance metric
we use Mean Reciprocal Rank (MRR) [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. To derive a
comprehensive estimate of performance within domains, we
implement two cross-validation techniques: leave one out and
repeated holdout.
      </p>
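MRR averages the reciprocal rank of the correct label in each ranked prediction list; a minimal sketch, assuming each prediction is a dict of class probabilities:

```python
def mean_reciprocal_rank(predictions, truths):
    """predictions: list of {label: probability} dicts, one per attribute;
    truths: the correct semantic label for each attribute."""
    rr = []
    for probs, truth in zip(predictions, truths):
        ranked = sorted(probs, key=probs.get, reverse=True)
        rr.append(1.0 / (ranked.index(truth) + 1) if truth in ranked else 0.0)
    return sum(rr) / len(rr)
```

A model that always ranks the correct label first scores 1.0; ranking it second contributes 0.5, and so on.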
      <p>The leave one out strategy is defined as using one source
as the testing sample and the rest of the sources in the domain
as the training samples. This procedure is repeated as many
times as there are sources in the domain. We calculate MRR
on the testing sample in each iteration and report the average MRR as the
final performance metric. For example,
for the museum domain we obtain 29 models in total, where
each model is trained on a different set of 28 sources, and MRR
is calculated on the prediction outcome for a single source.
This strategy allows us to estimate the performance of the
different models given that there are enough instances per
semantic label.</p>
      <p>In the repeated holdout strategy, we randomly sample a ratio
p of sources to place in the training sample and use the
remaining sources for the testing sample; this procedure is
repeated n times. The final MRR score is an average of the MRR
scores over the iterations. We use this technique to simulate
the scenario where there is a shortage of labeled sources. We
set the ratio p = 0.2 and the number of iterations n = 10.</p>
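The repeated holdout split can be sketched as follows (the helper name is ours; p and n are as in the text):

```python
import random

def repeated_holdout(sources, p=0.2, n=10, seed=0):
    """Yield n (train, test) splits, each placing a fraction p of the
    sources in the training sample and the rest in the testing sample."""
    rng = random.Random(seed)
    k = max(1, round(p * len(sources)))
    for _ in range(n):
        train = rng.sample(sources, k)
        test = [s for s in sources if s not in train]
        yield train, test
```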
    </sec>
    <sec id="sec-9">
      <title>4.3 Results</title>
      <p>In this section we report the results of our experiments.
In total we evaluate 13 models, and we report run times
required to train the considered models.</p>
      <p>To train MLP and CNN models, we need many training
instances, so we use bagging (presented in Section 3.1) with
parameters numBags=150 and bagSize=100 to increase the
size of the initial training set. We can train the semantic
labeling system DINT with different sampling strategies.
In particular, we report results when we apply no
resampling and bagging with parameters bagSize=100 and
numBags=100. We also experiment with various class imbalance
resampling strategies, including resampling to the mean or
maximum of instance counts per class. For brevity and
without loss of generality we report results only for the
resampling to mean strategy denoted as ResampleToMean. By
design DSL and DSL+ use no resampling.</p>
      <p>As mentioned in Section 3.0.1, the DINT model is built
on a set of elaborately engineered features. The MLP model, on
the other hand, uses only chardist and entropy. To better
compare the performance of MLP and DINT, we create a
new model DINT base and reduce the number of features
to just chardist and entropy. In addition, we create another
model DINT base+ by using chardist and entropy and adding
the minimum edit distance feature. We choose this feature because the
feature importance scores produced by the random forest
algorithm rank edit distance higher than the other features
extracted from names.</p>
      <p>Table 4 reports the MRR scores for the leave one out
strategy. Surprisingly, models built on just normalized character
distributions of attribute values perform very well in many
cases. The deep learning models MLP and CNN are often
comparable with the DINT models; however, they usually come at
a higher computational cost. Run times for training each
model are shown in Table 5.</p>
      <p>As we can see, DINT models that use bagging to sample
more training instances achieve the best results in four
domains. Remarkably, these are also the domains with higher
class imbalance and variety among data sources in terms of
number of rows and number of columns. Data sources in the
city domain have the same number of attributes. We have
also discovered that bagging needs to be performed both at
the training and prediction stages to achieve the best
performance. We have observed that this setting makes a
noticeable difference in domains where the number of rows varies
substantially among data sources. For example, in the
museum domain the number of rows ranges from 6 to 85235, and
in the soccer domain the range is from 500 to 9443.</p>
      <p>
        In terms of computation time, the best performing model
DINT all for the museum domain requires a lot of time
for training. The most computationally expensive features
are the four string similarity metrics: minimum edit distance,
JCN, LIN and k-nearest neighbors. This suggests that the
DINT model with all possible features does not scale well
with the increasing number of attributes in the training set.
Considering similarity metrics used in other approaches like
DSL and T2K [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], computing TF-IDF and Jaccard's scores
may help resolve this runtime issue for DINT all.
      </p>
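These cheaper metrics might look as follows. This is a sketch, not the exact DSL or T2K formulations; in particular the smoothed IDF variant is our assumption.

```python
import math
from collections import Counter

def jaccard(values_a, values_b):
    """Jaccard score between the value sets of two attributes."""
    sa, sb = set(values_a), set(values_b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0

def tfidf_cosine(tokens_a, tokens_b, corpus):
    """Cosine similarity of two token lists under TF-IDF weights
    computed against a corpus of token lists (one per attribute)."""
    n = len(corpus)
    df = Counter(t for doc in corpus for t in set(doc))

    def vec(doc):
        tf = Counter(doc)
        # Smoothed IDF so tokens seen in every attribute are not dropped.
        return {t: tf[t] * math.log((1 + n) / (1 + df[t])) for t in tf}

    va, vb = vec(tokens_a), vec(tokens_b)
    dot = sum(w * vb.get(t, 0.0) for t, w in va.items())
    na = math.sqrt(sum(w * w for w in va.values()))
    nb = math.sqrt(sum(w * w for w in vb.values()))
    return dot / (na * nb) if na and nb else 0.0
```

Both run in time linear in the number of values, unlike pairwise edit distances, which is the scalability argument made above.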
      <p>Regarding class imbalance, although the ResampleToMean
strategy improves over DINT models with no sampling in the
domains with the highest class imbalance (i.e., museum and
soccer), it appears to decrease performance in the domains
with a less prominent imbalance (i.e., weapons and weather).
This suggests that the class resampling strategy needs to be
improved.</p>
      <p>One potential strategy is to combine the bagging and
resampling strategies. Instead of fixing numBags for all
attributes, the parameter could be set to either the mean or the
maximum of the instance counts per class. In this way we can
perform a resampling strategy which does not produce exact
replicas of the attributes.</p>
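This idea could be sketched as follows (our interpretation of the proposal; the helper name and rounding choice are hypothetical):

```python
import math
from collections import Counter

def bags_per_attribute(labels, target="mean"):
    """Sketch of the proposed combination: pick a per-class bag count so
    each semantic label reaches roughly the mean (or maximum) class
    size, resampling minority classes without exact replicas."""
    counts = Counter(labels)
    goal = (sum(counts.values()) / len(counts) if target == "mean"
            else max(counts.values()))
    # Every attribute of class lbl gets ceil(goal / class_count) bags.
    return {lbl: max(1, math.ceil(goal / c)) for lbl, c in counts.items()}
```

Because the bags are sampled with replacement rather than copied, minority classes gain diverse pseudo-attributes instead of duplicates.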
      <p>Apart from the city and weapons domains, our newly
designed models have a similar performance to DSL. However,
the computational complexity of these models varies. For
the museum domain DINT base+ has a higher MRR than
DSL, yet requires only half the training time. It appears
that attributes which contain a mixture of textual and numeric
values are a bottleneck for DSL, since data sources in the
city and weapons domains have multiple such mixed
columns.</p>
      <p>In cases where there are few labeled instances (repeated
holdout strategy in Table 6), we observe that DSL
performs well, especially DSL+, which leverages labeled
instances from other domains. We should be aware that in
this scenario there are many unseen labels, which makes
MRR ill-defined. Comparing the DINT models in this
scenario suggests that bagging is advantageous in situations
when there are few labeled attributes. Overall, enhancing
our DINT model, which uses simple features and bagging,
with the knowledge transfer capability of DSL+ might result
in a more stable semantic labeling system. Another
enhancement may be to introduce resampling strategies into the DSL
system.</p>
      <p>In addition, we perform experiments for the two domains
museum and soccer, where unmapped attributes cause skewed
class distributions. Here we want to establish how well
different approaches can recognize such attributes. In Tables 7
and 8 we can see that the performance of the semantic
labeling systems changes considerably. The performance of both
DSL and DSL+ is affected by their inability to differentiate
"unwanted" attributes.</p>
      <p>
        When performing bagging on attributes in the training
data, we introduce diversity by drawing many samples of
attribute values. However, we do not apply any perturbation
technique to the names of the attributes and instead use
their exact replicas. In Table 8 we observe that DINT base
performs better than DINT base+ when bagging is used. In
datasets with scarce labeled instances our DINT models tend
to overfit to the attribute names that are present in the training
data. This suggests that introducing a technique similar to
bagging for column headers might lead to much better
performance. On the other hand, our results are consistent
with the observations in the work of Ritze et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. Their
results indicate that comparing attribute values is crucial for
this task while attribute names might introduce additional
noise.
      </p>
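One way such header "bagging" could look is sketched below. This is purely illustrative, not part of any evaluated system; the perturbation operations (case flips, separator swaps, single-character drops) and their probabilities are our assumptions.

```python
import random

def perturb_header(name, n=5, seed=0):
    """Hypothetical analogue of bagging for column headers: generate n
    noisy variants of an attribute name so the learner sees more than
    the exact training-time spelling."""
    rng = random.Random(seed)
    variants = []
    for _ in range(n):
        s = name
        if rng.random() < 0.5:                           # vary letter case
            s = s.lower() if rng.random() < 0.5 else s.upper()
        s = s.replace("_", rng.choice(["_", " ", "-"]))  # vary separator
        if len(s) > 3 and rng.random() < 0.3:            # drop a character
            i = rng.randrange(len(s))
            s = s[:i] + s[i + 1:]
        variants.append(s)
    return variants
```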
      <p>Clearly, the performance of our approach DINT varies
depending on the chosen bagging parameters numBags and
bagSize. To explore this dependence, we evaluate the
performance of DINT with only chardist and entropy features
by varying one of the bagging parameters while fixing the
other. We report the results of our evaluation in
Figure 3. Here we do not consider unknown attributes and
choose the repeated holdout strategy to analyze the
behavior of bagging when there is a shortage of training data.
Interestingly, increasing the values of the bagging parameters
does not always lead to an improved performance, though
the computational time required for both the training and
prediction stages increases. The city domain is the most
sensitive to the bagging parameters. We assume this is because
the city domain is the only domain with an equal
distribution of semantic labels and equal numbers of columns and
rows across data sources. It appears that in other domains,
bagging makes models more robust towards variance in these
characteristics.</p>
    </sec>
    <sec id="sec-10">
      <title>RELATED WORK</title>
      <p>
        The problem of semantic labeling, as addressed in this
work, can be regarded as the problem of schema matching
in the field of data integration [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the schema matching
problem we match elements between the source and target
schemata. In our case elements of the source schema are
attributes, and we want to map these attributes to
properties in the ontology. The semantic labeling problem is also
known in the literature as attribute-to-property matching [
        <xref ref-type="bibr" rid="ref17 ref18">18,
17</xref>
        ]. Indicating semantic correspondences manually might be
appropriate if only a few data sources need to be integrated;
however, it becomes tedious as the number of
heterogeneous schemata grows. Hence, automatic or semi-automatic
approaches to schema matching are being actively
developed.
      </p>
      <p>
        From a machine learning perspective, we can categorize these
approaches into unsupervised techniques, which compute
various similarity metrics, and supervised techniques, which build
a multi-class classification model. Unsupervised approaches
are used in SemanticTyper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], T2K [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] and its extended
version [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In all these approaches the authors design
similarity metrics for attribute names and attribute values, yet
one substantial difference is whether additional knowledge
is used in the computation. For example, the authors of [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]
and [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] leverage contextual information from DBpedia.
      </p>
      <p>
        Among supervised approaches, there are probabilistic
graphical models used in the work of Limaye et al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to
annotate web tables with entities for cell values, types for
attributes and relationships for binary combinations of
attributes. Mulwad et al. [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] extend this approach by
leveraging information from Wikitology Knowledge Base (KB).
The problem with probabilistic graphical models, though, is
that they do not scale with the number of semantic labels
in the domain. Also, Mulwad et al. as well as Venetis et
al. [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], who used the isA database KB, extract additional
data from knowledge bases to assign a semantic label to an
attribute. Hence, these approaches are limited to domains
well represented in those knowledge bases. Our approach, on
the other hand, is not domain specific and allows a model to
be trained on any data. However, we cannot apply a model
learnt on one domain to another, which is possible with the
DSL approach [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ].
      </p>
      <p>
        To the best of our knowledge, DSL introduced by Pham
et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] is among the top semantic labeling systems. Pham
et al. compare DSL to their previous approach
SemanticTyper [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] and T2K system [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], and achieve higher MRR
scores on a variety of datasets. Therefore, we use DSL as
the state-of-the-art model in our benchmark to evaluate our
new approaches.
      </p>
      <p>
        Ritze et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] and Pham et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] mention the problem
of the unknown class. In the first work the authors discuss
"unwanted" attributes, while in the second the authors
reflect on how to handle "unseen" attributes. In our work we
do not differentiate between these two cases and show that
we can successfully identify such attributes when sufficient
training data is available.
      </p>
    </sec>
    <sec id="sec-11">
      <title>CONCLUSION</title>
      <p>
        In this paper we have studied the problem of supervised
semantic labeling and have conducted experiments to
evaluate how different approaches perform at this task. Our main
finding is that our bagging sampling technique can provide
meaningful diversity in our training data to improve
performance. Additionally, this technique can overcome the lack
of labeled attributes in the domain and can increase the
number of instances for under-represented semantic labels.
We find that given scarce training data, bagging leads to a
noticeable improvement in performance, though the
state-of-the-art system DSL [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] achieves a better precision by
leveraging information about labeled instances from other
domains. However, if we are to consider unwanted attributes
and unseen semantic labels, our new system DINT
demonstrates the best performance. Among the semantic labeling
systems in our benchmark we have observed that the
performance results are highly dependent on the use case.
      </p>
      <p>We have also shown that deep learning models, such as
CNN and MLP, can also be applied to solve this problem.
Though these models do not excel in performance in the
majority of cases, their advantage is the simplicity of the features
extracted from attributes. For example, CNN is built on
raw sequences of attribute values. Surprisingly, we have
discovered that even random forests constructed just on
character distributions of values and entropy of attributes provide
remarkable results in many cases. This supports the
observations in the literature that attribute values are crucial for
the semantic labeling task [
        <xref ref-type="bibr" rid="ref17 ref18">18, 17</xref>
        ].
      </p>
      <p>
        Future work may involve exploring a combination of
bagging and class imbalance resampling strategies. We have
observed that where the domain data has high imbalance
among representatives of different semantic labels,
resampling can lead to improved performance, but a more
sophisticated approach is required in domains which do not
exhibit these characteristics. Another possible direction for
improvement is to introduce an equivalent of bagging for
attribute names. In addition, our experiments indicate that
the performance of the systems is often affected by the variance
in sizes of data sources and how well each semantic label
is represented in the training data. To this end, we
consider including T2KMatch [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] into our benchmark as well
as domain sets from the RODI benchmark [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>M.</given-names>
            <surname>Abadi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Barham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Davis</surname>
          </string-name>
          , et al.
          <article-title>TensorFlow: A system for large-scale machine learning</article-title>
          .
          <source>In Proc. of OSDI</source>
          , pages
          <volume>265</volume>
          –
          <fpage>283</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Bellahsene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Bonifati</surname>
          </string-name>
          , and E. Rahm, editors.
          <source>Schema Matching and Mapping. Data-Centric Systems and Applications</source>
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Bagging predictors</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>24</volume>
          (
          <issue>2</issue>
          ):
          <volume>123</volume>
          –
          <fpage>140</fpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>L.</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <article-title>Random forests</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>45</volume>
          (
          <issue>1</issue>
          ):5–
          <fpage>32</fpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>N.</given-names>
            <surname>Craswell</surname>
          </string-name>
          .
          <article-title>Mean reciprocal rank</article-title>
          .
          <source>In Encyclopedia of Database Systems</source>
          , pages
          <fpage>1703</fpage>
          –
          <fpage>1703</fpage>
          .
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>A.</given-names>
            <surname>Doan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Y.</given-names>
            <surname>Halevy</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Z. G.</given-names>
            <surname>Ives</surname>
          </string-name>
          .
          <article-title>Principles of Data Integration</article-title>
          . Morgan Kaufmann,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. J.</given-names>
            <surname>Jiang</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Conrath</surname>
          </string-name>
          .
          <article-title>Semantic similarity based on corpus statistics and lexical taxonomy</article-title>
          .
          <source>arXiv preprint cmp-lg/9709008</source>
          ,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          .
          <article-title>Deep learning</article-title>
          .
          <source>Nature</source>
          ,
          <volume>521</volume>
          (
          <issue>7553</issue>
          ):
          <volume>436</volume>
          –
          <fpage>444</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>G.</given-names>
            <surname>Limaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          .
          <article-title>Annotating and searching web tables using entities, types and relationships</article-title>
          .
          <source>Proc. of the VLDB Endowment</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          -2):
          <volume>1338</volume>
          –
          <fpage>1347</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.</given-names>
            <surname>Lin</surname>
          </string-name>
          et al.
          <article-title>An information-theoretic definition of similarity</article-title>
          .
          <source>In Proc. of ICML</source>
          , volume
          <volume>98</volume>
          , pages
          <fpage>296</fpage>
          –
          <fpage>304</fpage>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Manning</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Raghavan</surname>
          </string-name>
          , H. Schütze, et al.
          <article-title>Introduction to information retrieval</article-title>
          , volume
          <volume>1</volume>
          .
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>V.</given-names>
            <surname>Mulwad</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Finin</surname>
          </string-name>
          ,
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Joshi</surname>
          </string-name>
          .
          <article-title>Semantic message passing for generating linked data from tables</article-title>
          .
          <source>In Proc. of ISWC</source>
          , pages
          <volume>363</volume>
          –
          <fpage>378</fpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>S. B.</given-names>
            <surname>Needleman</surname>
          </string-name>
          and
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Wunsch</surname>
          </string-name>
          .
          <article-title>A general method applicable to the search for similarities in the amino acid sequence of two proteins</article-title>
          .
          <source>Journal of molecular biology</source>
          ,
          <volume>48</volume>
          (
          <issue>3</issue>
          ):
          <volume>443</volume>
          –
          <fpage>453</fpage>
          ,
          <year>1970</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>M.</given-names>
            <surname>Pham</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Alse</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          .
          <article-title>Semantic labeling: a domain-independent approach</article-title>
          .
          <source>In Proc. of ISWC</source>
          , pages
          <volume>446</volume>
          –
          <fpage>462</fpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pinkel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Binnig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Jimenez-Ruiz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kharlamov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>May</surname>
          </string-name>
          , et al. RODI:
          <article-title>Benchmarking relational-to-ontology mapping generation quality</article-title>
          .
          <source>Semantic Web</source>
          , (Preprint):
          <volume>1</volume>
          –
          <fpage>28</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>S.</given-names>
            <surname>Ramnandan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Mittal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          .
          <article-title>Assigning semantic labels to data sources</article-title>
          .
          <source>In Proc. of ESWC</source>
          , pages
          <volume>403</volume>
          –
          <fpage>417</fpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Matching web tables to dbpedia - A feature utility study</article-title>
          .
          <source>In Proc. of EDBT</source>
          , pages
          <volume>210</volume>
          –
          <fpage>221</fpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D.</given-names>
            <surname>Ritze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Lehmberg</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          .
          <article-title>Matching HTML tables to DBpedia</article-title>
          .
          <source>In Proc. of WIMS, page 10</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          , and
          <string-name>
            <given-names>R. J.</given-names>
            <surname>Williams</surname>
          </string-name>
          .
          <article-title>Learning internal representations by error propagation</article-title>
          .
          <source>Technical report, DTIC Document</source>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>D.</given-names>
            <surname>Spanos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Stavrou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Mitrou</surname>
          </string-name>
          .
          <article-title>Bringing relational databases into the semantic web: A survey</article-title>
          .
          <source>Semantic Web</source>
          ,
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <volume>169</volume>
          –
          <fpage>209</fpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          .
          <article-title>Leveraging linked data to discover semantic relations within data sources</article-title>
          .
          <source>In Proc. of ISWC</source>
          , pages
          <volume>549</volume>
          –
          <fpage>565</fpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>M.</given-names>
            <surname>Taheriyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. A.</given-names>
            <surname>Knoblock</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P. A.</given-names>
            <surname>Szekely</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Ambite</surname>
          </string-name>
          .
          <article-title>Learning the semantics of structured data sources</article-title>
          .
          <source>J. Web Sem</source>
          .,
          <volume>37</volume>
          :
          <fpage>152</fpage>
          –
          <fpage>169</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>P.</given-names>
            <surname>Venetis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Halevy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Madhavan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pasca</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Shen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Wu</surname>
          </string-name>
          , G. Miao, and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wu</surname>
          </string-name>
          .
          <article-title>Recovering semantics of tables on the web</article-title>
          .
          <source>Proc. of the VLDB Endowment</source>
          ,
          <volume>4</volume>
          (
          <issue>9</issue>
          ):
          <volume>528</volume>
          –
          <fpage>538</fpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>