<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Increasing Quality of Austrian Open Data by Linking them to Linked Data Sources: Lessons Learned</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tomas Knap</string-name>
          <email>knap@ksi.mff.cuni.cz</email>
          <email>t.knap@semantic-web.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Charles University in Prague, Faculty of Mathematics and Physics</institution>
          <addr-line>Malostranské nám. 25, 118 00 Praha 1</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Semantic Web Company</institution>
          <addr-line>Mariahilfer Straße 70/8, A-1070 Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <fpage>52</fpage>
      <lpage>61</lpage>
      <abstract>
        <p>One of the goals of the ADEQUATe project is to improve the quality of the (tabular) open data published at two Austrian open data portals by leveraging these tabular data to Linked Data, i.e., (1) classifying columns using Linked Data vocabularies, (2) linking cell values against Linked Data entities, and (3) discovering relations in the data by searching for evidence of such relations among Linked Data sources. Integrating data at Austrian data portals with existing Linked (Open) Data sources makes it possible, e.g., to increase data completeness and reveal discrepancies in the data. In this paper, we describe lessons learned from using TableMiner+, an algorithm for (semi)automatic leveraging of tabular data to Linked Data. In particular, we evaluate TableMiner+'s ability to (1) classify columns of the tabular data and (2) link (disambiguate) cell values against Linked Data entities in Freebase. The lessons learned described in this paper are relevant not only for the goals of the ADEQUATe project, but also for other data publishers and wranglers who need to increase the quality of open data by (semi)automatically interlinking them to Linked (Open) Data entities.</p>
      </abstract>
      <kwd-group>
        <kwd>Open Data</kwd>
        <kwd>Linked Data</kwd>
        <kwd>Data quality</kwd>
        <kwd>Data linking</kwd>
        <kwd>Data integration</kwd>
        <kwd>Entity disambiguation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The advent of Linked Data [1] accelerates the evolution of the Web into an
exponentially growing information space where the unprecedented volume of
data offers information consumers a level of information integration that has
up to now not been possible. Consumers can now mash up and readily integrate
information for use in a myriad of alternative end uses.</p>
      <p>This work has been supported in part by the Austrian Research Promotion Agency
(FFG) under the project ADEQUATe (grant no. 849982).</p>
      <p>Nowadays, governmental organizations publish their data as open
data (most typically as CSV files). To fully exploit the potential of such data,
the publication process should be improved so that the data are published as Linked
Open Data. By leveraging open data to Linked Data, we increase the usefulness of
the data by providing global identifiers for things, and we enrich the data with
links to external sources.</p>
      <p>To leverage CSV files to Linked Data<sup>3</sup>, it is necessary to (1) classify CSV
columns based on their content and context against existing knowledge bases, (2)
assign RDF terms (HTTP URLs, blank nodes and literals) to the particular
cell values according to Linked Data principles (HTTP URL identifiers may be
reused from one of the existing knowledge bases), (3) discover relations between
columns based on the evidence for the relations in the existing knowledge bases,
and (4) convert CSV data to RDF data, properly using data types, language tags,
well-known Linked Data vocabularies, etc.</p>
      <p>As an illustrative example of leveraging CSV files to Linked Data, suppose
that a published CSV file contains names of movies in the first
column and names of the directors of these movies in the second column. The
leveraging of this CSV file to Linked Data should automatically (1) classify the first
and second columns as containing instances of the classes 'Movie' and 'Director', (2)
convert cell values in the movies' and directors' columns to HTTP URL
resources, e.g., instead of using 'Matrix' as the name of the movie, the URL
http://www.freebase.com/m/02116f may be used, pointing to the Freebase knowledge
base<sup>4</sup> and standing for the 'Matrix' movie with further attributes of that movie and
links to further resources, and (3) discover relations between columns, such as the
relation 'isDirectedBy' between the first and second columns<sup>5</sup>.</p>
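      <p>The movie example above can be made concrete with a minimal sketch that turns such a two-column CSV file into N-Triples statements. It is an illustration under stated assumptions, not part of TableMiner+: the lookup tables stand in for the result of entity linking, the director URL is invented, and the choice of schema.org terms for the classes and for 'isDirectedBy' is ours. Only the Freebase URL for 'Matrix' comes from the text.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io

# Hypothetical lookup tables standing in for the output of entity linking.
# Only the Freebase URL for 'Matrix' comes from the text; the director URL is made up.
MOVIE_URLS = {"Matrix": "http://www.freebase.com/m/02116f"}
DIRECTOR_URLS = {"The Wachowskis": "http://example.org/director/the-wachowskis"}

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
MOVIE_CLASS = "http://schema.org/Movie"      # assumed vocabulary choice
DIRECTOR_CLASS = "http://schema.org/Person"  # assumed vocabulary choice
DIRECTED_BY = "http://schema.org/director"   # plays the role of 'isDirectedBy'


def triple(s, p, o):
    """Format one N-Triples statement whose three terms are all URLs."""
    return "<{}> <{}> <{}> .".format(s, p, o)


def csv_to_triples(csv_text):
    """Turn (movie, director) rows into triples; skip cells we cannot link."""
    triples = []
    reader = csv.reader(io.StringIO(csv_text))
    next(reader)  # skip the header row
    for movie, director in reader:
        movie_url = MOVIE_URLS.get(movie)
        director_url = DIRECTOR_URLS.get(director)
        if movie_url:
            triples.append(triple(movie_url, RDF_TYPE, MOVIE_CLASS))
        if director_url:
            triples.append(triple(director_url, RDF_TYPE, DIRECTOR_CLASS))
        if movie_url and director_url:
            triples.append(triple(movie_url, DIRECTED_BY, director_url))
    return triples


SAMPLE = "movie,director\nMatrix,The Wachowskis\nUnknown Film,Nobody\n"
for line in csv_to_triples(SAMPLE):
    print(line)
```
]]></preformat>
      <p>The second data row yields no triples because neither cell can be linked, which mirrors the situation in which a knowledge base holds no evidence for a cell value.</p>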
      <p>In this paper, we focus on the CSV files available at two Austrian data portals:
http://www.data.gv.at and http://www.opendataportal.at. The first one
is the official national Austrian data portal, with many datasets published by
the Austrian government.</p>
      <p>
        Our goal is not to find a solution that automatically leverages tabular data
to Linked Data, as we are aware that this is really challenging. Instead, our goal
is to help data wranglers convert tabular data to Linked Data by suggesting
to them (1) concepts classifying the columns and (2) entities the cell values may be
disambiguated to. To realize these steps, we evaluate TableMiner+, an algorithm
for (semi)automatic leveraging of tabular data to Linked Data. By successfully
classifying columns and disambiguating cell values, we immediately increase the
quality of the data along the interlinking quality dimension [6].
      </p>
      <p>The main contributions of this paper are lessons learned from evaluating
how well TableMiner+ classifies columns and disambiguates cell values in CSV files
obtained from the national Austrian open data portal. In [7], they also evaluate</p>
      <p><sup>3</sup> By leveraging the data we mean improving the way data is published by
converting it from CSV files to Linked Data, with all the benefits Linked Data provides [1].</p>
      <p><sup>4</sup> http://freebase.com</p>
      <p><sup>5</sup> The classes 'Movie' and 'Director' and the relation 'isDirectedBy' mentioned above
should be reused from some well-known Linked Data vocabulary.</p>
    </sec>
    <sec id="sec-2">
      <p><sup>6</sup> http://dbpedia.org</p>
      <p><sup>7</sup> http://openrefine.org/</p>
      <p>We decided to use TableMiner+ to leverage CSV data from the national Austrian
data portal to Linked Data because it outperforms similar algorithms, such as
TABEL [4] or the algorithm presented in [3], and is available under an open license.</p>
      <sec id="sec-2-1">
        <title>Evaluation</title>
        <p>
          In this section, we describe the evaluation of the TableMiner+ algorithm on
CSV files obtained from the national Austrian data portal http://data.gv.at.
First we provide basic statistics about the data we use in the evaluation, and
then we describe the evaluation metrics and the results obtained during the evaluation of
(1) subject column detection, (2) classification, and (3) disambiguation. We do
not evaluate in this paper the process of discovering binary relations among
columns of the input files.
        </p>
        <p>Since the standard distribution of the TableMiner+ algorithm<sup>8</sup> expects HTML
tables as input, we extended the algorithm so that it also supports CSV
files as input.</p>
        <sec id="sec-2-1-1">
          <title>Data and Basic Statistics</title>
          <p>We evaluated TableMiner+ on 753 of the 1491 CSV files (50.5%)
obtained from the national Austrian data portal http://data.gv.at. The files
processed were randomly selected from the files smaller than 1 MB and
having correct non-empty headers for all columns. We processed at most the first
1000 rows of every such file. The processed files had on average 8.46 columns
and 1.47 named entity columns.</p>
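          <p>The selection criteria above can be sketched as a small filter. This is a minimal sketch, not the code used in the evaluation: the byte limit of 1,000,000, the UTF-8 decoding, and the comma delimiter are our assumptions, and "correct" headers are approximated by checking only that no header cell is empty.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import csv
import io

MAX_BYTES = 1000000  # assumed reading of "less than 1 MB"
MAX_ROWS = 1000      # at most the first 1000 data rows are processed


def select_rows(raw, delimiter=","):
    """Return (header, rows) for an eligible CSV file, or None otherwise.

    raw is the file content as bytes. A file is eligible if it is smaller
    than MAX_BYTES and every header cell is non-empty; at most the first
    MAX_ROWS data rows are kept.
    """
    if len(raw) >= MAX_BYTES:
        return None
    reader = csv.reader(io.StringIO(raw.decode("utf-8")), delimiter=delimiter)
    header = next(reader, None)
    if not header or any(not cell.strip() for cell in header):
        return None
    rows = []
    for row in reader:
        if len(rows) == MAX_ROWS:
            break
        rows.append(row)
    return header, rows


print(select_rows(b"city,population\nGraz,1\n"))
print(select_rows(b"city,\nGraz,1\n"))
```
]]></preformat>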
        </sec>
        <sec id="sec-2-1-2">
          <title>Subject Column Detection</title>
          <p>From all the processed files, we selected those for which the TableMiner+ algorithm
identified more than one named entity column, and for those we evaluated the
precision of the subject column detection by comparing the subject column selected
by the TableMiner+ algorithm for the given file with the subject column manually
annotated as correct by a human annotator<sup>9</sup>.</p>
          <p><bold>Results.</bold> In 97.15% of cases, the subject column was properly identified by the
TableMiner+ algorithm. There were a couple of issues, e.g., the column
with companies, rather than the column with projects, was selected as the subject column in a CSV
file containing a list of projects. For statistical data containing several
dimensions and measures, any dimension (except the time dimension) was
accepted as a correctly identified subject column.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <p><sup>8</sup> https://github.com/ziqizhang/sti</p>
      <p><sup>9</sup> When talking about a human annotator here and further in the text, we always mean
a person who has at least a university master's degree and at least basic knowledge of
the German language (to understand the data within the Austrian portals).</p>
      <sec id="sec-3-1">
        <title>Classification</title>
        <p>In the TableMiner+ algorithm, candidate concepts classifying a certain column are
computed in several phases. First, a sample of cells of the processed column
is selected and disambiguated, and the concepts of the disambiguated entities vote
for the initial winning concept classifying the column. Next, all cells within
that column are disambiguated, taking into account restrictions given by the
initial winning concept, and afterwards all disambiguated cells vote once again for
the concept classifying the column. If the winning concept classifying the column
changes, disambiguation and voting are iterated. Lastly, candidate concepts for
the given column are reexamined in the context of other columns and their
candidate concepts, which may once again change the winning concept
suggested by the TableMiner+ algorithm for the column. At the end, the TableMiner+
algorithm reports the winning concept for every named entity column and also
further candidate concepts, together with their scores (the winning concept has the
highest score).</p>
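        <p>The voting step described above can be illustrated with a deliberately simplified sketch. It is not the scoring function of TableMiner+, which also weighs context and iterates; here every disambiguated cell simply casts one vote for each concept of its entity. The linking result in the example is hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter


def vote_for_concepts(cell_concepts):
    """Count one vote per cell for every concept of its disambiguated entity.

    cell_concepts maps a cell value to the set of concepts of the entity it
    was linked to; cells that could not be linked are simply absent.
    Returns (concept, votes) pairs sorted by descending votes, then by name.
    """
    votes = Counter()
    for concepts in cell_concepts.values():
        votes.update(concepts)
    return sorted(votes.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical linking result for a column of Austrian districts: two cells
# were linked to city entities, two to district entities.
column = {
    "Leibnitz": {"location/location", "location/citytown"},
    "Leoben": {"location/location", "location/citytown"},
    "Murtal": {"location/location", "location/administrative_division"},
    "Voitsberg": {"location/location", "location/administrative_division"},
}
ranking = vote_for_concepts(column)
winner = ranking[0][0]
print(winner)
```
]]></preformat>
        <p>In this toy column the generic concept location/location collects all four votes, while the two more specific concepts split the votes between them, so the generic concept wins.</p>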
        <p>To evaluate the precision of such classification, for each processed file and named
entity column, we recorded the candidate concepts for the classification
together with the scores computed by the TableMiner+ algorithm, sorted by
descending score. Then we selected the candidate concepts having the 5 highest scores
(since several candidate concepts may have the same score, this may include more
than 5 candidate concepts). Afterwards, we selected a random sample of these
selected candidate concepts (containing 100 columns) and let annotators
annotate, for each file and column, the classifications suggested by TableMiner+:
annotators marked each suggested column classification with one of the labels best, good
or wrong. Label best means that the candidate concept is the best
concept which may be used in the given situation; it must properly describe the
semantics of the classified column and it must be as specific
as possible, as the goal is to prefer the most specific concept among all
suitable concepts. For example, instead of the concept location/location, the
concept location/citytown is the preferred concept for a column containing a list of
Austrian cities. Label good means that the candidate concept is appropriate (it
properly describes the semantics of the cell values in the column), but it is not
necessarily the most suitable concept. Label wrong means that the candidate
concept is inappropriate, i.e., it has a different semantics.</p>
        <p>Let #Cols denote the number of columns annotated by the annotators.
Further, let us define the function topN(c), which is equal to 1 if the candidate
concept c annotated as best for a certain column was also identified by TableMiner+
as a concept having the N-th highest score at worst, and 0 otherwise, for N ∈ {1, 2, 3, 4, 5}. If N = 1 and
top1(c) = 1 for a certain concept c, the winning concept suggested
by TableMiner+ is the same as the concept annotated as best by the annotators.
Further, let us define the metric bestN, which computes the percentage of columns
for which the candidate concept c annotated as best was also
identified by TableMiner+ as a concept having the N-th highest score at worst, i.e., the sum of topN(c)
over all concepts c annotated as best, divided
by the total number of annotated named entity columns:</p>
        <disp-formula>
          <tex-math>\mathit{best}_N = 100 \cdot \frac{\sum_{c} \mathit{top}_N(c)}{\#\mathit{Cols}}</tex-math>
        </disp-formula>
        <p>So, for example, best1 denotes the percentage of cases (columns) for which the
concept annotated as best is also the winning concept suggested by TableMiner+.</p>
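        <p>The bestN metric can be computed as in the following minimal sketch. The data layout (pairs of the concept annotated as best and the scored candidates of a column) and the sample values are our assumptions. Candidates sharing a score share a rank, so the top N may contain more than N concepts, in line with the selection of candidates described earlier.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def top_n(best_concept, scored_candidates, n):
    """Return 1 if best_concept has the n-th highest score at worst, else 0.

    Candidates sharing a score share a rank, so the top n may contain more
    than n concepts.
    """
    distinct_scores = sorted({score for _, score in scored_candidates}, reverse=True)
    allowed = set(distinct_scores[:n])
    return int(any(c == best_concept and s in allowed for c, s in scored_candidates))


def best_n(columns, n):
    """Percentage of annotated columns whose best concept is within the top n."""
    hits = sum(top_n(best, candidates, n) for best, candidates in columns)
    return 100.0 * hits / len(columns)


# Hypothetical annotations: (concept annotated as best, scored candidates).
columns = [
    ("location/citytown", [("location/location", 2.0), ("location/citytown", 1.5)]),
    ("location/citytown", [("location/citytown", 3.0), ("location/location", 3.0)]),
    ("people/person", [("music/artist", 1.0), ("music/recording", 0.5)]),
    ("organization/organization", [("organization/organization", 2.5)]),
]
print(best_n(columns, 1))
print(best_n(columns, 2))
```
]]></preformat>
        <p>For these four hypothetical columns, the best concept is the winning concept in two columns and is within the two highest scores in three columns.</p>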
        <p>The formula above does not penalize situations in which several candidate concepts
share the same score. Our goal is not to automatically produce Linked Data
or column classifications from the result of TableMiner+; instead, we expect that
the user is presented with a couple of candidate concepts (s)he verifies or selects from.
It is therefore not important whether (s)he is presented with 5 or 8 concepts, but it is
important to evaluate how often the concept annotated as best is among the
highest scored concepts.</p>
        <p><bold>Results.</bold> Table 1 lists the winning concepts (Freebase topics) that the TableMiner+
algorithm, run on all 753 files from the portal, suggested
for at least 20 columns, together with the number of columns for which each concept was
suggested as the winning concept.</p>
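        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>Winning concepts suggested by TableMiner+ for at least 20 columns.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Freebase Concept</th><th>Number of Columns</th></tr>
            </thead>
            <tbody>
              <tr><td>location/location</td><td>478</td></tr>
              <tr><td>music/recording</td><td>166</td></tr>
              <tr><td>music/single</td><td>51</td></tr>
              <tr><td>organization/organization</td><td>48</td></tr>
              <tr><td>people/person</td><td>45</td></tr>
              <tr><td>music/artist</td><td>35</td></tr>
              <tr><td>location/statistical_region</td><td>34</td></tr>
              <tr><td>location/dated_location</td><td>26</td></tr>
              <tr><td>base/aareas/schema/administrative_area</td><td>25</td></tr>
              <tr><td>fictional_universe/fictional_character</td><td>25</td></tr>
              <tr><td>film/film_character</td><td>25</td></tr>
              <tr><td>business/employer</td><td>22</td></tr>
              <tr><td>location/citytown</td><td>22</td></tr>
              <tr><td>music/release_track</td><td>22</td></tr>
            </tbody>
          </table>
        </table-wrap>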
        <p>As we can see, the majority of the columns were classified with the Freebase
concept location/location. Although this is correct in most cases, typically there
is a better (more specific) concept available, such as location/citytown. There
are also concepts, such as music/recording or film/film_character, which are
in most cases the result of wrong classification due to low evidence for the correct
concepts during disambiguation of the sample cells.</p>
        <p>Selected results of the bestN measure are presented in Table 2. As we can see,
20% of the concepts annotated as best were properly suggested by the TableMiner+
algorithm as the winning concepts; 36% of the concepts annotated as best for certain
columns were among the concepts suggested by TableMiner+ with the highest
or second highest score, etc. In other words, there is a 76% probability that the
concept annotated as best will appear among the candidate concepts suggested
by TableMiner+ with the 5th highest score at worst.</p>
        <p>Furthermore, in 68% of the analyzed columns, only concepts annotated as
best and good appear among concepts suggested by TableMiner+ and having
3rd highest score at worst.</p>
        <p>
          In 24% of the analyzed columns, all concept candidates suggested by
TableMiner+ were wrong. The reasons for completely wrong
classifications are typically two-fold: (1) low disambiguation recall due to low
evidence for the cell values within the Freebase knowledge base or (2) wrong
disambiguation due to short named entities having various unintended meanings.
        </p>
        <p>We did not evaluate the recall of the concept classification, as there was always
a suggested concept classifying the column, although the precision could have
been low.</p>
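        <table-wrap id="tab-2">
          <label>Table 2</label>
          <caption>
            <p>Selected results of the bestN measure.</p>
          </caption>
          <table>
            <thead>
              <tr><th>N</th><th>bestN (in percentage)</th></tr>
            </thead>
            <tbody>
              <tr><td>1</td><td>20</td></tr>
              <tr><td>2</td><td>36</td></tr>
              <tr><td>3</td><td>64</td></tr>
              <tr><td>4</td><td>74</td></tr>
              <tr><td>5</td><td>76</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>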
For selected concepts from Table 1, we computed the precision and recall of the
entity disambiguation. Precision is calculated as the number of distinct entities (cell
values) correctly linked to Freebase entities divided by the number of all
distinct entities (cell values) linked to Freebase (restricted to the given concept).
Recall is computed as the number of distinct entities linked to Freebase
divided by the number of all distinct entities (restricted to the given concept). To
know which entities were correctly linked to Freebase, we again asked annotators
to annotate, for the columns classified with the selected concepts, each distinct
winning disambiguation of a cell value to a Freebase entity: annotators
marked the winning disambiguated entity either as correct or as wrong.
The disambiguation is correct if the disambiguated entity correctly represents
the semantics of the cell value. Otherwise, it is marked as wrong.</p>
        <p><bold>Results.</bold> In the case of the location/citytown concept, we analyzed the disambiguation of
cities in 16 files where the concept location/citytown was suggested as the
winning concept by the TableMiner+ algorithm. The precision of the disambiguation
was 95.2% and the recall 88.1%. We also analyzed another 24 files where there was a
column containing cities and one of the concepts (but not the winning concept)
suggested by TableMiner+ for that column was location/citytown
with a score above 1.0. In this case, the precision was 99% and the recall
99.8%, taking into account more than 500 distinct disambiguated entities. It is
also worth mentioning that the TableMiner+ algorithm properly disambiguates and
classifies cell values based on the context of the cell; thus, in the case of a column
with cities, the cell value Grambach is properly classified as the city and not
the river.</p>
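        <p>The precision and recall of the disambiguation, as defined above, can be computed as in this minimal sketch. Note that recall here follows the definition in the text (linked entities over all distinct entities) rather than the more common definition based on correctly linked entities. The annotation format and the sample verdicts are hypothetical.</p>
        <preformat preformat-type="code"><![CDATA[
```python
def disambiguation_quality(annotations):
    """Compute precision and recall (in percent) as defined in the text.

    annotations maps each distinct cell value to 'correct' or 'wrong' (the
    annotator's verdict on the winning link), or to None if it was not linked.
    """
    linked = [v for v in annotations.values() if v is not None]
    correct = [v for v in linked if v == "correct"]
    precision = 100.0 * len(correct) / len(linked) if linked else 0.0
    recall = 100.0 * len(linked) / len(annotations) if annotations else 0.0
    return precision, recall


# Hypothetical annotator verdicts for five distinct cell values.
sample = {
    "Graz": "correct",
    "Linz": "correct",
    "Grambach": "correct",
    "Leoben": "wrong",
    "Xyz": None,
}
precision, recall = disambiguation_quality(sample)
print(precision, recall)
```
]]></preformat>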
        <p>We analyzed 23 files where there was a column containing districts of Austria
classified with the winning concept location/location. The precision was 38.3%
and the recall 100%. The precision is lower because in this case more than half of the
districts (e.g., Leibnitz, Leoben) were classified as cities. The reason why these
columns were classified with the rather generic concept location/location and not
with the more appropriate location/administrative_division is that some values
within that column were disambiguated to cities and voted for location/citytown,
while others were disambiguated correctly to districts and voted for the best concept
location/administrative_division. Since both these types of entities also
belong to the concept location/location, this concept was chosen as the winning
one.</p>
        <p>The concept base/aareas/schema/administrative_area has a high precision of 88%
and 100% recall, but only 17 distinct districts of Linz were processed.</p>
        <p>The concept organization/organization has reasonable precision for columns
holding schools: it links faculties to the proper universities with precision 75%
and recall 81%. For other types of organizations, such as pharmacies, hospitals,
etc., disambiguation does not work properly, because there are no corresponding
entities to be linked in Freebase.</p>
        <p>Disambiguation of the people/person concept has very low precision. The reason
is that the vast majority of people are not in the knowledge base. The
precision of the concept business/employer is also very low.</p>
        <sec id="sec-3-1-1">
          <title>Lessons Learned</title>
          <p>There is a high correlation between the precision of the disambiguation and
that of the classification, which is caused by the fact that the initial candidate concepts for the
classification of a column are based on the votes of the disambiguated entities
for the selected sample set of cells.</p>
          <p>If the recall of the disambiguation is low (few entities are
disambiguated), it does not make sense to classify the column, as the result will in most
cases be misleading. In these cases, it is better to report that there is not enough
evidence for the classification rather than trying to classify the column
anyway, because the latter ends up suggesting completely irrelevant concepts, which
confuses users.</p>
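          <p>One way to act on this lesson is an abstention rule such as the one sketched below. It is an illustration only, not a feature of TableMiner+; the function name and the threshold of 0.5 are our assumptions, and a suitable threshold would have to be determined empirically.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def classify_or_abstain(ranking, linked_cells, total_cells, min_recall=0.5):
    """Return the top-ranked concept, or None when too few cells were linked.

    ranking is a list of (concept, score) pairs sorted by descending score.
    min_recall is an assumed threshold; the text does not prescribe a value.
    """
    if total_cells == 0 or not ranking:
        return None
    if linked_cells / total_cells >= min_recall:
        return ranking[0][0]
    return None


# Only 3 of 40 cells were linked: report "not enough evidence" (None).
print(classify_or_abstain([("music/recording", 1.2)], 3, 40))
# 38 of 40 cells were linked: the winning concept is reported.
print(classify_or_abstain([("location/citytown", 4.1)], 38, 40))
```
]]></preformat>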
          <p>The row context used by the TableMiner+ algorithm proved its usefulness in many
situations. For example, it made it possible to properly disambiguate commonly named
cities having more than one matching entity in Freebase, i.e., the cities were
properly disambiguated w.r.t. the countries to which they belong.</p>
          <p>If the cell values to be disambiguated are too short (e.g., abbreviations)
and the precision of the subject column disambiguation, which defines the context for
these abbreviations, is low, it does not make sense to disambiguate these short
cell values, as the precision of such disambiguation will be low.</p>
          <p>Classification/disambiguation in TableMiner+ has higher precision when the
processed tabular data have a subject column which is further described by other
columns, so that classification/disambiguation may use a reasonable row context.
In the case of statistical data, which merely involve measurements and dimension
identifiers, the row context is not that beneficial and the precision of the
classification/disambiguation is lower.</p>
          <p>In many cases, a generic knowledge base such as Freebase is not sufficient,
as it does not include all the needed information, e.g., it does not include information
about all schools, hospitals, playgrounds, etc., in the country's states, regions and cities.
So apart from generic knowledge bases, such as Freebase, focused
knowledge bases should also be used. Nevertheless, such focused knowledge bases must be
available or must be constructed upfront.</p>
          <p>The TableMiner+ algorithm should use knowledge bases defining a hierarchy of
concepts, as in many cases more generic concepts
were selected as the winning concepts. Using a hierarchy of concepts would
improve the performance and increase the precision of the classification/disambiguation
algorithm.</p>
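          <p>How a concept hierarchy could help is sketched below: among the candidate concepts, every concept that is an ancestor of another candidate is dropped, so that the most specific concepts remain. This is a minimal sketch and not part of TableMiner+; the hierarchy fragment is hypothetical and only reflects the observation that cities and administrative divisions also belong to location/location.</p>
          <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical fragment of a concept hierarchy: child -> parent.
PARENT = {
    "location/citytown": "location/location",
    "location/administrative_division": "location/location",
}


def ancestors(concept):
    """Return all concepts above `concept` in the hierarchy."""
    found = []
    while concept in PARENT:
        concept = PARENT[concept]
        found.append(concept)
    return found


def most_specific(candidates):
    """Drop every candidate that is an ancestor of another candidate."""
    generic = {a for c in candidates for a in ancestors(c)}
    return [c for c in candidates if c not in generic]


print(most_specific(["location/location", "location/citytown"]))
```
]]></preformat>
          <p>With such a filter, a column of cities for which both location/location and location/citytown are candidates would be reported with the more specific concept only.</p>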
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Contributions to Data Quality</title>
        <p>Paper [6] provides a survey of Linked Data quality assessment dimensions and
metrics. In this section, we discuss how successful classification and
disambiguation conducted by TableMiner+ contribute to a higher quality of the
resulting Linked Data along the quality dimensions introduced in [6].</p>
        <p>Successful classification and disambiguation increase the number of links to
external (linked) data sources and thus increase the quality of the data along the
interlinking dimension [6]. By having links to external (linked) data sources, it
is then possible to improve the quality of the data along the following quality
assessment dimensions<sup>10</sup>:</p>
        <list list-type="bullet">
          <list-item><p>Completeness: It is possible to increase the completeness of the data by
introducing more facts about the entities from other (linked) data sources.</p></list-item>
          <list-item><p>Semantic accuracy: It is possible to reveal discrepancies in the data by
comparing the resulting data with the data in external (linked) data
sources.</p></list-item>
          <list-item><p>Trustworthiness: It is possible to increase the trustworthiness of the data by
providing further evidence for the data from external (linked) data sources.</p></list-item>
          <list-item><p>Interoperability: By reusing existing identifiers from external (linked) data sources,
it is possible to increase the interoperability of the data set.</p></list-item>
        </list>
        <p><sup>10</sup> The names of the dimensions are taken from [6], where further description of the
dimensions may be found.</p>
        <sec id="sec-3-2-1">
          <title>Conclusions and Next Steps</title>
          <p>We evaluated TableMiner+ algorithm on top of the Austrian open data obtained
from the Austrian national open data portal http://www.data.gv.at.</p>
          <p>We showed that in 76% of cases the concept annotated by humans as being
the best in the given situation appears within the candidate concepts suggested
by TableMiner+ with 5th highest score at worst. This is a promising result,
as our main purpose is to provide to the data wranglers not only the winning
concepts, but also certain number of alternative concepts.</p>
          <p>
            Classification and disambiguation had very high precision for the concept of cities
(95%+) and reasonable precision for certain other concepts, such as districts,
states, and organizations. Nevertheless, for certain columns and cell values, the precision
of the classification/disambiguation was rather low, which was caused either
(1) by missing evidence for the disambiguated cell values in the Freebase knowledge
base or (2) by trying to disambiguate cell values which have various alternative
meanings. We showed that in 24% of cases the analyzed columns had an irrelevant
classification, which is rather confusing for users; in these cases it would be
better not to produce any classification at all.
          </p>
          <p>
            Although the first results are promising, we plan (1) to experiment further
with different knowledge bases, such as WikiData, and (2) to improve the
TableMiner+ algorithm so that it behaves, e.g., more conservatively in cases of
low evidence for the classification/disambiguation.
          </p>
        </sec>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>C.</given-names>
            <surname>Bizer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Heath</surname>
          </string-name>
          , and
          <string-name>
            <given-names>T.</given-names>
            <surname>Berners-Lee</surname>
          </string-name>
          .
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):
          <fpage>1</fpage>
          –
          <lpage>22</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>I.</given-names>
            <surname>Ermilov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Stadler</surname>
          </string-name>
          .
          <article-title>Csv2rdf: User-driven csv to rdf mass conversion framework</article-title>
          .
          <source>Proceedings of the ISEM '13, September 04 - 06</source>
          <year>2013</year>
          , Graz, Austria,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>G.</given-names>
            <surname>Limaye</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sarawagi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Chakrabarti</surname>
          </string-name>
          .
          <article-title>Annotating and searching web tables using entities, types and relationships</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):
          <fpage>1338</fpage>
          –
          <lpage>1347</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>V.</given-names>
            <surname>Mulwad</surname>
          </string-name>
          .
          <article-title>TABEL - A Domain Independent and Extensible Framework for Inferring the Semantics of Tables</article-title>
          .
          <source>PhD thesis</source>
          , University of Maryland, Baltimore County,
          January
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>F. M.</given-names>
            <surname>Suchanek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Kasneci</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G.</given-names>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Yago: a core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th international conference on World Wide Web, WWW '07</source>
          , pages
          <fpage>697</fpage>
          –
          <lpage>706</lpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Zaveri</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rula</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Maurino</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Pietrobon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lehmann</surname>
          </string-name>
          , and
          <string-name>
            <given-names>S.</given-names>
            <surname>Auer</surname>
          </string-name>
          .
          <article-title>Quality assessment for linked data: A survey</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>Effective and efficient semantic table interpretation using TableMiner+</article-title>
          .
          <source>Semantic Web Journal</source>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>