<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Improving Open Data Usability through Semantics</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sebastian Neumaier</string-name>
          <email>sebastian.neumaier@wu.ac.at</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vienna University of Economics and Business</institution>
          ,
          <addr-line>Vienna</addr-line>
          ,
          <country country="AT">Austria</country>
        </aff>
      </contrib-group>
      <fpage>2</fpage>
      <lpage>9</lpage>
      <abstract>
        <p>With the success of Open Data, a huge amount of tabular data has become available that could potentially be mapped and linked into the Web of (Linked) Data. The use of Semantic Web technologies would then enable the exploration of related content and enhanced search functionality across data portals. However, existing linkage and labeling approaches mainly rely on mappings of textual information to classes or properties in knowledge bases. In this work we outline methods that recover the semantics of tabular Open Data and identify related content; these methods allow the mapping and automated integration/categorization of Open Data resources and improve the overall usability and quality of Open Data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        The Open Data movement has become a driver for publicly available data on the Web.
More and more data – from governments and public institutions, but also from the private
sector – is made available online, mainly through so-called Open Data portals.
However, with the increasing number of resources, there are a number of concerns
with regard to the quality of the data sources and the corresponding metadata, which
compromise the searchability, discoverability and usability of resources [
        <xref ref-type="bibr" rid="ref13 ref16 ref6">6, 13, 16</xref>
        ].
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] Berners-Lee defines the quality of Linked Open Data by a 5-star rating: (1)
data is available on the Web with an open license, (2) it is available as machine-readable
structured data (e.g., Excel instead of an image scan of a table), (3) it is in a non-proprietary
format (e.g., CSV), (4) it uses URIs to denote things, and (5) it is linked to other data in
order to provide context. Yet, most of the data published on Open Data portals cannot be
considered Linked (Open) Data. In fact, the findings in [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] show that current Open
Data sources mainly publish 1- to 3-star data (i.e., openly licensed data, available in
machine-readable and non-proprietary formats): of the resources in 82 monitored
portals, 27% are CSV files (i.e., 3-star data), 12% are Excel tables (2-star), and 10%
are PDF documents (1-star).<sup>1</sup>
      </p>
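      <p>Since the rating is cumulative (each star presupposes the previous ones), it can be read as a simple decision procedure. The Python sketch below is only an illustration under our own assumptions: the function <monospace>star_rating</monospace>, its flags, and the two format sets are hypothetical and cover just the formats mentioned in the text plus a few common ones. The URI and linkage criteria cannot be inferred from a file extension, so the caller passes them in.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Illustrative, cumulative 5-star check. The format sets are assumptions that
# only cover formats mentioned in the text (plus a few common ones).
MACHINE_READABLE = {"csv", "xls", "xlsx", "json", "xml", "rdf"}
NON_PROPRIETARY = {"csv", "json", "xml", "rdf"}


def star_rating(fmt: str, open_license: bool,
                uses_uris: bool = False, linked: bool = False) -> int:
    """Return 0-5 stars; a star is only awarded if all previous ones are."""
    if not open_license:
        return 0                      # (1) requires an open license
    fmt = fmt.lower()
    if fmt not in MACHINE_READABLE:
        return 1                      # e.g., a PDF or an image scan
    if fmt not in NON_PROPRIETARY:
        return 2                      # e.g., an Excel table
    if not uses_uris:
        return 3                      # e.g., a plain CSV file
    return 5 if linked else 4         # (4) URIs, (5) links to other data


for fmt in ["PDF", "XLS", "CSV"]:
    print(fmt, star_rating(fmt, open_license=True))
# PDF 1
# XLS 2
# CSV 3
```
]]></preformat>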
      <p>The fact that a considerable amount of Open Data is available in partially
structured, tabular form motivates a further investigation of the potential of Semantic
Web technologies to integrate and interlink this data, in order to improve its overall
quality and usability and to bring structure to these resources.</p>
      <p>Supervised by Axel Polleres and Jürgen Umbrich. Further thanks to Josiane Xavier Parreira
for the recent successful cooperation.</p>
      <p>1 Note that these numbers are based on the metadata descriptions of the datasets and therefore
do not include diverging spellings or missing format descriptions (16%).</p>
    </sec>
    <sec id="sec-2">
      <title>Problem Statement</title>
      <p>
        The main idea of the proposed work is to use existing Semantic Web
technologies to (partially) upgrade existing 3-star Open Data found on Open Data portals
to 5-star data (according to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]). In this thesis we focus primarily on tabular data, which
is currently the predominant format in Open Data portals.
      </p>
      <p>Achieving this overall objective involves the following sub-problems:</p>
      <p>(i) Recovering the semantics of structured/tabular data sources:
In contrast to highly structured formats like RDF, tabular data lack a defined
vocabulary, explicit schemata, and semantic labels. In particular, Open Data tables
are frequently created manually, and their inherent structure can be hard to detect
and understand (e.g., multiple header rows, numerical content, additional
comment lines). We will investigate and propose mapping/labeling techniques tailored
to the Open Data domain which allow us to semantically describe these tables.</p>
      <p>(ii) Cross-data portal classification and categorization of data sets:
At the moment, most (governmental) Open Data portals define and use their
own taxonomies for their data; there is no commonly shared categorization or
vocabulary [<xref ref-type="bibr" rid="ref16">16</xref>].<sup>2</sup> Addressing problem (i) will support the development of a
semantic classification schema for Open Data, in order to enable cross-data portal
search and integration.</p>
      <p>(iii) Identifying related and relevant content:
Current Open Data portals do not provide recommendations of related content
(which could serve as candidates for automatically integrating and linking resources).
We will review existing relatedness measures for tabular data sources and, if
possible, adopt or extend these methods. This will be supported by the outcome of
the previously mentioned description and categorization approaches.</p>
      <p>2.1 Limited applicability of existing methods</p>
      <p>Existing research in the area of label annotation, exploration of relatedness and linkage
of entities is not fully applicable to typical tables found on Open Data portals:</p>
      <p>
        Relatedness of Tables. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] Das Sarma et al. consider and define the two most
common types of related tables: Entity Complement (i.e., a different selection of entities over the same
set of attributes) and Schema Complement (i.e., the same set of entities described by a different, yet
semantically related, set of attributes).
      </p>
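      <p>The two notions can be illustrated with a deliberately simplified Python sketch. It is not the algorithm of [<xref ref-type="bibr" rid="ref4">4</xref>]: we assume that the first column of each table holds the entity key, approximate “same attributes” and “same entities” by the Jaccard overlap of normalized headers and of key values, and use a hypothetical threshold of 0.5. All function names and the toy tables are ours.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from typing import List, Set

Table = List[List[str]]  # first row: header; first column: entity key (assumption)


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; defined as 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def complement_type(t1: Table, t2: Table, threshold: float = 0.5) -> str:
    """Classify a table pair as entity complement, schema complement, or neither."""
    schema_sim = jaccard({h.strip().lower() for h in t1[0]},
                         {h.strip().lower() for h in t2[0]})
    entity_sim = jaccard({row[0] for row in t1[1:]},
                         {row[0] for row in t2[1:]})
    if schema_sim >= threshold and entity_sim < threshold:
        return "entity complement"   # same attributes, other entities
    if entity_sim >= threshold and schema_sim < threshold:
        return "schema complement"   # same entities, other attributes
    return "neither"


# Toy tables with placeholder values.
players_1 = [["Name", "Nationality"], ["Player A", "AT"], ["Player B", "DE"]]
players_2 = [["Name", "Nationality"], ["Player C", "FR"], ["Player D", "IT"]]
rankings = [["Name", "Ranking"], ["Player A", "1"], ["Player B", "2"]]

print(complement_type(players_1, players_2))  # entity complement
print(complement_type(players_1, rankings))   # schema complement
```
]]></preformat>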
      <p>
        The authors tackle the problem of finding complementary entities by using
named-entity recognition techniques [
        <xref ref-type="bibr" rid="ref18 ref3">18, 3</xref>
        ]. For instance, given Table 1 (from [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), the
column “Name &amp; Nationality” is assigned the label Tennis Player, based on the
shared classes of the named entities in the column. This information is then used to
find columns with (different) entities of the same classes.
      </p>
      <p>
        2 With DCAT-AP (https://joinup.ec.europa.eu/asset/dcat_application_profile) there
is an existing vocabulary and application profile for the use of metadata keys in Open Data
portals; however, the use of tags and categories remains very heterogeneous.
      </p>
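      <p>The labeling step of [<xref ref-type="bibr" rid="ref4">4</xref>] can be approximated by a vote over the knowledge-base classes of the entities recognized in a column. In the Python sketch below, the lookup table <monospace>KB_CLASSES</monospace> is a hypothetical stand-in for a knowledge base such as Freebase [<xref ref-type="bibr" rid="ref3">3</xref>], the class depths encode an assumed class hierarchy, and <monospace>min_support</monospace> is an illustrative parameter; the sketch picks the most specific class shared by enough entities.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter
from typing import Dict, List, Optional, Set

# Hypothetical knowledge-base fragment: entity -> classes, class -> depth.
KB_CLASSES: Dict[str, Set[str]] = {
    "Player A": {"Person", "Athlete", "TennisPlayer"},
    "Player B": {"Person", "Athlete", "TennisPlayer"},
    "Player C": {"Person", "Athlete", "TennisPlayer", "Coach"},
}
CLASS_DEPTH = {"Person": 1, "Athlete": 2, "Coach": 2, "TennisPlayer": 3}


def label_column(cells: List[str], min_support: float = 0.8) -> Optional[str]:
    """Most specific class shared by at least min_support of the recognized cells."""
    recognized = [KB_CLASSES[c] for c in cells if c in KB_CLASSES]
    if not recognized:
        return None  # no named entities found: the textual approach fails
    counts = Counter(cls for classes in recognized for cls in classes)
    shared = [cls for cls, n in counts.items()
              if n / len(recognized) >= min_support]
    if not shared:
        return None
    return max(shared, key=lambda cls: CLASS_DEPTH.get(cls, 0))


print(label_column(["Player A", "Player B", "Player C", "unknown"]))  # TennisPlayer
print(label_column(["1010", "1020", "1030"]))                         # None
```
]]></preformat>
      <p>The second call corresponds to the situation discussed next: a column of numeric codes contains no recognizable entities, so no label can be derived from them.</p>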
      <p>
        Most of the existing work on relatedness of tables uses Web/HTML tables as its
corpus (e.g., tables found on Wikipedia) [
        <xref ref-type="bibr" rid="ref20 ref4">4, 20</xref>
        ]. However, in typical Open Data
portals (e.g., data.gov, data.gov.uk, ...) many data sources exist where such textual
descriptions (such as column headers or cell labels) are missing or cannot be mapped
straightforwardly to known concepts or properties using linguistic approaches,
particularly when tables contain many numerical columns, for which we cannot establish a
semantic mapping in this manner.
      </p>
      <p>
        Indeed, a major part of the datasets published in Open Data portals consists of tabular
data containing many numerical columns with missing or non-human-readable headers
(organizational identifiers, sensor codes, internal abbreviations for attributes like
“population count”, or geo-coding systems for areas instead of their names, e.g., for districts,
etc.) [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. We verified this observation by inspecting 1200 tables collected from the
European Open Data portal and the Austrian Government Open Data Portal and attempting
to map the header values using the BabelNet service (http://babelnet.org): on
average, half of the columns in CSV files served on these portals contain numerical
values, and for only around 20% of these columns could the header labels be mapped with the
BabelNet service to known terms and concepts.<sup>3</sup>
      </p>
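      <p>The header pre-processing described in footnote 3 can be sketched in Python as follows. The function names are ours, and <monospace>lookup</monospace> is a placeholder for a query against the BabelNet service, whose API we do not reproduce here; we also assume that each token is looked up separately, which the footnote does not specify.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from typing import Callable, List

Lookup = Callable[[str], List[str]]  # token -> matching entries (placeholder)


def split_header(header: str) -> List[str]:
    """Split a header on underscores/whitespace and on camel-case boundaries."""
    tokens: List[str] = []
    for part in re.split(r"[_\s]+", header):
        # Upper-case runs ("WHG"), capitalized or lower-case words, digit runs.
        tokens.extend(re.findall(r"[A-Z]+(?![a-z])|[A-Z]?[a-z]+|\d+", part))
    return [t.lower() for t in tokens]


def is_mapped(header: str, lookup: Lookup) -> bool:
    """A header counts as mapped if at least one token yields at least one entry."""
    return any(lookup(token) for token in split_header(header))


def mapped_share(headers: List[str], lookup: Lookup) -> float:
    """Fraction of headers that could be mapped."""
    if not headers:
        return 0.0
    return sum(is_mapped(h, lookup) for h in headers) / len(headers)


# Toy dictionary standing in for the real service (hypothetical entries).
toy_index = {"population": ["population (concept)"], "count": ["count (concept)"]}


def toy_lookup(token: str) -> List[str]:
    return toy_index.get(token, [])


print(split_header("populationCount"))   # ['population', 'count']
print(split_header("WHG_TOTAL"))         # ['whg', 'total']
print(mapped_share(["populationCount", "WHG_TOTAL", "NUTS1"], toy_lookup))
```
]]></preformat>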
      <p>For instance, Table 2 shows example content of a CSV file found on data.gv.at<sup>4</sup>
and clearly highlights the limitations of existing (entity-recognition-based)
approaches: the content is mainly numerical, the headers are non-descriptive (e.g., the
abbreviation “WHG TOTAL” stands for the total number of dwellings), there are hardly any
named entities in the document, and there are (non-standardized) comment lines
giving additional information.</p>
      <p>
        Linking of Tables. Connecting CSV data to the Web of Linked Data typically involves
two steps: (i) transforming tabular data to RDF and (ii) mapping, i.e., linking
the columns (which adhere to different arbitrary schemata) and contents (cell values)
of such tabular data sources to existing RDF knowledge bases. While a recent W3C
standard [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] provides a straightforward canonical solution for (i), the mapping step (ii)
remains difficult.
      </p>
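      <p>For step (i), the Python sketch below produces N-Triples from a CSV file in the spirit of the minimal mode of the W3C CSV-to-RDF conversion [<xref ref-type="bibr" rid="ref15">15</xref>]: one blank node per row and one triple per non-empty cell, with the property IRI formed from the table URL and the percent-encoded column title. It is a simplification under our own assumptions (no metadata file, every value a plain string literal, minimal escaping) rather than a conformant implementation; the function names, the example URL and the toy content are hypothetical. The output also makes step (ii) tangible: the generated properties are local to the file and still have to be mapped to an existing vocabulary.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import csv
import io
from typing import List
from urllib.parse import quote


def literal(value: str) -> str:
    """Quote a value as an N-Triples string literal (minimal escaping)."""
    escaped = (value.replace("\\", "\\\\").replace('"', '\\"')
                    .replace("\n", "\\n").replace("\r", "\\r"))
    return '"' + escaped + '"'


def csv_to_ntriples(text: str, table_url: str) -> List[str]:
    """One blank node per row, one triple per non-blank cell."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    props = ["<" + table_url + "#" + quote(h.strip(), safe="") + ">" for h in header]
    triples = []
    for i, row in enumerate(reader, start=1):
        for prop, value in zip(props, row):
            if value.strip():  # empty or blank cells produce no triple
                triples.append("_:row%d %s %s ." % (i, prop, literal(value)))
    return triples


sample = "district,WHG TOTAL\n1,100\n2,\n"  # toy content
for triple in csv_to_ntriples(sample, "http://example.org/table.csv"):
    print(triple)
# _:row1 <http://example.org/table.csv#district> "1" .
# _:row1 <http://example.org/table.csv#WHG%20TOTAL> "100" .
# _:row2 <http://example.org/table.csv#district> "2" .
```
]]></preformat>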
      <p>Mapping involves linking column headers or cell values either to properties or
classes in ontologies or to instances in knowledge bases. These techniques work well, e.g.,
for HTML/Web tables, which have rich textual descriptions, but again they are not
applicable to many Open Data CSVs. The large number of numerical columns requires new
techniques to semantically label numerical values.</p>
      <p>3 We pre-processed the headers by splitting the labels on underscores and camel-case. We then
consider a header as mapped if we retrieved at least one BabelNet entry.</p>
      <p>4 http://www.wien.gv.at/statistik/ogd/vie_404.csv</p>
    </sec>
    <sec id="sec-3">
      <title>Relevancy</title>
      <p>
        We can identify many areas where Open Data is used and valuable, e.g., by governments
to increase transparency and democratic control, or by private companies to encourage
innovative use of their data. However, in the current Open Data landscape we observe
the risk of isolated “data silos” due to missing data integration and decreasing data
quality within the catalogs [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Manually scanning these data silos, and the data itself, to locate
relevant data sources requires a substantial amount of time.
      </p>
      <p>Improving the quality of Open Data resources by recommending related content and
providing additional semantic information for tables would allow consumers to find
relevant data for their needs and would support automated integration and linkage to
other resources. In fact, this is one of the main objectives of the ADEQUATe project,
an Austrian research project in which we are involved.<sup>5</sup></p>
      <p>Further, due to missing semantic information, current data portals lack complex
search functionality over datasets (e.g., by types or labels/categories of columns). A
richer semantic description and relatedness measure for tabular Open Data would enable
a search and recommendation engine for Open Data resources that would also support
search across portals (independently of the underlying portal language). For instance,
such an engine would enable geo-location based search, e.g., by labeling
a column as “postal code” and mapping the corresponding values to geo-names.</p>
      <p>
        The importance of (semantically) describing CSV data is also recognized by the W3C
in the CSV on the Web Working Group [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The objective of this group was to define
a metadata standard that allows the automatic generation of RDF from CSV files.
      </p>
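      <p>To illustrate what such metadata looks like, the Python sketch below assembles a small CSV on the Web metadata description and serializes it as JSON. The keys used (<monospace>@context</monospace>, <monospace>url</monospace>, <monospace>tableSchema</monospace>, <monospace>columns</monospace>, <monospace>name</monospace>, <monospace>titles</monospace>, <monospace>datatype</monospace>, <monospace>propertyUrl</monospace>) are terms of the W3C metadata vocabulary; the column title “DISTRICT_CODE”, the column names and the example.org property IRIs are hypothetical choices for a table like Table 2 and are not taken from the actual file.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import json

# Hypothetical annotation of two columns of a table like Table 2.
# "DISTRICT_CODE" and all example.org IRIs are illustrative assumptions.
metadata = {
    "@context": "http://www.w3.org/ns/csvw",
    "url": "http://www.wien.gv.at/statistik/ogd/vie_404.csv",
    "tableSchema": {
        "columns": [
            {
                "name": "district_code",
                "titles": "DISTRICT_CODE",
                "datatype": "integer",
                "propertyUrl": "http://example.org/ontology/districtCode",
            },
            {
                "name": "dwellings_total",
                "titles": "WHG TOTAL",
                "datatype": "integer",
                "propertyUrl": "http://example.org/ontology/numberOfDwellings",
            },
        ]
    },
}

# A CSVW processor can use propertyUrl and datatype when generating RDF.
print(json.dumps(metadata, indent=2))
```
]]></preformat>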
    </sec>
    <sec id="sec-4">
      <title>Related Work</title>
      <p>
        There exists an extensive body of research on deriving semantic labels for attributes in
structured data sources (such as columns in tables), which are used to (i) map the schema
of the data source to ontologies or existing semantic models or (ii) categorize the
content of a data source (e.g., a table about politicians). The majority of these
approaches [
        <xref ref-type="bibr" rid="ref1 ref11 ref12 ref14 ref18 ref19 ref5">14, 11, 18, 1, 12, 19, 5</xref>
        ] assume well-formed relational tables, rely on textual
information, such as table headers and string cell values in the data sources, and
apply common, text-based entity linkage techniques for the mapping (see [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ] for a good
survey). Moreover, typical approaches for semantic labeling such as [
        <xref ref-type="bibr" rid="ref1 ref18 ref19">18, 1, 19</xref>
        ] recover
the semantics of Web tables by additionally considering the (again textual) context
“surrounding” the table (section headers, paragraphs) and by leveraging a database
of class labels and relationships automatically extracted from the Web.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] the authors define different types of relatedness of tables on the Web and
propose and evaluate algorithms for finding them (cf. Section 2). This work builds on previous
work on semantic labeling of (textual) columns [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ].
      </p>
    </sec>
    <sec id="sec-5">
      <title>Research Questions &amp; Hypotheses</title>
      <p>The research questions of this PhD proposal derive directly from the problem statement
and can be stated as follows:</p>
      <p>Q1 How far can we get using existing Semantic Web technologies, and what are the
limitations and upcoming challenges?
Existing work on semantic enrichment of tables mainly assumes different data sources
and domains; therefore, the techniques may not be directly applicable.</p>
      <p>Q2 How can we assign semantic labels to numerical columns (lacking textual information)?
A semantic labeling of numerical values based on descriptive features and the
distribution of the values is currently not part of existing labeling approaches for tables.</p>
      <p>Q3 How can we define and measure relatedness of tabular Open Data in a meaningful way?
Current Open Data portals lack recommendation systems, which would
significantly increase the usability of such portals.</p>
      <p>5 http://www.adequate.at; the project aims at improving the data quality of Austrian Open Data.</p>
      <p>To address these research questions, we have to test the following main hypothesis:</p>
      <p>H Given a corpus of tabular Open Data resources, their usability can be increased by
semantically analyzing Open Data CSVs, assigning semantic labels to CSV columns,
and ideally generating 5-star Linked Data.</p>
      <p>This implicitly includes testing the following sub-hypotheses (which relate to research
questions Q1, Q2, and Q3):</p>
      <p>H1 A report and analysis of current tabular Open Data resources allows us to select
(and also filter out) existing mapping/linking methods which can be applied in the
later steps of this work.</p>
      <p>H2 The labeling of numerical data sources is currently not addressed in the
literature. However, based on preliminary results of H1, we know that Open Data tables
contain many numerical columns with non-human-readable headers. We propose
the construction of a background knowledge graph which can be used for labeling
columns based on the distribution of their values.</p>
      <p>H3 Open Data tables are only partially similar to HTML/Web tables (e.g., found on
Wikipedia). By defining a suitable measure for relatedness of Open Data tables we
will achieve better results for search and recommendation of related content.
Expecting full mappings is unrealistic for many CSVs due to the lack of structure;
however, finding partial mappings of columns will already allow a categorization
and relatedness measure for resources and therefore enable improved search
functionality.</p>
      <p>Consistent with James A. Hendler’s hypothesis “a little semantics goes a long way”,<sup>6</sup>
this work intends to significantly increase the usability of Open Data by partially
enriching and relating tabular resources using Semantic Web technologies.</p>
    </sec>
    <sec id="sec-6">
      <title>Preliminary Results</title>
      <p>
        Initial results include a large-scale quality assessment and monitoring of (meta)data
published on Open Data portals [
        <xref ref-type="bibr" rid="ref16 ref17">17, 16</xref>
        ] and the profiling and analysis of CSVs
available on these portals [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. A more extensive journal version of this quality assessment
work is to appear [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Regarding the labeling of numerical columns in Open Data
tables, a research track paper has been accepted at this year’s ISWC [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] (see Section 7 for
a detailed description). Further, we actively contributed to the W3C CSV on the Web
Working Group and provided an implementation of the recent W3C standard.<sup>7</sup>
      </p>
      <p>6 http://www.cs.rpi.edu/~hendler/LittleSemanticsWeb.html</p>
      <p>7 https://github.com/sebneu/csvw-parser</p>
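      <p>As an illustration of the kind of CSV profiling referred to above (and of step (i) of our approach in Section 7), the Python sketch below assigns a coarse XSD-style type to every cell and the majority type to every column, and then counts column types over a corpus. The type tests, the two date formats and the function names are simplifying assumptions of ours, not the procedure used in [<xref ref-type="bibr" rid="ref8">8</xref>].</p>
      <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import Counter
from datetime import datetime
from typing import Iterable, List

DATE_FORMATS = ["%Y-%m-%d", "%d.%m.%Y"]  # assumed subset of common formats


def cell_type(value: str) -> str:
    """Coarse XSD-style type of a single cell."""
    v = value.strip()
    if not v:
        return "empty"
    if re.fullmatch(r"[+-]?\d+", v):
        return "xsd:integer"
    if re.fullmatch(r"[+-]?(\d+\.\d*|\.\d+)", v):
        return "xsd:decimal"
    for fmt in DATE_FORMATS:
        try:
            datetime.strptime(v, fmt)
            return "xsd:date"
        except ValueError:
            pass
    return "xsd:string"


def column_type(values: Iterable[str]) -> str:
    """Majority type of the non-empty cells of a column."""
    counts = Counter(cell_type(v) for v in values)
    counts.pop("empty", None)
    return counts.most_common(1)[0][0] if counts else "empty"


def type_distribution(tables: List[List[List[str]]]) -> Counter:
    """Count column types over a corpus (first row of each table = header)."""
    dist: Counter = Counter()
    for table in tables:
        header, rows = table[0], table[1:]
        for j in range(len(header)):
            dist[column_type(row[j] for row in rows if j < len(row))] += 1
    return dist


toy = [["district", "date", "WHG TOTAL"],
       ["1", "2015-01-01", "100"],
       ["2", "01.01.2015", "250"]]
print(type_distribution([toy]))  # Counter({'xsd:integer': 2, 'xsd:date': 1})
```
]]></preformat>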
    </sec>
    <sec id="sec-7">
      <title>Approach</title>
      <p>We propose to approach the problem in four steps:</p>
      <p>(i) Monitoring and analyzing Open Data:</p>
      <p>
        In order to identify possible improvement/integration strategies, we periodically
measure and assess what information is actually available; for instance, we assess
the distribution of column types (e.g., by XSD data types, date formats, tokens) or
the readability of the headers.
      </p>
      <p>(ii) Evaluating the applicability of existing entity linkage techniques:</p>
      <p>
        To tackle hypothesis H1 (applicable technologies to label and link tabular Open
Data), we review and evaluate existing methods in the literature. Due to the
differing characteristics of Web tables and Open Data tables (cf. Section 2), we have
to identify which approaches are applicable to our corpus of data. For instance,
considering Table 2, the column headers and cell values cannot be used for
named-entity recognition systems (as used in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]).
      </p>
      <p>(iii) Labeling and classification of numerical values:</p>
      <p>
        Open Data tables typically contain a large portion of numerical columns
and/or non-textual headers; therefore, solutions that solely focus on textual
“cues” are only partially applicable for mapping such data sources. As a means of
confirming hypothesis H2, we develop a method to find and rank candidates of
semantic context descriptions for a given bag of numerical values [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>[Fig. 1: Hierarchical background knowledge. Nodes shown: dbo:height, V = [v1, v2, …, vn]; a dbo:Person, V* ⊂ V; a dbo:Building, V^ ⊂ V; … a dbo:BasketballPlayer …, V** ⊂ V*; dbo:league db:National_…, V*** ⊂ V**.]</p>
      <p>For instance, given Table 2, we want to label column 4 with its corresponding
semantic label, i.e., district code/postal code, based on similarly distributed values
found in a background knowledge base.
To this end, we apply a hierarchical clustering over information taken from
DBpedia to build a background knowledge graph of possible “semantic contexts” for
bags of numerical values. For instance, considering Figure 1, we want not only
to label a bag of numerical values as height, but also to identify that
the values represent the heights of basketball players who played in the NBA, or
the heights of buildings. We assign the most likely
contexts by performing a k-nearest-neighbor search to rank the most likely candidates
in our knowledge graph.</p>
      <p>(iv) Open Data tables recommendation and linkage:</p>
      <p>We verify hypothesis H3 (a relatedness measure for Open Data tables) by
incorporating the results of (ii) and (iii) into a recommendation and linkage/integration
system that allows an automatic enrichment of resources published on Open
Data portals. For instance, considering again Table 2, we want to be able to
recommend content that describes data for the same regions, based on the NUTS
identifiers<sup>8</sup> in the document, but also based on the distribution of the district codes.</p>
      <p>8 http://ec.europa.eu/eurostat/web/nuts/overview</p>
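      <p>One conceivable instantiation of such a relatedness measure for step (iv) is sketched below in Python, under assumptions of our own: it combines the Jaccard overlap of the region identifiers (e.g., NUTS codes) found in two tables with a distribution similarity of a numerical column (e.g., the district codes), computed as one minus the two-sample Kolmogorov-Smirnov statistic. The equal weighting, the function names and the toy identifiers are hypothetical; defining the actual measure is part of the planned work.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from bisect import bisect_right
from typing import Sequence, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two sets; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ks_statistic(x: Sequence[float], y: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic: the largest vertical distance
    between the empirical CDFs (which only change at observed values)."""
    sx, sy = sorted(x), sorted(y)
    return max(abs(bisect_right(sx, v) / len(sx) - bisect_right(sy, v) / len(sy))
               for v in set(sx) | set(sy))


def relatedness(ids1: Set[str], ids2: Set[str],
                values1: Sequence[float], values2: Sequence[float],
                w_ids: float = 0.5) -> float:
    """Score in [0, 1]: weighted mix of identifier overlap and distribution similarity."""
    dist_sim = (1.0 - ks_statistic(values1, values2)) if values1 and values2 else 0.0
    return w_ids * jaccard(ids1, ids2) + (1.0 - w_ids) * dist_sim


# Toy inputs: placeholder region identifiers and district-code-like values.
print(relatedness({"R1", "R2"}, {"R1", "R2"}, [1010, 1020, 1030], [1010, 1020, 1030]))  # 1.0
print(relatedness({"R1"}, {"R9"}, [1, 2], [3, 4]))  # 0.0
```
]]></preformat>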
    </sec>
    <sec id="sec-8">
      <title>Evaluation Plan</title>
      <p>
        Evaluations for semantic label annotation and linking of tables are most commonly
based on experiments over (manually created) gold standard datasets, which are either
the result of a crawl of the Web [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] or domain-specific datasets [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ], e.g., IMDB or
MusicBrainz. Regarding the relatedness of tables, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] evaluates the experimental results
by manual user ratings.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] we evaluate our labeling of numerical values by cross-validation over a
sample of DBpedia data generated from the most widely used numeric properties and their
associated domain concepts: the evaluation shows that this approach can assign
fine-grained semantic labels when there is enough supporting evidence in the background
knowledge graph. In other cases, our approach can nevertheless assign high-level
contexts to the data, which could potentially be used in combination with other approaches
to narrow down the search space of possible labels. Additionally, we tested our approach
“in the wild” on tabular data extracted from Open Data portals and reported valuable
insights and upcoming challenges which we have to tackle in order to successfully label
numerical data from the Open Data domain.
      </p>
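      <p>The behavior described above (fine-grained labels when the evidence suffices, a higher-level context otherwise) can be illustrated with a toy Python sketch. It is not the implementation of [<xref ref-type="bibr" rid="ref9">9</xref>]: the hand-written hierarchy and value bags stand in for the background knowledge graph that [<xref ref-type="bibr" rid="ref9">9</xref>] builds by hierarchical clustering over DBpedia, each bag is summarized by three assumed descriptive features (mean, minimum, maximum), and a context path is only accepted as deep as at least a share <monospace>min_support</monospace> of the <monospace>k</monospace> nearest neighbors agrees on it. All numbers are invented.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import math
import statistics
from collections import Counter
from typing import List, Sequence, Tuple

Path = Tuple[str, ...]


def features(values: Sequence[float]) -> Tuple[float, float, float]:
    """Descriptive features of a bag of numbers (an assumed, minimal choice)."""
    return (statistics.mean(values), min(values), max(values))


# Toy background knowledge: context path -> bag of values (invented numbers).
BACKGROUND = [
    (("dbo:height", "dbo:Person", "dbo:BasketballPlayer"), [2.00, 2.05, 2.10]),
    (("dbo:height", "dbo:Person", "dbo:BasketballPlayer"), [1.90, 2.05, 2.20]),
    (("dbo:height", "dbo:Person", "dbo:SoccerPlayer"), [1.70, 1.80, 1.90]),
    (("dbo:height", "dbo:Person", "dbo:SoccerPlayer"), [1.75, 1.80, 1.85]),
    (("dbo:height", "dbo:Building"), [50.0, 100.0, 150.0]),
    (("dbo:height", "dbo:Building"), [60.0, 120.0, 180.0]),
]
KB: List[Tuple[Path, Tuple[float, float, float]]] = [
    (path, features(bag)) for path, bag in BACKGROUND]


def label(values: Sequence[float], k: int = 2, min_support: float = 0.75) -> Path:
    """Deepest context path shared by at least min_support of the k nearest
    neighbors (min_support above 0.5 keeps the accepted prefixes nested)."""
    query = features(values)
    nearest = sorted(KB, key=lambda node: math.dist(query, node[1]))[:k]
    paths = [path for path, _ in nearest]
    result: Path = ()
    depth = 1
    while True:
        counts = Counter(p[:depth] for p in paths if len(p) >= depth)
        if not counts:
            break
        prefix, n = counts.most_common(1)[0]
        if n / len(paths) < min_support:
            break
        result, depth = prefix, depth + 1
    return result


print(label([2.00, 2.05, 2.10]))    # ('dbo:height', 'dbo:Person', 'dbo:BasketballPlayer')
print(label([1.88, 1.93, 1.98]))    # ('dbo:height', 'dbo:Person')
print(label([70.0, 110.0, 150.0]))  # ('dbo:height', 'dbo:Building')
```
]]></preformat>
      <p>The second query lies between the two kinds of players, so the sketch falls back to the higher-level context dbo:Person instead of guessing a specific class.</p>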
      <p>
        As the test-data corpus for our further evaluations, we use the set of tables monitored
by our Open Data Portal Watch framework [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ], which currently monitors over 200
data portals worldwide and therefore allows large-scale analyses and evaluations.
      </p>
    </sec>
    <sec id="sec-9">
      <title>Reflections</title>
      <p>Since we are aware of the challenges of automated integration and linkage, we do
not believe that it is possible to fully map all CSV tables and generate high-quality RDF
from them (e.g., due to the absence of appropriate ontologies). However, we
are convinced that partial mappings will already have a high impact on the usability
of Open Data. In fact, current Open Data holds substantial potential for improvement
by Semantic Web technologies: resources found on Open Data portals are typically
created and curated by “non-technicians”, e.g., office employees in public administration.
Therefore, the principles for producing Linked Open Data sources are possibly ignored
(or rather not known), even though the data would allow for high-quality linked/integrated
data. This hypothesis is supported by early results of the ADEQUATe project, in which
we are currently involved. In the course of this project we collected feedback from users
and providers regarding the current state of Open Data: we identified the potential (but
also the demand) for standardization and machine-processable data, and a high interest
in the outcome of this work.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Adelfio</surname>
            ,
            <given-names>M.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Samet</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Schema extraction for tabular data on the web</article-title>
          .
          <source>Proceedings of the VLDB Endowment</source>
          <volume>6</volume>
          (
          <issue>6</issue>
          ),
          <fpage>421</fpage>
          -
          <lpage>432</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked data</article-title>
          (
          <year>2006</year>
          ), http://www.w3.org/DesignIssues/LinkedData.html
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Bollacker</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Evans</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paritosh</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taylor</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Freebase: A collaboratively created graph database for structuring human knowledge</article-title>
          .
          <source>In: Proceedings of the 2008 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>1247</fpage>
          -
          <lpage>1250</lpage>
          . SIGMOD '08, ACM, New York, NY, USA (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Das Sarma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fang</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gupta</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xin</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Finding related tables</article-title>
          .
          <source>In: Proceedings of the 2012 ACM SIGMOD International Conference on Management of Data</source>
          . pp.
          <fpage>817</fpage>
          -
          <lpage>828</lpage>
          . ACM (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Ermilov</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>User-driven semantic mapping of tabular data</article-title>
          .
          <source>In: Proceedings of the 9th International Conference on Semantic Systems</source>
          . pp.
          <fpage>105</fpage>
          -
          <lpage>112</lpage>
          . I-SEMANTICS '13, ACM, New York, NY, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Kucera</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chlapek</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nečaský</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Open government data catalogs: Current approaches and quality perspective</article-title>
          . In:
          <source>Technology-Enabled Innovation for Democracy, Government and Governance - Second Joint International Conference on Electronic Government and the Information Systems Perspective, and Electronic Democracy</source>
          , EGOVIS/EDEM 2013, Prague, Czech Republic. pp.
          <fpage>152</fpage>
          -
          <lpage>166</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Lopez</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kotoulas</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sbodio</surname>
            ,
            <given-names>M.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stephenson</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gkoulalas-Divanis</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aonghusa</surname>
            ,
            <given-names>P.M.</given-names>
          </string-name>
          :
          <article-title>Queriocity: A linked data platform for urban information management</article-title>
          .
          <source>In: The Semantic Web - ISWC 2012</source>
          . pp.
          <fpage>148</fpage>
          -
          <lpage>163</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Mitlöhner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Characteristics of Open Data CSV Files</article-title>
          . In:
          <source>2nd International Conference on Open and Big Data</source>
          (August
          <year>2016</year>
          ), invited paper
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Parreira</surname>
            ,
            <given-names>J.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Multi-level semantic labelling of numerical values</article-title>
          . In:
          <source>The 15th International Semantic Web Conference</source>
          . Kobe, Japan (October
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Automated quality assessment of metadata across open data portals</article-title>
          .
          <source>ACM Journal of Data and Information Quality (JDIQ)</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ramnandan</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.A.</given-names>
          </string-name>
          :
          <article-title>Assigning semantic labels to data sources</article-title>
          . In:
          <source>ESWC 2015</source>
          . pp.
          <fpage>403</fpage>
          -
          <lpage>417</lpage>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Rastan</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Towards generic framework for tabular data extraction and management in documents</article-title>
          . In:
          <source>Proceedings of the Sixth Workshop on Ph.D. Students in Information and Knowledge Management</source>
          . pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . PIKM '13, ACM, New York, NY, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Reiche</surname>
            ,
            <given-names>K.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Höfig</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schieferdecker</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          :
          <article-title>Assessment and Visualization of Metadata Quality for Open Government Data</article-title>
          . In:
          <source>Proceedings of the International Conference for E-Democracy and Open Government, CeDEM14</source>
          , Krems, Austria, May 21-23 (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Taheriyan</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Knoblock</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szekely</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ambite</surname>
            ,
            <given-names>J.L.</given-names>
          </string-name>
          :
          <article-title>A Scalable Approach to Learn Semantic Models of Structured Sources</article-title>
          . In:
          <source>Proceedings of the 8th IEEE International Conference on Semantic Computing (ICSC 2014)</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Tandy</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Herman</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kellogg</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Generating RDF from Tabular Data on the Web</article-title>
          (Dec
          <year>2015</year>
          ), https://www.w3.org/TR/csv2rdf/, W3C Recommendation
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Quality assessment &amp; evolution of open data portals</article-title>
          . In:
          <source>The International Conference on Open and Big Data</source>
          . pp.
          <fpage>404</fpage>
          -
          <lpage>411</lpage>
          . IEEE, Rome, Italy (August
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Umbrich</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Neumaier</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Towards assessing the quality evolution of open data portals</article-title>
          . In:
          <source>Proceedings of ODQ2015: Open Data Quality: from Theory to Practice Workshop</source>
          , Munich, Germany (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Venetis</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Halevy</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Madhavan</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pasca</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shen</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miao</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Recovering semantics of tables on the web</article-title>
          .
          <source>PVLDB</source>
          <volume>4</volume>
          (
          <issue>9</issue>
          ),
          <fpage>528</fpage>
          -
          <lpage>538</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>K.Q.</given-names>
          </string-name>
          :
          <article-title>Understanding tables on the web</article-title>
          . In:
          <source>Conceptual Modeling - 31st International Conference ER 2012, Florence, Italy, October 15-18, 2012. Proceedings</source>
          . pp.
          <fpage>141</fpage>
          -
          <lpage>155</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Start small, build complete: Effective and efficient semantic table interpretation using TableMiner</article-title>
          .
          <source>Semantic Web Journal</source>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Towards efficient and effective semantic table interpretation</article-title>
          . In:
          <source>ISWC 2014, Lecture Notes in Computer Science</source>
          , vol.
          <volume>8796</volume>
          , pp.
          <fpage>487</fpage>
          -
          <lpage>502</lpage>
          . Springer International Publishing (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>