<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linked Data Quality</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Conservatoire National des Arts et Metiers</institution>
          ,
          <addr-line>CEDRIC</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>The widespread adoption of semantic web technologies such as RDF, SPARQL and OWL enables individuals to build their databases on the web, write vocabularies, and define rules to arrange and explain the relationships between data according to the Linked Data principles. As a consequence, a large amount of structured and interlinked data is being generated daily. A close examination of the quality of this data can be critical, especially if important research and professional decisions depend on it. Several linked data quality metrics have been proposed, and they cover numerous dimensions of linked data quality such as completeness, consistency, conciseness and interlinking. In this work, we are interested in linked data quality dimensions, especially the completeness and conciseness of linked datasets. A set of experiments was conducted on a real-world dataset (DBpedia) to evaluate our proposed approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>LOD</kwd>
        <kwd>Linked Data Quality</kwd>
        <kwd>Completeness</kwd>
        <kwd>Conciseness</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>
        Because a large amount of information is being generated daily, and
information needs to be of high quality to be useful, the need for quality assessment of
data on the Web is more urgent than ever before. At the same time, Linked
Open Data<sup>1</sup> (LOD) has emerged as a result of the development of semantic
web technologies such as RDF, SPARQL and OWL. Research on information quality
has been successfully applied to traditional information systems built on
relational databases, with a positive impact on organizational
processes [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This raises the question of whether this approach is applicable in the
context of the Web of Data. Zaveri et al. [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] surveyed 18 different linked data quality
dimensions that can be applied to assess the quality of Linked Data. The goal
of this work is to propose approaches that focus on figuring out whether this
information completely represents the real world and whether it is logically consistent in
itself. Our objective is not to measure absolute completeness and conciseness,
but rather to measure aspects of them.
      </p>
      <p>1 http://5stardata.info/en/</p>
    </sec>
    <sec id="sec-2">
      <title>1.1 Completeness</title>
      <p>
        Completeness is a data quality measure that refers to the degree to which
all required information is present in a particular dataset [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]. In
this section, we illustrate the main idea behind our approach through an example that shows
the issues and difficulties encountered when calculating the completeness of a
dataset. Let us consider the set of scientists described in the well-known
linked open dataset DBpedia. We would like to calculate the completeness of a
scientist description (e.g. Albert Einstein), defined as the proportion of properties
used in the description of this scientist to the total number of properties in
Scientist Schema. In DBpedia, the Scientist<sup>2</sup> class has a list of 4 properties
(e.g. doctoralAdvisor), but these properties are not the only ones used in the
description of a scientist (e.g. the birthdate property is not present in this list).
Indeed, the Scientist class has a superclass called Person. So, the description
of a scientist may also draw on the properties of the Scientist class and of all
its ancestors.
      </p>
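      <p>
        <disp-formula>
          <tex-math><![CDATA[\text{Scientist Schema} = \{\text{Properties on Scientist}\} \cup \{\text{Properties on Person}\} \cup \{\text{Properties on Agent}\} \cup \{\text{Properties on Thing}\}]]></tex-math>
        </disp-formula>
      </p>
      <p>such that: <inline-formula><tex-math><![CDATA[\text{Scientist} \sqsubseteq \text{Person} \sqsubseteq \text{Agent} \sqsubseteq \text{Thing}]]></tex-math></inline-formula></p>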
      <p>The size of Scientist Schema can be obtained with a simple SPARQL query<sup>3</sup>; in the case of DBpedia, it is equal to 664 (ABox properties). The completeness of the description of Albert Einstein could then be calculated as the number of Scientist Schema properties used in his description divided by 664.</p>
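      <p>The naive calculation described above can be sketched as follows. This is an illustrative sketch only: the class hierarchy follows the text, but the property lists are small hypothetical stand-ins for the 664 DBpedia properties, and the names class_schema and naive_completeness are introduced here for the example.</p>
      <preformat>
```python
# Toy class hierarchy; the property sets are hypothetical, not DBpedia's real lists.
SUPERCLASS = {"Scientist": "Person", "Person": "Agent", "Agent": "Thing", "Thing": None}
PROPERTIES_ON = {
    "Scientist": {"doctoralAdvisor", "knownFor"},
    "Person": {"birthDate", "birthPlace", "weapon"},
    "Agent": {"name"},
    "Thing": {"label"},
}


def class_schema(cls):
    """Union of the properties declared on cls and on all of its ancestors."""
    schema = set()
    while cls is not None:
        schema |= PROPERTIES_ON.get(cls, set())
        cls = SUPERCLASS[cls]
    return schema


def naive_completeness(used_properties, cls):
    """Share of the class schema that an instance description actually uses."""
    schema = class_schema(cls)
    return len(set(used_properties).intersection(schema)) / len(schema)


# Hypothetical description of one scientist: 5 of the 7 schema properties are used.
einstein = {"doctoralAdvisor", "birthDate", "birthPlace", "name", "label"}
score = naive_completeness(einstein, "Scientist")
```
      </preformat>
      <p>In this toy schema the weapon property counts against the score even though it is not meaningful for a scientist, which is the weakness of the naive ratio.</p>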
      <p>
        However, although the property weapon is in Scientist Schema, it is not relevant
for the Albert Einstein instance. We can therefore conclude that
completeness as calculated here does not provide a relevant value regarding
the real representation of scientists in the DBpedia dataset. Hence, we need to
overcome this issue by exploring the instances themselves to get an idea of how they
are actually described and which properties are used. In [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], the authors propose
a new approach to compute the completeness of instances based on similar ones
in Wikidata. For each instance, they find the most frequent properties among
instances that have the same type, and compute the percentage of missing
properties to calculate the completeness. This approach does not always work well
when an instance has several values for the instance of property, i.e. several classes, such
as Writer and Player.
      </p>
      <p>2 http://mappings.dbpedia.org/server/ontology/classes/</p>
      <p>3 Performed on: http://dbpedia.org/sparql</p>
      <p>
        [...] different identifiers or names. Eliminating synonymously used predicates
aims to optimize the dataset and speed up processing.
      </p>
      <p>
        Our research on the conciseness dimension was inspired by the existing work on Synonym
Analysis for Predicate Expansion [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Abedjan and Naumann proposed a
data-driven synonym discovery algorithm for predicate expansion that applies
both schema analysis and range content filtering.
      </p>
      <p>Range content filtering represents each distinct object as a transaction
containing the predicates that point to it. For example, the object Lyon city is connected with
several predicates such as birthPlace, deathPlace and location. The authors
assume that synonym predicates have a similar meaning and therefore share a similar
group of object values. For this reason, the approach mines
frequent patterns of predicates that are grouped by object value.</p>
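      <p>As a rough illustration of range content filtering, the sketch below groups predicates by object and counts how often two predicates share a transaction. The facts and the subject names are hypothetical, and a plain pair count stands in for a real frequent itemset miner; range_transactions and candidate_pairs are names introduced here.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical subject-predicate-object facts built around the Lyon example.
FACTS = [
    ("Person A", "birthPlace", "Lyon"),
    ("Person B", "deathPlace", "Lyon"),
    ("Company C", "location", "Lyon"),
    ("Person A", "deathPlace", "Paris"),
    ("Person D", "birthPlace", "Paris"),
]


def range_transactions(facts):
    """Group predicates by object: one transaction per distinct object value."""
    by_object = defaultdict(set)
    for _subject, predicate, obj in facts:
        by_object[obj].add(predicate)
    return dict(by_object)


def candidate_pairs(transactions, min_count):
    """Predicate pairs that appear together in at least min_count transactions."""
    counts = defaultdict(int)
    for predicates in transactions.values():
        ordered = sorted(predicates)
        for i, first in enumerate(ordered):
            for second in ordered[i + 1:]:
                counts[(first, second)] += 1
    return {pair for pair, n in counts.items() if n >= min_count}


transactions = range_transactions(FACTS)
pairs = candidate_pairs(transactions, min_count=2)
```
      </preformat>
      <p>With these toy facts, birthPlace and deathPlace share both the Lyon and the Paris transactions, so they are returned as a candidate pair although they are not synonyms.</p>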
      <p>However, range content filtering alone is not sufficient to discover the predicates
that are used synonymously. For example, the predicates
birthPlace and deathPlace share significant co-occurrences with the same object
values, but they are definitely used differently. The authors therefore
propose a second filter to overcome this problem and to find synonym
predicates more accurately. They expect that synonym predicates should not
co-exist for the same instance. In schema analysis, each distinct subject is
represented as a transaction containing its predicates. By applying negative
association rules, synonym predicates are expected to appear in different transactions. For
instance, the subject Michael Schumacher does not have two synonymously used
predicates such as born and birthPlace in the same dataset.</p>
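      <p>The schema analysis filter can be sketched in the same spirit. The facts below are hypothetical, the subjects are anonymous placeholders, and a simple "never used together" test stands in for negative association rule mining; subject_transactions and never_co_occur are names introduced for this example.</p>
      <preformat>
```python
from collections import defaultdict

# Hypothetical facts: 'born' and 'birthPlace' never describe the same subject.
FACTS = [
    ("s1", "birthPlace", "Lyon"),
    ("s1", "deathPlace", "Lyon"),
    ("s2", "born", "Lyon"),
    ("s2", "deathPlace", "Paris"),
    ("s3", "birthPlace", "Paris"),
]


def subject_transactions(facts):
    """Group predicates by subject: one transaction per distinct subject."""
    by_subject = defaultdict(set)
    for subject, predicate, _obj in facts:
        by_subject[subject].add(predicate)
    return dict(by_subject)


def never_co_occur(transactions, first, second):
    """True when no subject uses both predicates (a negative association)."""
    return not any(
        first in predicates and second in predicates
        for predicates in transactions.values()
    )


transactions = subject_transactions(FACTS)
keeps_born_birthplace = never_co_occur(transactions, "born", "birthPlace")
keeps_birth_death = never_co_occur(transactions, "birthPlace", "deathPlace")
```
      </preformat>
      <p>Here the pair born and birthPlace passes the filter, while birthPlace and deathPlace is rejected because subject s1 uses both.</p>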
      <p>Our objective now is to discover synonym predicates by applying this
approach. To clarify its drawbacks, we apply it to a sample
of facts from DBpedia (see Table 1).</p>
      <p>
        Based on range content filtering, all predicates are gathered into groups,
one for each distinct object. The results, illustrated in Table 2, are used to
retrieve the frequent candidates. We can see that the nationality and
sourceCountry predicates are in the same transaction. By applying the
FP-growth algorithm [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], frequent itemsets are mined, and nationality and
sourceCountry emerge as a candidate pair. The next step applies schema analysis,
which takes each subject as a context and yields the transactions shown in Table
3. Applying negative association rules, we notice that there is no
co-occurrence between the sourceCountry and nationality predicates.
      </p>
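      <table-wrap>
        <label>Table 1</label>
        <caption><p>Facts in SPO structure from DBpedia</p></caption>
        <table>
          <thead>
            <tr><th>Subject</th><th>Predicate</th><th>Object</th></tr>
          </thead>
          <tbody>
            <tr><td>Adam Hadwin</td><td>type</td><td>GolfPlayer</td></tr>
            <tr><td>Adam Hadwin</td><td>birthPlace</td><td>Moose Jaw</td></tr>
            <tr><td>Adam Hadwin</td><td>nationality</td><td>Canada</td></tr>
            <tr><td>White River</td><td>sourceCountry</td><td>Canada</td></tr>
            <tr><td>White River</td><td>riverMouth</td><td>Lake Superior</td></tr>
            <tr><td>White River</td><td>state</td><td>Ontario</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 2</label>
        <caption><p>Range Content Filtering</p></caption>
        <table>
          <thead>
            <tr><th>Object</th><th>Predicate</th></tr>
          </thead>
          <tbody>
            <tr><td>GolfPlayer</td><td>type</td></tr>
            <tr><td>Moose Jaw</td><td>birthPlace</td></tr>
            <tr><td>Canada</td><td>nationality, sourceCountry</td></tr>
            <tr><td>Lake Superior</td><td>riverMouth</td></tr>
            <tr><td>Ontario</td><td>state</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap>
        <label>Table 3</label>
        <caption><p>Schema analysis</p></caption>
        <table>
          <thead>
            <tr><th>Subject</th><th>Predicate</th></tr>
          </thead>
          <tbody>
            <tr><td>Adam Hadwin</td><td>type, birthPlace, nationality</td></tr>
            <tr><td>White River</td><td>sourceCountry, riverMouth, state</td></tr>
          </tbody>
        </table>
      </table-wrap>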
      <p>Therefore, the algorithm proposes nationality and sourceCountry as a
synonym predicate pair, which is not correct: we cannot replace the nationality
predicate, whose domain is the Person class, with the sourceCountry
predicate, whose domain is the Stream class.</p>
      <sec id="sec-2-1">
        <title>2 Relevancy</title>
        <p>
          We are witnessing an increase in the data accessible on the internet,
with large amounts of data being generated daily. This data plays a crucial role
in the decisions of companies, organizations and individuals. Although rich in
content, it is often incomplete, lacks metadata or even suffers from redundancy. As
our goal is to improve Linked Data quality, the problem is relevant for Linked
Data publishers, contributors and consumers. Users expect
high-quality information, which means data with "fitness to use" [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>3 Related work</title>
        <p>
          Several metrics and tools have been proposed to assess Linked Data and
improve its quality [
          <xref ref-type="bibr" rid="ref12 ref13 ref4">4,13,12</xref>
          ]. Unfortunately, this work has faced obstacles related to the
absence of a clear definition of the word "quality", since its meaning differs
from one domain to another. However, data quality is commonly conceived as
fitness for use, and it has several aspects or dimensions, such as accuracy,
completeness and interlinking. In 2016, Zaveri et al. [
          <xref ref-type="bibr" rid="ref17">17</xref>
          ] identified a set of 18
different data quality dimensions, where each dimension has at least one indicator or
metric to assess it. Some of the proposed approaches deal
with one dimension [
          <xref ref-type="bibr" rid="ref15 ref7">7,15</xref>
          ] or several dimensions [
          <xref ref-type="bibr" rid="ref12 ref13">13,12</xref>
          ].
        </p>
        <p>
          Completeness is one of the essential dimensions of data quality, which refers to
the amount of the presented information. Pipino et al. [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] divided completeness
into schema completeness, which is the degree to which classes and properties are
not missing in a schema; property completeness, which is the extent of missing
values for a specific kind of property; and population completeness, which
refers to the ratio of represented objects to real-world objects. Although several works
provide metrics for the three completeness classifications [
          <xref ref-type="bibr" rid="ref13 ref5">13,5</xref>
          ], their
metrics evaluate completeness by comparing the data with a predefined schema,
which may not provide an accurate value of dataset completeness.
        </p>
        <p>
          On the other hand, Mendes et al. [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ] categorized the conciseness dimension into
intensional and extensional conciseness. The first type, intensional
conciseness, measures the ratio of the number of unique dataset elements to the total number
of schema elements; this measurement is therefore made at the schema level.
In a similar manner but at the instance level, extensional conciseness measures
the ratio of the number of unique objects to the real number of objects in the dataset. In
a similar sense but under the name "uniqueness", Fürber and Hepp [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]
defined the elements of representation, such as classes, properties and objects. Their
definition suggests uniqueness of breadth at the schema level and uniqueness of
depth at the instance level. In [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ], the authors proposed an algorithm to discover
synonym predicates for query expansion. They relied on mining
similar predicates according to their subjects and objects. However, their approach
works well only when dealing with a dataset that has a limited number of instances.
        </p>
        <p>Our goal is to enhance the dimensions of linked data quality that do not
have enough metrics (i.e. completeness and conciseness). We aim to propose new
metrics from different perspectives, such as inferring a reference schema from the data
source and using semantic analysis to understand the meaning of predicates.</p>
      </sec>
      <sec id="sec-2-3">
        <title>4 Research questions</title>
        <p>Completeness calculation requires a reference schema to compare against.
A gold-standard or predefined schema does not always represent a good
reference, so there is a need to explore instances to obtain a suitable reference schema
(ontology). In addition, a dataset ontology contains semantic features that
explain the meaning of each predicate.</p>
        <list list-type="bullet">
          <list-item>
            <p>Completeness dimension: Is it possible to calculate completeness values
using a schema inferred from the data source? How can we assess the completeness
of Linked Data?</p>
          </list-item>
          <list-item>
            <p>Conciseness dimension: Can we enhance the conciseness dimension of
linked datasets by analyzing the semantics of predicates?</p>
          </list-item>
        </list>
      </sec>
      <sec id="sec-2-4">
        <title>5 Hypotheses</title>
        <p>Our hypotheses are derived directly from the questions above:</p>
        <p>H1: Exploring instances to see how they are actually described
and which properties are used, while also considering the importance of each property,
provides a more suitable schema to use as a reference for calculating
the completeness value of a dataset.</p>
        <p>H2: A deep semantic analysis of the data, in addition to the statistical analysis, can
enhance the conciseness of linked datasets by discovering repeated predicates,
because the semantic analysis reduces false positive results.</p>
      </sec>
      <sec id="sec-2-5">
        <title>6 Preliminary results</title>
        <p>6.1 Completeness assessment</p>
        <p>Based on our belief that a suitable schema (e.g. a set of properties)
needs to be inferred from the data source, the experiments were performed on the
well-known real-world dataset DBpedia, which is publicly available as Linked Open
Data (LOD). DBpedia is a large knowledge base composed of structured
information extracted collaboratively from Wikipedia. It currently describes about
14 million things.</p>
        <p>
          In [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], to evaluate the completeness of different versions of DBpedia,
we chose three relatively distant versions. The first one (v3.6) was generated
in March/April 2013, the second one (v2015-04) in February/March 2015 and
the third one (v2016-10) in October 2016. For each dataset, we chose a
few classes of different natures. We studied the completeness of resources
belonging to the following classes: C = {Film, Organisation, Scientist,
PopulatedPlace}. For the properties used in the resource descriptions, we
chose the English datasets "mapping-based properties", "instance types" and
"labels".
        </p>
        <p>The experiments revealed that dataset completeness can increase or
decrease due to changes made to existing data or to newly added data. We also
noticed that this evolution often does not benefit from the initial data cleaning, as
the set of properties continues to evolve over time. Our approach could help
data source providers to improve, or at least maintain, a certain completeness of
their datasets over different versions. It could be particularly useful for datasets
constructed collaboratively, by applying some rules for contributors when they
update or add new resources.</p>
        <p>6.2 Conciseness assessment</p>
        <p>The automatic generation of RDF knowledge bases might lead to several
semantic and syntactic errors in addition to incomplete metadata.</p>
        <p>
          Because publishers commonly do not respect the semantics, including misuse
of ontological terms and undefined classes and properties [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ], semantic
features are lacking in the DBpedia ontology, as illustrated in Table 4: only the
property domain and range restrictions can be applied. Unfortunately, only 30 functional
properties have been defined. Furthermore, the DBpedia ontology defines neither
min and max cardinality, nor the functional predicates, nor transitive
properties, nor symmetric ones. In addition, we noted that in the latest
version of DBpedia (October 2016), 16.3% of predicates are represented
without domains and 10.2% of predicates are without ranges in the DBpedia ontology.
For this reason, based on the approach proposed in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], we infer
missing domains (and/or ranges) of predicates in the DBpedia ontology. When
instances have more than one rdf:type, only the class with the highest
value is defined as the domain (or range) of the property, provided that this value is
greater than a selected threshold. When the highest value is smaller than the
threshold, owl:Thing is selected as the domain (or range). We applied our
approach to the latest version of the DBpedia ontology, v2016-10. Table 5 shows the top 10
results from the DBpedia dataset, ranked by schema analysis. We chose a support
threshold of 0.1% for the content filtering part.
        </p>
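        <p>One possible reading of this inference rule is sketched below. It assumes that the "value" of a class is the share of the predicate's subjects that are typed with that class; this interpretation, the threshold of 0.5, the sample types and the name infer_domain are all assumptions made for illustration.</p>
        <preformat>
```python
from collections import Counter


def infer_domain(subject_types, threshold=0.5):
    """Pick the most frequent class among the subjects of a predicate.

    subject_types holds, for each subject that uses the predicate, the set of
    its rdf:type classes. owl:Thing is returned when no class is frequent enough.
    """
    if not subject_types:
        return "owl:Thing"
    counts = Counter(cls for types in subject_types for cls in set(types))
    best_class, best_count = counts.most_common(1)[0]
    share = best_count / len(subject_types)
    return best_class if share > threshold else "owl:Thing"


# Hypothetical rdf:type sets of four subjects that use the same predicate.
typed_subjects = [
    {"Person", "Athlete"},
    {"Person"},
    {"Person", "Scientist"},
    {"Organisation"},
]
inferred = infer_domain(typed_subjects)
fallback = infer_domain([{"Film"}, {"Person"}, {"Place"}])
```
        </preformat>
        <p>The same function could be applied to the objects of a predicate to infer a missing range.</p>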
        <p>
          In this section, we present our approach [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ] that addresses a completeness
aspect of linked data by posing the problem as an itemset mining problem. In
fact, the completeness at the data level assesses missing values [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ]. This view
requires a schema (e.g. a set of properties) that needs to be inferred from the
data source. However, for a subset of resources, it is not relevant to consider
the schema as the union of all properties used in their descriptions,
as seen in Section 1.1. Indeed, this view neglects the fact that missing values
can express inapplicability.
        </p>
        <p>
          Our mining-based approach includes two steps:
          <list list-type="order">
            <list-item>
              <p>Properties mining: Given a dataset D, we first represent the properties
used for the description of the instances of D as a transaction vector. We then
apply the well-known FP-growth algorithm [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] for mining frequent itemsets
(we chose FP-growth for efficiency reasons; any other itemset mining
algorithm could be used). Only a subset of these frequent itemsets,
called "Maximal" [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ], is captured. This choice is motivated by the fact that,
on the one hand, we are interested in important properties for a given class that
should appear often, and on the other hand, the number of frequent patterns
could be exponential when the transaction vector is very large.</p>
            </list-item>
            <list-item>
              <p>Completeness calculation: Once the set of maximal frequent itemsets
MFP is generated, we use the frequency with which items (properties) appear
in MFP to give each of them a weight that reflects how important it
is considered to be for the description of instances. Weights are then
exploited to calculate the completeness of each transaction (regarding the
presence or absence of properties) and, hence, the completeness of the whole
dataset.</p>
            </list-item>
          </list>
        </p>
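        <p>The two steps can be sketched on a tiny example. This sketch makes several assumptions: a brute-force enumeration stands in for FP-growth, the transactions are hypothetical film descriptions, and each transaction is scored by the share of maximal patterns it fully contains, which is one reading of the weighting consistent with Definition 1.</p>
        <preformat>
```python
from itertools import combinations

# Hypothetical transactions: the properties used by four film descriptions.
TRANSACTIONS = [
    {"director", "name", "starring"},
    {"director", "name", "runtime"},
    {"director", "name", "starring", "runtime"},
    {"name"},
]


def frequent_itemsets(transactions, min_support):
    """Brute-force miner; only suitable for tiny inputs, unlike FP-growth."""
    items = sorted(set().union(*transactions))
    needed = min_support * len(transactions)
    frequent = []
    for size in range(1, len(items) + 1):
        for candidate in combinations(items, size):
            itemset = frozenset(candidate)
            if sum(itemset.issubset(t) for t in transactions) >= needed:
                frequent.append(itemset)
    return frequent


def maximal_itemsets(frequent):
    """Keep the frequent itemsets that have no frequent proper superset."""
    return [
        s for s in frequent
        if not any(s != other and s.issubset(other) for other in frequent)
    ]


def completeness(transactions, mfp):
    """Average over transactions of the share of maximal patterns contained."""
    total = sum(sum(p.issubset(t) for p in mfp) / len(mfp) for t in transactions)
    return total / len(transactions)


MFP = maximal_itemsets(frequent_itemsets(TRANSACTIONS, min_support=0.5))
CP = completeness(TRANSACTIONS, MFP)
```
        </preformat>
        <p>With a minimum support of 50%, the maximal patterns are {director, name, starring} and {director, name, runtime}; the four transactions contain one, one, two and none of them, which gives a completeness of 0.5.</p>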
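        <p>Definition 1. (Completeness) Let I' be a subset of instances, T the set of
transactions constructed from I', and MFP a set of maximal frequent
patterns. The completeness of I' corresponds to the completeness of its
transaction vector T, obtained by calculating the average of the completeness of T
regarding each pattern in MFP. Therefore, we define the completeness CP
of a subset of instances I' as follows:</p>
        <p>
          <disp-formula>
            <label>(1)</label>
            <tex-math><![CDATA[CP(I') = \frac{1}{|T|} \sum_{k=1}^{|T|} \sum_{j=1}^{|MFP|} \frac{\delta(E(t_k), \hat{P}_j)}{|MFP|}]]></tex-math>
          </disp-formula>
        </p>
        <p>such that <inline-formula><tex-math><![CDATA[\hat{P}_j \in MFP]]></tex-math></inline-formula>, and
          <disp-formula>
            <tex-math><![CDATA[\delta(E(t_k), \hat{P}_j) = \begin{cases} 1 & \text{if } \hat{P}_j \subseteq E(t_k) \\ 0 & \text{otherwise} \end{cases}]]></tex-math>
          </disp-formula>
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>7.2 Conciseness assessment</title>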
      <p>The objective of the semantic analysis is to find the meaning of each
predicate. Relying on statistical analysis alone is not sufficient to discover the
synonymously used predicates, and it produces too many false positive results,
especially when we deal with a large dataset. As the example
in Section 1.2 illustrated, the predicates nationality and sourceCountry can have
the same object, such as Canada, and they never co-occur for
the same subject. Yet nationality is a predicate of the Person class, whereas
sourceCountry is a predicate of the Stream class.</p>
      <p>We add an important extension to the previous work by studying the
meaning of each candidate. In addition, we examine candidates against a set of
conditions on their meanings, and we prove mathematically, on the basis of
Description Logic, that a predicate cannot be a synonym of another predicate if they
have disjoint domains or ranges. Taking the same example of the
nationality and sourceCountry predicates, we analyze the domain and range
of each of them. On the one hand, the predicate nationality has the
Person class as its domain and the Country class as its range; on the other hand, the
predicate sourceCountry has the Stream class as its domain and the Country class as its range.
According to the DBpedia ontology, the Stream class is a subclass of the Place class, and
the Place and Person classes are completely disjoint. Consequently, we cannot
consider nationality and sourceCountry as synonym predicates.</p>
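      <p>The domain and range check described above can be sketched as follows. The class hierarchy is a simplified stand-in for the DBpedia ontology, the predicate placeOfBirth is hypothetical, and the helper names (ancestors, are_disjoint, may_be_synonyms) are introduced only for this illustration.</p>
      <preformat>
```python
# Simplified, partly hypothetical fragment of a class hierarchy.
SUBCLASS_OF = {
    "Stream": "Place",
    "Country": "Place",
    "Place": "Thing",
    "Person": "Agent",
    "Agent": "Thing",
}
DISJOINT = {frozenset(("Person", "Place"))}
DOMAIN = {"nationality": "Person", "sourceCountry": "Stream",
          "birthPlace": "Person", "placeOfBirth": "Person"}
RANGE = {"nationality": "Country", "sourceCountry": "Country",
         "birthPlace": "Place", "placeOfBirth": "Place"}


def ancestors(cls):
    """The class itself followed by all of its superclasses."""
    chain = []
    while cls is not None:
        chain.append(cls)
        cls = SUBCLASS_OF.get(cls)
    return chain


def are_disjoint(first, second):
    """True when the two classes, or any of their ancestors, are declared disjoint."""
    return any(
        frozenset((a, b)) in DISJOINT
        for a in ancestors(first)
        for b in ancestors(second)
    )


def may_be_synonyms(p, q):
    """False when the domains or the ranges of p and q are disjoint."""
    return not (
        are_disjoint(DOMAIN[p], DOMAIN[q]) or are_disjoint(RANGE[p], RANGE[q])
    )


rejected = may_be_synonyms("nationality", "sourceCountry")
kept = may_be_synonyms("birthPlace", "placeOfBirth")
```
      </preformat>
      <p>The pair nationality and sourceCountry is rejected because Stream inherits the disjointness of Place with Person, while the pair with identical domains and ranges remains a candidate.</p>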
      <p>To support our arguments, we will prove that, in some cases, a predicate cannot be a
synonym of another predicate according to the semantic features of each
one, such as disjoint properties based on their domains and ranges,
symmetric/asymmetric properties, inverse functional properties, functional properties and
max cardinality. We illustrate these arguments using a Description Logic
formalization.</p>
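      <p>A minimal sketch of the disjoint-domain argument in standard Description Logic notation is given here as an illustration, not as the proof developed in this work. The symbols C and D, standing for the domains of two predicates p and q, are assumptions introduced for the example.</p>
      <p>
        <disp-formula>
          <tex-math><![CDATA[
\begin{aligned}
&\exists p.\top \sqsubseteq C, \qquad \exists q.\top \sqsubseteq D, \qquad C \sqcap D \sqsubseteq \bot \\
&p \equiv q \;\Rightarrow\; \exists p.\top \equiv \exists q.\top \;\Rightarrow\; \exists p.\top \sqsubseteq C \sqcap D \sqsubseteq \bot
\end{aligned}
]]></tex-math>
        </disp-formula>
      </p>
      <p>Under these assumptions, treating p and q as equivalent forces p to have no instances, so any dataset that asserts at least one fact with p becomes inconsistent; p and q therefore cannot be synonyms. The same reasoning applies to ranges when p and q are replaced by their inverses.</p>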
      <sec id="sec-3-1">
        <title>8 Evaluation plan</title>
        <p>
          Our goal is to compare the completeness and conciseness values obtained for a dataset with our
approach against those of state-of-the-art tools such as Sieve [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. In the future, we
plan to enrich our investigation with other data sources such as Yago and IMDB.
In addition, for conciseness, we will compare our approach with that of Abedjan
and Naumann [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to judge the importance of the semantic analysis; we aim to prove
that the excluded candidates cannot be synonym predicates. As the results show
that the DBpedia dataset lacks a great deal of metadata, we plan to find an approach to
infer the features of the predicates, since we believe this can help to improve
the use of the semantic part.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>9 Reflections</title>
        <p>Since we believe that poor-quality data negatively affects decisions and
can lead to catastrophic consequences, improving the quality of linked data is our
main research aim, because the Web of Data is worthless without good quality.
We concentrate on two dimensions that, according to Zaveri et al., do not have many
metrics or indicators by which to be evaluated.
We believe that extracting a reference schema from the data source is more suitable
for calculating the completeness of a dataset. In addition, we show the importance of the
semantic part alongside the statistical one in enhancing the conciseness of a
dataset. Our proposed approach takes into account only disjoint properties and
functional properties, because semantic features are not sufficiently present, as
explained in Section 6.2. Therefore, our plan is also to infer all possible semantic
features of LOD datasets.</p>
        <p>Acknowledgements. I would like to thank my supervisors, Prof. Dr. Samira
Si-said Cherfi and Dr. Faycal Hamdi, for their support and for the opportunity to
carry out this work.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Abedjan</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Naumann</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Synonym analysis for predicate expansion</article-title>
          .
          <source>In: Extended Semantic Web Conference</source>
          . pp.
          <volume>140</volume>
          –
          <fpage>154</fpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Balaraman</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Razniewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nutt</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>Recoin: Relative completeness in wikidata</article-title>
          .
          <source>In: Companion of the The Web Conference 2018 on The Web Conference</source>
          <year>2018</year>
          . pp.
          <volume>1787</volume>
          –
          <fpage>1792</fpage>
          .
          <string-name>
            <surname>International World Wide Web Conferences Steering Committee</surname>
          </string-name>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Batini</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cappiello</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Francalanci</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Methodologies for data quality assessment and improvement</article-title>
          .
          <source>ACM Computing Surveys (CSUR) 41(3)</source>
          ,
          <volume>16</volume>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Debattista</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lange</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Luzzu – a framework for linked data quality assessment</article-title>
          .
          <source>In: Semantic Computing (ICSC)</source>
          ,
          <source>2016 IEEE Tenth International Conference on</source>
          . pp.
          <volume>124</volume>
          –
          <fpage>131</fpage>
          .
          <string-name>
            <surname>IEEE</surname>
          </string-name>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Furber,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Hepp</surname>
          </string-name>
          ,
          <string-name>
            <surname>M.</surname>
          </string-name>
          :
          <article-title>Swiqa-a semantic web information quality assessment framework</article-title>
          .
          <source>In: ECIS</source>
          . vol.
          <volume>15</volume>
          , p.
          <volume>19</volume>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Grahne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>Efficiently using prefix-trees in mining frequent itemsets</article-title>
          .
          <source>In: Proceedings of the ICDM 2003 Workshop on Frequent Itemset Mining Implementations, 19 December</source>
          <year>2003</year>
          , Melbourne, Florida, USA (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Gueret</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Groth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stadler</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Assessing linked data mappings using network measures</article-title>
          .
          <source>In: Extended Semantic Web Conference</source>
          . pp.
          <volume>87</volume>
          –
          <fpage>102</fpage>
          . Springer (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Han</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yin</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Mining frequent patterns without candidate generation: A frequent-pattern tree approach</article-title>
          .
          <source>Data Min. Knowl. Discov.</source>
          <volume>8</volume>
          (
          <issue>1</issue>
          ),
          <fpage>53</fpage>
          –
          <lpage>87</lpage>
          (Jan
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Hogan</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Harth</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Passant</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Decker</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Polleres</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Weaving the pedantic web</article-title>
          .
          <source>LDOW</source>
          <volume>628</volume>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Issa</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paris</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamdi</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Assessing the completeness evolution of dbpedia: A case study</article-title>
          .
          <source>In: Advances in Conceptual Modeling - ER 2017 Workshops AHA, MoBiD, MREBA, OntoCom, and QMMQ, Valencia, Spain, November 6-9, 2017, Proceedings</source>
          . pp.
          <fpage>238</fpage>
          –
          <lpage>247</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <given-names>Joseph M.</given-names>
            ,
            <surname>J.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Richard S.</given-names>
            ,
            <surname>B.</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Frank M.</given-names>
            ,
            <surname>G.</surname>
          </string-name>
          :
          <article-title>The Quality Control Handbook</article-title>
          .
          <source>Rainbow-Bridge</source>
          ,
          <edition>3</edition>
          edn. (
          <year>1974</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Kontokostas</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Westphal</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hellmann</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cornelissen</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Test-driven evaluation of linked data quality</article-title>
          .
          <source>In: Proceedings of the 23rd international conference on World Wide Web</source>
          . pp.
          <fpage>747</fpage>
          –
          <lpage>758</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Mendes</surname>
            ,
            <given-names>P.N.</given-names>
          </string-name>
          , Muhleisen, H.,
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Sieve: linked data quality assessment and fusion</article-title>
          .
          <source>In: Proceedings of the 2012 Joint EDBT/ICDT Workshops</source>
          . pp.
          <fpage>116</fpage>
          –
          <lpage>123</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pipino</surname>
            ,
            <given-names>L.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>Y.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>R.Y.</given-names>
          </string-name>
          :
          <article-title>Data quality assessment</article-title>
          .
          <source>Communications of the ACM</source>
          <volume>45</volume>
          (
          <issue>4</issue>
          ),
          <fpage>211</fpage>
          –
          <lpage>218</lpage>
          (
          <year>2002</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Ruckhaus</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baldizan</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vidal</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          :
          <article-title>Analyzing linked data quality with liquate</article-title>
          .
          <source>In: OTM Confederated International Conferences "On the Move to Meaningful Internet Systems"</source>
          . pp.
          <fpage>629</fpage>
          –
          <lpage>638</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16. Topper, G.,
          <string-name>
            <surname>Knuth</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sack</surname>
          </string-name>
          , H.:
          <article-title>Dbpedia ontology enrichment for inconsistency detection</article-title>
          .
          <source>In: Proceedings of the 8th International Conference on Semantic Systems</source>
          . pp.
          <fpage>33</fpage>
          –
          <lpage>40</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Zaveri</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rula</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Maurino</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pietrobon</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehmann</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Auer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Quality assessment for linked data: A survey</article-title>
          .
          <source>Semantic Web</source>
          <volume>7</volume>
          (
          <issue>1</issue>
          ),
          <fpage>63</fpage>
          –
          <lpage>93</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>