<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Lattice-Based View Access: A way to Create Views over SPARQL Query for Knowledge Discovery</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universite de Lorraine) BP 239, Vandoeuvre-les-Nancy</institution>
          ,
          <addr-line>F-54506</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The data published in the form of RDF resources is increasing day by day. This mode of data sharing facilitates the exchange of information across the domains. Although it provides easier ways in the use of data such as through SPARQL queries. These queries over semantic web data usually produce list of tuples as answers which may be huge in number or may require further manipulation so that it can be understood and interpreted. Accordingly, this paper introduces a new clause View By in the SPARQL query for creating semantic views over the raw SPARQL query answers. This approach namely, Lattice-Based View Access (LBVA), is a framework based on Formal Concept Analysis (FCA). It provides a classi cation of the answers of SPARQL queries based on a concept lattice, that can be navigated for retrieving or mining speci c patterns in query results w.r.t. user constraints. In this way, the concept lattice can be considered as a materialized view of the data resulting from a SPARQL query.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>SPARQL Query Views</kwd>
        <kwd>Lattice-Based Views</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Classi cation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        A considerable amount of Semantic Web (SW) data is already available on the
web. Thus many agents are looking for more and more data present in the form
of ontologies and RDF triples. Linked Open Data (LOD) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is a huge source of
RDF resources interlinked with each other to form a cloud. SPARQL queries are
used in order to make these data usable by domain experts and software agents.
Sometimes queries are executed which may generate huge amount of results
giving rise to the problem of information overload [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. A typical example is the
answers retrieved by search engines, which may mix between several meanings
of one keyword. In case of huge results, many results are navigated to nd the
interesting links, which may be overwhelming without any navigation tool. Same
is the case with the answers obtained by SPARQL queries, which may be huge
in number and it may be harder to extract the most interesting patterns. This
problem of information overload raises new challenges for data modeling and
analysis and calls for improving data access, information retrieval and knowledge
discovery w.r.t web querying.
      </p>
      <p>
        In order to deal with the problem of information overload, this paper proposes
a new approach based on Formal Concept Analysis (FCA [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]), which provides a
lattice-based classi cation of the results obtained by SPARQL queries w.r.t user
constraints. This new framework, namely Lattice Based View Access (LBVA),
allows the classi cation of SPARQL query results into a concept lattice, referred
to as views, for data analysis, knowledge discovery and information retrieval
purposes. Based on one SPARQL query several views can be generated from
di erent perspectives. In addition, LBVA allows for navigation over SPARQL
query results. Here after, we describe how the views (a view corresponds to a
concept lattice) can be designed from a SPARQL query and the result which is
returned. Moreover, the analysis and the interpretation of the views is totally
supported by the concept lattice. In case of large data only a part of the lattice
can be considered for the analysis using the technique of iceberg lattices.
      </p>
      <p>
        The intuition of classifying results obtained by querying LOD is inspired by
web clustering engines [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] such as Carrot21. The general idea behind web
clustering engines is to group the results obtained by query posed by the user based
on the di erent meanings of the terms related to a query. Such systems deal with
unstructured textual data on web. However, there are some studies conducted to
deal with structured RDF data. In [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], the authors target the problem of
managing large amounts of results obtained by conjunctive queries with the help of
subsumption hierarchy present in the knowledge base. On the other hand,
latticebased views provide classi cation based on the formal concepts and a partially
ordered organization of the results. It also opens possibilities for navigation or
information retrieval by traversing the concept lattice and for data analysis by
allowing the extraction of association rules from the lattice. Additionally, unlike
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], LBVA also deals with data that has no schema (which is often the case with
linked data).
      </p>
      <p>The concept lattice provides a well founded structure on which a series of
interpretations can be carried out. This framework is general and does not depend
on any particular domain and may be used in addition with external resources,
e.g. domain knowledge.</p>
      <p>The paper is structured as follows: Section 1 gives a brief overview of Linked
Open Data and gives the motivating example. Section 2 de nes LBVA and gives
the overall architecture of the framework. Section 3 brie y described the
experimentation setting. Finally, Section 4 concludes the paper.
1</p>
    </sec>
    <sec id="sec-2">
      <title>Linked Open Data</title>
      <p>
        Linked Open Data (LOD) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] is the way of publishing structured data which
helps in the connection between several resources through their schema. LOD
      </p>
      <sec id="sec-2-1">
        <title>1 http://project.carrot2.org/index.html</title>
        <p>represents data in the form of RDF graphs. Given a set of URIs U , blank nodes
B and literals L, an RDF triple is represented as t = (s; p; o) 2 (U [ B)
U (U [ B [ L), where s is a subject, p is a predicate and o is an object. A
nite set of RDF triples is called as RDF Graph G such that G = (V; E), where
V is a set of vertices and E is a set of labeled edges and G 2 G, such that
G = (U [ B) U (U [ B [ L). Each pair of vertices connected through a
labeled edge keeps the information of a statement. Each statement is represented
as hsubject; predicate; objecti referred to as an RDF Triple. V includes subject
and object while E includes the predicate.</p>
        <p>SPARQL2 is the standard query language for RDF. In the current work we
will focus more on the type of queries whose output performs value selection
over the variables matching the patterns (queries containing SELECT clause).
Now let us assume that there exists a set of variables V disjoint from U in
the above de nition of RDF, then (U [ V ) (U [ V ) (U [ V ) is a graph
pattern called a triple pattern. If a variable ?X 2 V and ?X = c then c 2 U .
Given U , V and a triple pattern t a mapping (t) would be the triple obtained
by replacing variables in t with U . [[:]]G takes an expression of patterns and
returns a set of mappings. Given a mapping : V ! U and a set of variables
W V , is represented as jW , which is described as a mapping such that
dom( jW ) = dom( ) \ W and jW (?X) = (?X) for every ?X 2 dom( ) \ W .
Finally, the SPARQL SELECT query is de ned as follows:
De nition 1. A SPARQL SELECT query is a tuple (W; P ), where P is a graph
pattern and W is a set of variables such that W var(P ). The answer of (W; P )
over an RDF graph G, denoted by [[(W; P )]]G , is the set of mappings:
[[(W; P )]]G = f jW j 2 [[P ]]Gg</p>
        <p>
          In the above de nition var(P ) is the set of variables in pattern P and
variables W in SELECT clause of SPARQL query3. Further details on the
formalization and foundations of RDF databases are discussed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
        <p>Example 1. Consider a query all the bands which play di erent stringed
instruments along with their origin. This example will continue in the rest of this
paper. Let us name this query Q, then Q can not be answered by standard search
engines as it generates a separate list of bands and stringed instruments
requiring multiple resources to be integrated. However, Q can be answered by SPARQL
queries over LOD. For example, let us consider the SPARQL query in Listing 1.1
over DBpedia4. DBpedia is the central hub of LOD which extracts data from
Wikipedia and makes it available in the structured format.
2 http://www.w3.org/TR/rdf-sparql-query/
3 In the rest of the paper we denote W as V to avoid overlap between the attribute
values W in many-valued context.
4 http://dbpedia.org/sparql
(a) Classes of Bands w.r.t. Musical Instruments
and Countries, e.g., the concept on the top right
corner with the attribute Cuatro contains all the
bands which play Cuatro.
(b) Classes of Musical Instruments w.r.t Bands
and their Origin</p>
        <p>Listing 1.1: SPARQL Query Q
SELECT ? band ? i n s t r u m e n t ? o r i g i n WHERE f
? band r d f : t y p e dbpedia owl : Band .
? band dbpprop : o r i g i n ? o r i g i n .
? band dbpedia owl : bandMember ?member .
?member dbpedia owl : i n s t r u m e n t ? i n s t r u m e n t .</p>
        <p>? i n s t r u m e n t d c t e r m s : s u b j e c t C a t e g o r y : S t r i n g i n s t r u m e n t s .
GROUP BY ? i n s t r u m e n t ? o r i g i n</p>
        <p>The above SPARQL query returns a list of bands along with the instruments
they play and their origin as an answer. An excerpt of the answers is shown
below:
dbpedia5:RHCP6 dbpedia:Banjo dbpedia:US
dbpedia:Disturbed dbpedia:Bass Guitar dbpedia:US,
dbpedia:The Solution dbpedia:Banjo dbpedia:Sweden.</p>
        <p>In case of too many origins GROUP BY clause will lead to many small groups
which would be hard for the user to observe with respect to origin or instrument,
failing in the task of grouping. A classi cation technique can be used for
navigation or interpretation. For example, Figure 1(a) shows a concept lattice for a
small part of query answers. Here we can see classes such as the concept which
contains all the bands which play Cuatro. If the search is more speci ed then
the origin of each of the bands can also be retrieved. It is possible to retrieve
bands which play Cuatro and are from UK, here Chrome Hoof is the band which
plays Cuatro in the current small example. On the other hand, Figure 1(b)
shows a concept lattice where musical instruments are classi ed with respect to
bands and their origin, giving a totally di erent perspective over the same set
of answers.</p>
      </sec>
      <sec id="sec-2-2">
        <title>5 http://dbpedia.org/resource/ 6 Red Hot Chilli Peppers</title>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Lattice-Based View Access</title>
      <p>In this paper, we propose an approach called Lattice-Based View Access which
generates a concept lattice referred to as view. This view provides users with
classi cation, navigation and analysis capabilities over these results. In the scenario
of LOD, query processing procedure can not be controlled, so, in our algorithm
we do not process the SPARQL query. The views are de ned over RDF data by
processing the set of tuples returned by the SPARQL query.
2.1</p>
      <sec id="sec-3-1">
        <title>SPARQL Queries with Classi cation Capabilities</title>
        <p>The idea of introducing a VIEW BY clause is to provide classi cation of the
results and add a knowledge discovery aspect to the results w.r.t the variables
appearing in VIEW BY clause. For example, we have a SPARQL SELECT query
Q = SELECT ?X ?Y ?Z WHERE fpattern Pg VIEW BY ?X then the set of
variables V = f?X; ?Y; ?Zg. According to the de nition 1 the answer of the tuple
(V; P ) is represented as [[(f?X; ?Y; ?Zg; P )]] = i where i 2 f1; : : : ; kg and k is
the number of mappings obtained for the query Q. Here, dom( i) = f?X; ?Y; ?Zg
which means that (?X) = Xi, (?Y ) = Yi and (?Z) = Zi. Finally, a complete
set of mappings can be given as ff?X ! Xi; ?Y ! Yi; ?Z ! Zigg.</p>
        <p>Now, variables appearing in the VIEW BY clause are referred to as object
variable7 and is denoted as Ov such that Ov 2 V . In the current scenario Ov =
f?Xg. The remaining variables are referred to as attribute variables and are
denoted as Av where Av 2 V such that Ov [ Av = V and Ov \ Av = ;.
Example 2. An alternate query for the query in Listing 1.1 with the VIEW BY
clause can be given as:
SELECT ?band ?instrument ?origin WHERE f
?band rdf:type dbpedia-owl:Band.
?band dbpprop:origin ?origin.
?band dbpedia-owl:bandMember ?member .
?member dbpedia-owl:instrument ?instrument .</p>
        <p>?instrument dcterms:subject dbpedia8:Category:String instruments.g
VIEW BY ?band
Here, V=f?band, ?instrument, ?origing then the evaluation of the SELECT
query [[(f?band; ?instrument; ?origing; P )]] will generate the mappings shown
in Table 1. Accordingly, dom( i) = f?band; ?instrument; ?origing. Here, 1(?band)
= RHCP , 1(?instrument) = Banjo and 1(?origin) = U S. In the current
example, we have, Ov = f?bandg because it appears in the VIEW BY clause
and Av = f?instrument; ?origing. Figure 1a shows the generated view when
Ov = f?bandg and in Figure 1b, we have; Ov = f?instrumentg.
7 The object here refers to the object in FCA.
8 http://dbpedia.org/resource/
... ...</p>
        <p>?band ?instrument ?origin
1 RHCP Banjo US
2 Disturbed Bass Guitar US</p>
        <p>... ...
The results obtained by the query are in the form of set of tuples, which are then
organized as a many-valued context. If Ov = f?Xg then (?X) = fXigi2f1;:::;kg,
where Xi denote the values obtained for the object variable and the
corresponding mapping is given as ff?X ! Xigg. Finally, G = (?X) = fXigi2f1;:::;kg. Let
Av = f?Y; ?Zg then M = Av and the attribute values W = f (?Y ); (?Z)g =
ffYig; fZiggi2f1;:::;kg. The corresponding mapping for attribute variables are
ff?Y ! Yi; ?Z ! Zigg Consider an object value gi 2 G and an attribute
value wi 2 W then we have (gi; \?Y 00; wi) 2 I i ?X(gi) = wi, i.e., the value of
gi for attribute ?Y is wi, i 2 f1; : : : ; kg as we have k values for ?Y .
Obtaining Binary Context (G; M; I): Afterwards, a conceptual scaling used for
binarizing the many-valued context, in the form of (G; M; I). Finally, we have
G = fXigi2f1;:::;kg, M = fYig [ fZig where i 2 f1; : : : ; kg for object variable
Ov = f?Xg.</p>
        <p>
          Example 3. In the example Ov = fbandg, Av = finstrument; origing. The
answers obtained by this query are organized into a many-valued context as
follows: the distinct values of the object variable ?band are kept as a set of
objects, so G = fRHCP; Disturbed; : : : g, attribute variables provide M =
finstrument; origing, W1 = fBanjo; BassGuitar; : : : g and W2 = fU S; U K;
F rance : : : g in a many-valued context. The obtained many-valued context is
shown in Table 2. Following the above de ned procedure a many-valued
context is conceptually scaled to obtain a binary context shown in Table 3. The
corresponding concept lattice is shown in Figure 1(a).
Once the context is designed, the concept lattice can be built using an FCA
algorithm. This step is straight forward as soon as the context is provided. In
the current implementation we use AddIntent [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] which is an incremental
concept lattice construction algorithm. In case of large data iceberg lattices can be
considered [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. The use of VIEW BY clause activates the process of LBVA, which
transforms the SPARQL query answers (tuples) to a formal context Kanswers
through which a concept lattice is obtained which is referred to as a Lattice-Based
View. A view on SPARQL query in section 1, i.e, a concept lattice corresponding
to Table 3 is shown in Figure 1a. At the end of this step the concept lattice is
built and the interpretation step can be considered.
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>2.4 Interpretation Operations over a Concept Lattice</title>
        <p>Navigation Operation and Knowledge Discovery: The obtained concept
lattice can be navigated for searching and accessing particular LOD elements.
It is possible to drill down from general to speci c concepts according to some
constraints. For example, in order to search for bands in US playing Banjo, the
concept lattice in Figure 1(a) is explored levelwise. First the broader concept
contains all the bands from US, RHCP, The Solution, Disturbed. Then, the
children concepts contain more speci c concepts with the instruments Banjo
and Bass Guitar. According to the initial constraint, the attribute concept of
Banjo can be selected returning two objects namely RHCP, The Solution. Next,
to check which instruments are played in music originating from US, another
concept lattice can be explored, where objects correspond to instruments shown
in Figure 1(b). The results in this case is the set of objects Bass Guitar, Banjo.</p>
        <p>FCA provides a powerful means for data analysis and knowledge discovery.
Iceberg lattices provide the top most part of the lattice ltering out only general
concepts. The concept lattice is still explored levelwise depending on a given
threshold. Then, only concepts whose extent is su ciently large are explored,
i.e., the support of a concept corresponds to the cardinal of the extent. If further
speci c concepts are required the support threshold of the iceberg lattices can
be lowered and the resulting concept lattice can be explored levelwise.</p>
        <p>Another way of interpreting the data is provided by Duquenne-Guigues
basis of implications which takes into account a minimal set of implications which
represent all the association rules that can be generated for a given formal
context. For example, DG-basis of implications according to the formal context in
Table 3 state that all the bands which play Banjo are from US (rule: Banjo !
US). Moreover, the rule Venezuela ! Cuatro suggests that all the bands from
Venezuela play Cuatro. This rule states that Cuatro is widely used in the folk
music of Venezuela.
3</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimentation</title>
      <p>
        Several experiments were conducted on real datasets. These datasets include
DBpedia, Yago [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], which is a knowledge base automatically extracted from
Wikipedia (infoboxes, categories), Wordnet and Geonames. The experiment was
also tested on the biomedical data such as Sider9 and Drugbank10. Sider keeps
the information about the medicines along with their side e ects. Drugbank
keeps the detailed information about the drugs such as drug category and target
proteins. These experiments provide qualitative and quantitative evaluation to
our approach. These experiments are not discussed in the current paper due to
lack of space. However, the software, a detailed technical report along with the
visualization of the SPARQL query views can be accessed online11.
4
      </p>
    </sec>
    <sec id="sec-5">
      <title>Conclusion and Discussion</title>
      <p>In LBVA, we introduce a classi cation framework for the set of tuples obtained as
a result of SPARQL queries over LOD. We introduce a classi cation framework
based on FCA for organizing a view, i.e., the set of tuples resulting from a
SPARQL query. In this way, the view is organized as a concept lattice that
can be navigated where information retrieval and knowledge discovery can be
performed. For future work, we are interested in working with several object
variable allowing to deal with more complex relations, with the help of Relational
Concept Analysis (RCA). In addition, here only binary contexts are taken into
account. It is possible to go beyond this limitation in using another variation of
FCA which is the formalism of pattern structures.
9 http://sideeffects.embl.de/
10 http://www.drugbank.ca/
11 http://webloria.loria.fr/~alammehw/lbva/</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Claudio Gutierrez, and
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Perez</surname>
          </string-name>
          .
          <article-title>Foundations of rdf databases</article-title>
          .
          <source>In Reasoning Web</source>
          , pages
          <volume>158</volume>
          {
          <fpage>204</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          , Tom Heath, and
          <string-name>
            <surname>Tim</surname>
          </string-name>
          Berners-Lee.
          <article-title>Linked data - the story so far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):1{
          <fpage>22</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Carpineto</surname>
          </string-name>
          , Stanislaw Osinski, Giovanni Romano, and
          <string-name>
            <given-names>Dawid</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>A survey of web clustering engines</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>41</volume>
          (
          <issue>3</issue>
          ):
          <volume>17</volume>
          :1{
          <fpage>17</fpage>
          :
          <fpage>38</fpage>
          ,
          <year>July 2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Claudia d'Amato</surname>
            ,
            <given-names>Nicola</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
            , and
            <given-names>Agnieszka</given-names>
          </string-name>
          <string-name>
            <surname>Lawrynowicz</surname>
          </string-name>
          .
          <article-title>Categorize by: Deductive aggregation of semantic web query results</article-title>
          .
          <source>In ESWC (1)</source>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer, Berlin/Heidelberg,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>G.</given-names>
            <surname>Stumme</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Taouil</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bastide</surname>
          </string-name>
          , and
          <string-name>
            <given-names>L.</given-names>
            <surname>Lakhal</surname>
          </string-name>
          .
          <article-title>Conceptual clustering with iceberg concept lattices</article-title>
          .
          <source>In Proc. GI-Fachgruppentre en Maschinelles Lernen (FGML'01)</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Fabian</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Suchanek</surname>
            , Gjergji Kasneci, and
            <given-names>Gerhard</given-names>
          </string-name>
          <string-name>
            <surname>Weikum</surname>
          </string-name>
          .
          <article-title>Yago: A core of semantic knowledge</article-title>
          .
          <source>In Proceedings of the 16th International Conference on World Wide Web, WWW '07</source>
          , pages
          <fpage>697</fpage>
          {
          <fpage>706</fpage>
          , New York, NY, USA,
          <year>2007</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dean van der Merwe</surname>
          </string-name>
          , Sergei A.
          <string-name>
            <surname>Obiedkov</surname>
          </string-name>
          , and
          <string-name>
            <surname>Derrick</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kourie. Addintent</surname>
          </string-name>
          :
          <article-title>A new incremental algorithm for constructing concept lattices</article-title>
          .
          <source>In ICFCA</source>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>