<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>De ning Views with Formal Concept Analysis for Understanding SPARQL Query Results</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mehwish Alam</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Amedeo Napoli</string-name>
          <email>amedeo.napoli@loria.frg</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>CNRS, LORIA, UMR 7503, Vandoeuvre-les-Nancy</institution>
          ,
          <addr-line>F-54506</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Inria, Villers-les-Nancy</institution>
          ,
          <addr-line>F-54600</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Universite de Lorraine, LORIA, UMR 7503, Vandoeuvre-les-Nancy</institution>
          ,
          <addr-line>F-54506</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>SPARQL queries over semantic web data usually produce list of tuples as answers that may be hard to understand and interpret. Accordingly, this paper focuses on Lattice-Based View Access (LBVA), a framework based on FCA. This framework provides a classi cation of the answers of SPARQL queries based on a concept lattice, that can be navigated for retrieving or mining speci c patterns in query results. In this way, the concept lattice can be considered as a materialized view of the data resulting from a SPARQL query.</p>
      </abstract>
      <kwd-group>
        <kwd>Formal Concept Analysis</kwd>
        <kwd>SPARQL Query Views</kwd>
        <kwd>Lattice-Based Views</kwd>
        <kwd>SPARQL</kwd>
        <kwd>Classi cation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        At present, Web has become a potentially large repository of knowledge, which is
becoming main stream for querying and extracting useful information. In
particular, Linked Open Data (LOD) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] provides a method for publishing structured
data in the form of RDF resources. These RDF resources are interlinked with
each other to form a cloud. SPARQL queries are used in order to make these
resources usable, i.e., queried. In some cases, queries in natural language against
standard search engines can be simple to use but sometimes they are complex
and may require integration of data sources. Then the standard search engines
will not be able to easily answer these queries, e.g., Currencies of all G8
countries. Such a complex query can be formalized as a SPARQL query over data
sources present in LOD cloud through SPARQL endpoints for retrieving answers.
Moreover, users may sometimes execute queries which generate huge amount of
results giving rise to the problem of information overload [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. A typical example
is given by the answers retrieved by search engines, which mix between several
meanings of one keyword. In case of huge results, user will have to go through
a lot of results to nd the interesting ones, which can be overwhelming
without any speci c navigation tool. Same is the case with the answers obtained by
SPARQL queries, which are huge in number and it may be harder to extract
the most interesting patterns. This problem of information overload raises new
challenges for data access, information retrieval and knowledge discovery w.r.t
web querying.
      </p>
      <p>
        Accordingly, this paper proposes a new approach based on Formal Concept
Analysis (FCA [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ])s. It describes a lattice-based classi cation of the results
obtained by SPARQL queries by introducing a new clause VIEW BY in SPARQL
query. This framework, called Lattice-Based View Access (LBVA), allows the
classi cation of SPARQL query results into a concept lattice, referred to as a
view, for data analysis, navigation, knowledge discovery and information retrieval
purposes. This new clause VIEW BY which enhances the functionality of already
existing GROUP BY clause in SPARQL query by adding sophisticated classi cation
and Knowledge Discovery aspects. Here after, we describe how a lattice-based
view can be designed from a SPARQL query. Afterwards, a view is accessed for
analysis and interpretation purposes which are totally supported by the concept
lattice. In case of large data only a part of the lattice [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] can be considered for
the analysis. In this way, this paper investigates also the capabilities of FCA to
deal with semantic web data.
      </p>
      <p>
        The intuition of classifying results obtained by SPARQL queries is inspired
by web clustering engines [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] such as Carrot24. The general idea behind web
clustering engines is to group the results obtained by query posed by the user
based on the di erent meanings of the terms related to a query. Such systems
deal with unstructured textual data on web. By contrast, there are some
studies conducted to deal with structured RDF data. In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], the authors introduce
a clause Categorize By to target the problem of managing large amounts of
results obtained by conjunctive queries with the help of subsumption hierarchy
present in the knowledge base. By contrast, the VIEW BY clause generates
latticebased views which provide a mathematically well-founded classi cation based on
formal concepts and an associated concept lattice. Moreover, it also paves way
for navigation or information retrieval by traversing the concept lattice and for
data analysis by allowing the extraction of association rules from the lattice.
Such data analysis operations allow discovery of new knowledge. Additionally,
unlike Categorize By, VIEW BY can deal with data that has no schema (which
is often the case with linked data). Moreover, VIEW BY has been evaluated over
very large set of answers (roughly 100,000 results) obtained over real datasets. In
case of larger number of answers, Categorize By does not provide any pruning
mechanism while this paper describes how the views can be pruned using iceberg
lattices.
      </p>
      <p>The paper is structured as follows: Section 2 introduces a motivating
example. Section 3 gives a brief introduction of the state of the art while Section 4
de nes LBVA and gives the overall architecture of the framework. Section 5
discusses some experiments conducted using LBVA. Finally, Section 6 concludes
the paper.</p>
      <sec id="sec-1-1">
        <title>4 http://project.carrot2.org/index.html</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>In this section we introduce a motivating example focusing on why LOD should
be queried and why the SPARQL query results need classi cation. This scenario
will continue in the rest of the paper. Let us consider that a query Q searching for
museums where the exhibition of some famous artists is taking place along with
the location of the museum. Here, we do not discuss the interface aspects and we
will assume that SPARQL queries are provided. A standard query engine is not
adequate for answering such kind of questions and a direct query over LOD will
give better results. One of the ways to obtain such an information is to query
LOD through its SPARQL endpoint. This query will generate a huge amount of
results, which will need further manual work to group the interesting links.
3
3.1</p>
    </sec>
    <sec id="sec-3">
      <title>Background</title>
      <sec id="sec-3-1">
        <title>Linked Open Data</title>
        <p>
          Linked Open Data (LOD) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] is the way of publishing structured data in the
form of RDF graphs. Given a set of URIs U, blank nodes B and literals L, an
RDF triple is represented as t = (s; p; o) 2 (U [ B) U (U [ B [ L), where
s is a subject, p is a predicate and o is an object. A nite set of RDF triples is
called as RDF Graph G such that G = (V; E), where V is a set of vertices and E
is a set of labeled edges and G 2 G, such that G = (U [ B) U (U [ B [ L).
Each pair of vertices connected through a labeled edge keeps the information
of a statement. Each statement is represented as hsubject; predicate; objecti
referred to as an RDF Triple. V includes subject and object while E includes
the predicate.
        </p>
        <p>SPARQL5 is the standard query language for RDF. In the current work we
will focus on the queries containing SELECT clause. Let us assume that there
exists a set of variables V disjoint from U in the above de nition of RDF, then
(U [ V) (U [ V) (U [ V) is a graph pattern called a triple pattern. If a
variable ?X 2 V and ?X = c then c 2 U . Given U , V and a triple pattern t
a mapping (t) would be the triple obtained by replacing variables in t with
U . [[:]]G takes an expression of patterns and returns a set of mappings. Given
a mapping : V ! U and a set of variables W V , is represented as
jW , which is described as a mapping such that dom( jW ) = dom( ) \ W and
jW (?X) = (?X) for every ?X 2 dom( ) \ W . Finally, the SPARQL SELECT
query is de ned as follows:
De nition 1. A SPARQL SELECT query is a tuple (W; P ), where P is a graph
pattern and W is a set of variables such that W var(P ). The answer of (W; P )
over an RDF graph G, denoted by [[(W; P )]]G , is the set of mappings:
[[(W; P )]]G = f jW j 2 [[P ]]Gg</p>
        <sec id="sec-3-1-1">
          <title>5 http://www.w3.org/TR/rdf-sparql-query/</title>
          <p>
            In De nition 1, var(P ) is the set of variables in pattern P and W is the set
of variables in SELECT clause. Here, P includes the triple patterns containing
variables. This triple pattern is then evaluated against the RDF Graph G given
as [[P ]]G. It returns a set of mappings with respect to the variables in var(P ).
Finally a projection over is done w.r.t. the variables in W . The projected set of
mappings obtained as represented as jW . Further details on the formalization
and foundations of RDF databases are discussed in [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
          <p>Example 1. Continuing the scenario in section 2, following is the SPARQL query:
1 SELECT ?museum ?country ?artist WHERE f
2 ?museum rdf:type dbpedia-owl:Museum .
3 ?museum dbpedia-owl:location ?city .
4 ?city dbpedia-owl:country ?country .
5 ?painting dbpedia-owl:museum ?museum .</p>
          <p>6 ?painting dbpprop:artist ?artistg
7 GROUP BY ?country ?artist
This query retrieves the list of museums along with the artists whose work is
exhibited in a museum along with the location of a museum. Lines 5 and 6
retrieve information about the artists whose work is displayed in some museum.
More precisely, the page containing the information on a museum (?museum) is
connected to the page of the artists (?artist) through a page on the work of
artist (?painting) displayed in the museum. In order to integrate these three
resources, two predicates were used dbpedia-owl:museum and dbpprop:artist.
An excerpt of the answers obtained by Group by clause is shown below:
Pablo Picasso Musee d'Art Moderne France
Leonardo Da Vinci Musee du Louvre France
Raphael Museo del Prado Spain</p>
          <p>The problem encountered while browsing such an answer is that there are too
many statements to navigate through. Even after using the GROUP BY clause the
answers are not organized in any ordered structure. By contrast, the clause VIEW
BY activates the LBVA framework, where the user will obtain a classi cation
of the statements as a concept lattice where statements are partially ordered
(see Figure 1a). To obtain the museums in UK displaying the work of Goya, all
the museums displaying the work of Goya can be retrieved and then the speci c
concept containing Goya and UK is obtained by navigation. The answer obtained
is National Gallery in the example.
3.2</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>Formal Concept Analysis (FCA)</title>
        <p>
          As the basics of Formal Concept Analysis (FCA) [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] are well known, we only
introduce some of the concepts which are necessary to understand this paper.
FCA is a mathematical framework used for a number of purposes, among which
classi cation and data analysis, information retrieval and knowledge discovery
[
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In some cases we obtain a huge number of concepts. In order to restrict the
(a) Classes of Museums w.r.t Artists and Countries, e.g., the
concept on the top left corner with the attribute France contains
all the French Museums, i.e., Musee du Louvre (Louvre) and
Musee d'Art Moderne (MAM). (VIEW BY ?museum)
(b) Classes of Artists w.r.t Museums and
Countries. (VIEW BY ?artist)
number of concepts, iceberg concept lattices can be used [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Iceberg concept
lattices contain only the top most part of the lattice. Along with iceberg lattices
a stability index [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] is also used for ltering the concepts. The stability index
shows how much the concept intent depends on particular objects of the extent.
        </p>
        <p>
          FCA also allows knowledge discovery using association rules. An implication
over the attribute set M in a formal context is of the form B1 ! B2, where
B1; B2 M . The implication holds i every object in the context with an
attribute in B1 also has all the attributes in B2. For example, when (A1; B1)
(A2; B2) in the lattice, we have that B1 ! B2. Duquenne-Guigues (DG) basis
for implications [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] is the minimal set of implications equivalent to the set of all
valid implications for a formal context K = (G; M; I). Actually, the DG-basis
contains all information lying in the concept lattice.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Lattice-Based View Access</title>
      <sec id="sec-4-1">
        <title>SPARQL Queries with Classi cation Capabilities</title>
        <p>The idea of introducing a VIEW BY clause is to provide classi cation of the
results and add a knowledge discovery aspect to the results w.r.t the
variables appearing in VIEW BY clause. Let Q be a SPARQL query of the form Q
= SELECT ?X ?Y ?Z WHERE fpattern Pg VIEW BY ?X then the set of variables
V = f?X; ?Y; ?Zg 6. According to the de nition 1 the answer of the tuple (V; P )
is represented as [[(f?X; ?Y; ?Zg; P )]] = i where i 2 f1; : : : ; kg and k is the
number of mappings obtained for the query Q. For the sake of simplicity, jW
is given as . Here, dom( i) = f?X; ?Y; ?Zg which means that (?X) = Xi,</p>
        <sec id="sec-4-1-1">
          <title>6 As W represents set of attribute values in the de nition of a many-valued formal</title>
          <p>context, we represent the variables in select clause as V to avoid confusion.
(?Y ) = Yi and (?Z) = Zi. Finally, a complete set of mappings can be given
as ff?X ! Xi; ?Y ! Yi; ?Z ! Zigg.</p>
          <p>The variable appearing in the VIEW BY clause is referred to as object variable7
and is denoted as Ov such that Ov 2 V . In the current scenario Ov = f?Xg.
The remaining variables are referred to as attribute variables and are denoted as
Av where Av 2 V such that Ov [ Av = V and Ov \ Av = ;, so, Av = f?Y; ?Zg.
Example 2. Following the example in section 2, an alternate query with the VIEW
BY clause can be given as:
SELECT ?museum ?artist ?country WHERE f
?museum rdf:type dbpedia-owl:Museum .
?museum dbpedia-owl:location ?city .
?city dbpedia-owl:country ?country .
?painting dbpedia-owl:museum ?museum .</p>
          <p>?painting dbpprop:artist ?artistg
VIEW BY ?museum</p>
          <p>?museum ?artist ?country
1 Musee d'Art Moderne Pablo Picasso France
2 Museo del Prado Raphael Spain
... ... ... ...</p>
          <p>Here, V =f?museum, ?artist, ?countryg and P is the conjunction of
patterns in the WHERE clause then the evaluation of [[(f?museum; ?artist; ?countryg
; P )]] will generate the mappings shown in Table 1. Accordingly, dom( i) =
f?museum; ?artist; ?countryg. Here, 1(?museum) = M usee d0Art M oderne,
1(?artist) = P ablo P icasso and 1(?country) = F rance. We have Ov =
f?museumg because it appears in the VIEW BY clause and Av = f?artist;
?countryg. Figure 1a shows the generated view when Ov = f?museumg and
in Figure 1b, we have; Ov = f?artistg and Av = f?museum; ?countryg.
The results obtained by the query are in the form of set of tuples, which are
then organized as a many-valued context.</p>
          <p>Obtaining a Many-Valued Context (G; M; W; I): As described previously, we
have Ov = f?Xg then (?X) = fXigi2f1;:::;kg, where Xi denote the values
obtained for the object variable and the corresponding mapping is given as
ff?X ! Xigg. Finally, G = (?X) = fXigi2f1;:::;kg. Let Av = f?Y; ?Zg then
M = Av and the attribute values W = f (?Y ); (?Z)g = ffYig; fZiggi2f1;:::;kg.
The corresponding mapping for attribute variables are ff?Y ! Yi; ?Z ! Zigg.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>7 The object here refers to the object in FCA.</title>
          <p>In order to obtain a ternary relation, let us consider an object value gi 2 G and
an attribute value wi 2 W then we have (gi; \?Y 00; wi) 2 I i ?Y (gi) = wi, i.e.,
the value of gi for attribute ?Y is wi, i 2 f1; : : : ; kg as we have k values for ?Y .
Obtaining Binary Context (G; M; I): Afterwards, a conceptual scaling used for
binarizing the many-valued context, in the form of (G; M; I). Finally, we have
G = fXigi2f1;:::;kg, M = fYig [ fZig where i 2 f1; : : : ; kg for object variable
Ov = f?Xg. The binary context obtained after applying the above
transformations to the SPARQL query answers w.r.t to object variable is called the formal
context of answer tuples and is denoted by Ktuple.</p>
          <p>Example 3. In the example Ov = f?museumg, Av = f?artist; ?countryg. The
answers obtained by this query are organized into a many-valued context as
follows: the distinct values of the object variable ?museum are kept as a set of
objects, so G = fM useeduLouvre; M useodelP rado; : : : g, attribute variables
provide M = fartist; countryg, W1 = fRaphael; LeonardoDaV inci; : : : g and
W2 = fF rance; Spain; U K; : : : g in a many-valued context. The obtained
manyvalued context is shown in Table 2. Finally, the obtained many-valued context
is conceptually scaled to obtain a binary context shown in Table 3.</p>
          <p>Museum
Musee du Louvre
Musee d'Art Moderne
Museo del Prado
National Gallery</p>
          <p>Artist Country
fRaphael, Leonardo Da Vinci, Caravaggiog fFranceg</p>
          <p>fPablo Picassog fFranceg
fLeonfaRrdaophDaael,VCinacria,vCagagraiov,agFgraion,cFisrcaoncGisocyoagGoyag fUKg</p>
          <p>fSpaing</p>
          <p>
            The organization of the concept lattice is depending on the choice of
object variable and the attribute variables. Then, to group the artists w.r.t the
museums where their work is displayed and the location of the museums, the
object variable would be ?artist and the attribute variables will be ?museum
and ?country. Then, the scaling can be performed for obtaining a formal
context. In order to complete the set of attribute, domain knowledge can also be
taken into account, such as the the ontology related to the type of artists or
museums. This domain knowledge can be added with the help of pattern structures,
an approach linked to FCA, on top of many-valued context without having to
perform scaling. For the sake of simplicity, we do not discuss it in this paper.
Once the context is designed, the concept lattice can be built using an FCA
algorithm.There are some very e cient algorithms that can be used [
            <xref ref-type="bibr" rid="ref11 ref7">7, 11</xref>
            ]. However,
in the current implementation we use AddIntent [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ] which is an incremental
concept lattice construction algorithm. In case of large data iceberg lattices can
be considered [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ]. The use of VIEW BY clause activates the process of LBVA,
which transforms the SPARQL query answers (tuples) to a formal context Ktuples
through which a concept lattice is obtained which is referred to as a Lattice-Based
View. A view on SPARQL query in section 2, i.e, a concept lattice corresponding
to Table 3 is shown in Figure 1a.
          </p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.4 Interpretation Operations over Lattice-Based Views</title>
        <p>A formal context e ectively takes into account the relations by keeping the
inherent structure of the relationships present in LOD as object-attribute
relation. When we build a concept lattice, each concept keeps a group of terms
sharing some attribute (i.e., the relationship with other terms). This concept
lattice can be navigated for searching and accessing particular LOD elements
through the corresponding concepts within the lattice. It can be drilled down
from general to speci c concepts or rolled up to obtain the general ones which
can be further interpreted by the domain experts. For example, in order to search
for the museums where there is an exhibition of the paintings of Caravaggio,
the concept lattice in Figure 1(a) is explored levelwise. It can be seen that the
paintings of Caravaggio are displayed in Musee du Louvre, Museo del Prado
and National Gallery. Now it can be further ltered by country, i.e., look
for French museums displaying Caravaggio. The same lattice can be drilled
down and Musee du Louvre as an answer can be retrieved. Next, to check the
museums located in France and Spain, the roll up operation from the French
Museums to the general concept containing all the museums with Caravaggio's
painting can be applied and then the drill down operation to Museums in France
or Spain displaying Caravaggio can be performed. The answer obtained will be
Musee du Louvre and Museo del Prado.</p>
        <p>A di erent perspective on the same set of answers can also be retrieved,
meaning that the group of artists w.r.t museums and country. For selecting
French museums according to the artists they display, the object variable will be
Ov = f?artistg and attribute variables will be Av = f?museum; ?countryg. The
lattice obtained in this case will be from Artist's perspective (see Figure 1b).
Now, it is possible to retrieve Musee du Louvre and Musee d'Art Moderne,
which are the French museums and to obtain a speci c French museum displaying
the work of Leonardo Da Vinci a speci c concept can be selected which gives
the answer Musee du Louvre.</p>
        <p>FCA provides a powerful means for data analysis and knowledge discovery.
VIEW BY can be seen as a clause that engulfs the original SPARQL query and
enhances it's capabilities by providing views which can be reduced using
iceberg concept lattices. Iceberg lattices provide the top most part of the lattice
ltering out only general concepts. The concept lattice is still explored levelwise
depending on a given threshold. Then, only concepts whose extent is su ciently
large are explored, i.e., the support of a concept corresponds to the cardinal of
the extent. If further speci c concepts are required the support threshold of the
iceberg lattices can be lowered and the resulting concept lattice can be explored
levelwise.</p>
        <p>Knowledge Discovery: Among the means provided by FCA for knowledge
discovery, the Duquenne-Guigues basis of implications takes into account a
minimal set of implications which represent all the implications (i.e., association
rules with con dence 1) that can be obtained by accessing the view i.e., a
concept lattice. For example, implications according to Figure 1(a) state that all the
museums in the current context which display Leonardo Da Vinci also display
Caravaggio (rule: Leonardo Da Vinci ! Caravaggio). It also says that
only the museums which display the work of Caravaggio display the work of
Leonardo Da Vinci Such a rule can be interesting if the museums which
display the work of both Leonardo Da Vinci and Caravaggio are to be retrieved.
The rule Goya, Raphael, Caravaggio ! Spain suggests that there exists a
museum which have works of Goya, Raphael, Caravaggio only in Spain, more
precisely Museo Del Prado. (These rules are generated from only the part of
SPARQL query answers shown as a context in Table 3).
5</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Experimentation</title>
      <p>The experiments were conducted on real dataset. Our algorithm is implemented
in Java using Jena8 platform and the experiments were conducted on a laptop
with 2.60 GHz Intel core i5 processor, 3.7 GB RAM running Ubuntu 12.04. We
extracted the information about the movie with their genre and location using
SPARQL query enhanced with VIEW BY clause. The experiment shows that even
though the background knowledge (ontological information) was not extracted
the views reveal the hidden hierarchical information contained in the SPARQL
query answers and can be navigated accordingly. Moreover, it also shows that
useful knowledge is extracted from the answers through the the views using
DG Basis of implications. We also performed quantitative analysis where we
discussed about the sparsity of the semantic web data. We also tested how our
method scales with growing number of results. The number of answers obtained
by YAGO were 100,000. The resulting view kept the classes of movies with
respect to genre and location.
5.1</p>
      <sec id="sec-5-1">
        <title>YAGO</title>
        <p>The construction of YAGO ontology is based on the extraction of instances and
hierarchical information from Wikipedia and Wordnet. In the current
experiment, we sent a query to YAGO with the VIEW BY clause.</p>
        <p>PREFIX rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
PREFIX yago: http://yago-knowledge.org/resource/
SELECT ?movie ?genre ?location WHERE f</p>
        <sec id="sec-5-1-1">
          <title>8 https://jena.apache.org/</title>
          <p>?movie rdf:type yago:wordnet movie 106613686 .
?movie yago:isLocatedIn ?location .</p>
          <p>?movie rdf:type ?genre . g
VIEW BY ?movie</p>
          <p>While querying YAGO it was observed that the genre and location
information was also given in the ontology. The rst level of the obtained view over the
SPARQL query results over YAGO kept the groups of movies with respect to
their languages. e.g., the movies with genre Spanish Language Films. However,
as we further drill down in the concept lattice we get more speci c categories
which include the values from the location variable such as Spain, Argentina
and Mexico. There were separate classes obtained for movies based on novels
which were then further specialized by the introduction of the country attribute
as we drill down the concept lattice. Finally with the help of lattice-based views,
it can be concluded that the answers obtained by querying YAGO provides a
clean categorization of movies by making use of the partially ordered relation
between the concepts present in the concept lattice.</p>
          <p>DG-Basis of Implications: DG-Basis of Implications for YAGO were
calculated. The implications were ltered in three ways. Firstly, pruning was
performed naively with respect to support threshold. Around 200 rules were
extracted on support threshold of 0.2%. In order, to make the rules observable,
the second type of ltering based on number of elements in the body of the
rules was applied. All the implications which contained one item set in the body
were selected. However, if there still are large number of implications to be
observed then a third type of pruning can be applied which involved the selection
of implications with di erent attribute type in head and body, e.g., in rule#1
head contains United States which is of type country and body contains the
wikicategory. Such kind of pruning helps in nding attribute-attribute relations.</p>
          <p>Table 4 contains some of the implications. Calculating DG Basis of
implications is actually useful in nding regularities in the SPARQL query answers
which can not be discovered from the raw tuples obtained. For example, rule#1
states that RKO picture films is an American lm production and distribution
company as all the movies produced and distributed by them are from United
States. Moreover, rule#2 says that all the movies in Oriya language are from
India. This actually points to the fact that Oriya is one of many languages that
is spoken in India. This rule also tells that Oriya language is only spoken in
India. Rule#3 shows a link between a category from Wikipedia and Wordnet,
which clearly says that the wikicategory is more speci c than the wordnet
category as remake is more general than Film remakes.</p>
          <p>Impl. ID Supp. Implication
1. 96 wikicategory RKO Pictures films ! United States
2. 46 wikicategory Oriya language films ! India
3. 64 wikicategory Film remakes ! wordnet remake
in% .52
tx 0
eno
t
iltfrsenaoDCm ..015020
yoF
0
1
.0 20
0
2
1
)sdn 001
o
iscne 80
(
ineTm 60
o
i
txceu 40
E 0
2
0 20
40</p>
          <p>60
Number of Tuples in %
80
100
40</p>
          <p>60
Number of Tuples in %
80
100
(a) Density of KY AGO (b) Runtime for Building LY AGO</p>
          <p>Fig. 2: Experimental Results.
5.2</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>Evaluation</title>
        <p>Besides the qualitative evaluation of LBVA, we performed an empirical
evaluation. The characteristics of the dataset are shown in Table 5. These concepts were
pruned with the help of iceberg lattices and stability for qualitative analysis.</p>
        <p>The plots for the experimentation are shown in Figure 2. Figure 2(a) shows a
comparison between the number of tuples obtained and the density of the formal
context. The density of the formal context is the proportion of pairs in I w.r.t
the size G M . It has very low range for both the experiments, i.e., it ranges
from 0.14% to 0.28%. This means in particular that the semantic web data is
very sparse when considered in a formal context and deviates from the datasets
usually considered for FCA (as they are dense). Here we can see that as the
number of tuples increases the density of the formal context is decreasing which
means that sparsity of the data also increases.</p>
        <p>We also tested how our method scales with growing number of results. The
number of answers obtained by YAGO were 100,000. Figure 2(b) illustrate the
execution time for building the concept lattice w.r.t the number of tuples
obtained. The execution time ranges from 20 to 100 seconds, it means that the
the concept lattices were built in an e cient way and large data can be
considered for these kinds of experiments. Usually the computation time for building
concept lattices depends on the density of the formal context but in the case
of semantic web data, as the density is not more than 1%, the computation
completely depends on the number of objects obtained which de nitely increase
with the increase in the number of tuples (see Table 5).
In LBVA, we introduce a classi cation framework based on FCA for the set of
tuples obtained as a result of SPARQL queries over LOD. In this way, a view
is organized as a concept lattice built through the use of VIEW BY clause that
can be navigated where information retrieval and knowledge discovery can be
performed. Several experiments show that LBVA is rather tractable and can be
applied to large data.</p>
        <p>
          For future work, we are interested in extending the VIEW BY clause by
including the available background knowledge of the resources using the formalism
of pattern structures [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ]. Moreover, we intend to use implications for
completing the background knowledge. We also intend to use pattern structures with a
graph description for each considered object, where the graph is the set of all
triples accessible w.r.t reference object.
        </p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>Marcelo</given-names>
            <surname>Arenas</surname>
          </string-name>
          , Claudio Gutierrez, and
          <string-name>
            <given-names>Jorge</given-names>
            <surname>Perez</surname>
          </string-name>
          .
          <article-title>Foundations of rdf databases</article-title>
          .
          <source>In Sergio Tessaris</source>
          , Enrico Franconi, Thomas Eiter, Claudio Gutierrez, Siegfried Handschuh,
          <string-name>
            <surname>Marie-Christine Rousset</surname>
          </string-name>
          , and Renate A. Schmidt, editors,
          <source>Reasoning Web</source>
          , volume
          <volume>5689</volume>
          of Lecture Notes in Computer Science, pages
          <volume>158</volume>
          {
          <fpage>204</fpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>Christian</given-names>
            <surname>Bizer</surname>
          </string-name>
          , Tom Heath, and
          <string-name>
            <surname>Tim</surname>
          </string-name>
          Berners-Lee.
          <article-title>Linked data - the story so far</article-title>
          .
          <source>Int. J. Semantic Web Inf. Syst.</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ):1{
          <fpage>22</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Carpineto</surname>
          </string-name>
          , Stanislaw Osinski, Giovanni Romano, and
          <string-name>
            <given-names>Dawid</given-names>
            <surname>Weiss</surname>
          </string-name>
          .
          <article-title>A survey of web clustering engines</article-title>
          .
          <source>ACM Comput. Surv.</source>
          ,
          <volume>41</volume>
          (
          <issue>3</issue>
          ):
          <volume>17</volume>
          :1{
          <fpage>17</fpage>
          :
          <fpage>38</fpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>Claudio</given-names>
            <surname>Carpineto</surname>
          </string-name>
          and
          <string-name>
            <given-names>Giovanni</given-names>
            <surname>Romano</surname>
          </string-name>
          .
          <article-title>Concept data analysis - theory and applications</article-title>
          . Wiley,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Claudia d'Amato</surname>
            ,
            <given-names>Nicola</given-names>
          </string-name>
          <string-name>
            <surname>Fanizzi</surname>
            , and
            <given-names>Agnieszka</given-names>
          </string-name>
          <string-name>
            <surname>Lawrynowicz</surname>
          </string-name>
          .
          <article-title>Categorize by: Deductive aggregation of semantic web query results</article-title>
          . In Lora Aroyo, Grigoris Antoniou, Eero Hyvonen, Annette ten Teije, Heiner Stuckenschmidt, Liliana Cabral, and Tania Tudorache, editors,
          <source>ESWC (1)</source>
          , volume
          <volume>6088</volume>
          of Lecture Notes in Computer Science, pages
          <volume>91</volume>
          {
          <fpage>105</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Sergei O.</given-names>
            <surname>Kuznetsov</surname>
          </string-name>
          .
          <article-title>Pattern structures and their projections</article-title>
          .
          <source>In Harry S. Delugach and Gerd Stumme</source>
          , editors,
          <source>ICCS</source>
          , volume
          <volume>2120</volume>
          of Lecture Notes in Computer Science, pages
          <volume>129</volume>
          {
          <fpage>142</fpage>
          . Springer,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Ganter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Rudolf</given-names>
            <surname>Wille</surname>
          </string-name>
          .
          <source>Formal Concept Analysis: Mathematical Foundations</source>
          . Springer, Berlin/Heidelberg,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>J.-L. Guigues</surname>
            and
            <given-names>V.</given-names>
          </string-name>
          <string-name>
            <surname>Duquenne</surname>
          </string-name>
          .
          <article-title>Familles minimales d'implications informatives resultant d'un tableau de donnees binaires</article-title>
          .
          <source>Mathematiques et Sciences Humaines</source>
          ,
          <volume>95</volume>
          :5{
          <fpage>18</fpage>
          ,
          <year>1986</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Sergei</surname>
            <given-names>O.</given-names>
          </string-name>
          <string-name>
            <surname>Kuznetsov</surname>
          </string-name>
          .
          <article-title>On stability of a Formal Concept</article-title>
          . Ann. Math. Artif. Intell.,
          <volume>49</volume>
          (
          <issue>1-4</issue>
          ):
          <volume>101</volume>
          {
          <fpage>115</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Gerd</surname>
            <given-names>Stumme</given-names>
          </string-name>
          ,
          <article-title>Ra k Taouil, Yves Bastide, and Lot Lakhal. Conceptual clustering with iceberg concept lattices</article-title>
          . In R. Klinkenberg, S. Ruping,
          <string-name>
            <given-names>A.</given-names>
            <surname>Fick</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Henze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Herzog</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Molitor</surname>
          </string-name>
          , and O. Schroder, editors,
          <source>Proc. GI-Fachgruppentre en Maschinelles Lernen (FGML'01)</source>
          , Universitat Dortmund 763,
          <year>October 2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Dean van der Merwe</surname>
          </string-name>
          , Sergei A.
          <string-name>
            <surname>Obiedkov</surname>
          </string-name>
          , and
          <string-name>
            <surname>Derrick</surname>
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Kourie. Addintent</surname>
          </string-name>
          :
          <article-title>A new incremental algorithm for constructing concept lattices</article-title>
          . In Peter W. Eklund, editor,
          <source>ICFCA, Lecture Notes in Computer Science</source>
          , pages
          <volume>372</volume>
          {
          <fpage>385</fpage>
          . Springer,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>