<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Online Learning-based Efficient Search System for Sufficient SPARQL Endpoints using Extended Multi-armed Bandit Algorithm</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Yoshiaki Kadono</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Naoki Fukuta</string-name>
          <email>fukuta@cs</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Graduate School of Informatics, Shizuoka University</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>On searching and integrating various kinds of open data, one of crucial issues is to efficiently find sufficient SPARQL endpoints that store sufficient data to be retrieved in the query. In this demonstration, we will present our online learning-based efficient search algorithm and our prototype implementation based on an extended multi-armed bandit approach, which takes into account how the demanded data could be stored in each SPARQL endpoint by the results obtained from a series of test queries.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        At the beginning of 2014, we have over 500 of information sources known as endpoints
of Linked Open Data(LOD)∗. To utilize such endpoints effectively, we often make some
federated queries to some of useful endpoints available in the world. It is not an easy
task to accurately predict which endpoints could be helpful to be used for the queries,
since there are so many endpoints and also those endpoints are dynamically updating
their data but little information are given about how their stored data are useful for the
queries[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
      <p>To examine how an endpoint is useful and appropriate for a given query, we need to
pre-check how each endpoint has useful data as well as how such data can be effective
to retrieve further data stored in other endpoints. However, when we try to examine
all the endpoints to be accessed at one time, it will cause serious network congestions
due to its heavy network loads, as well as unwanted server loads in those endpoints.
Therefore, we need to effectively predict the performance of possible endpoints to be
used in a query. In this demonstration, we present a conceptual model of our online
learning-based efficient search algorithm and our prototype implementation based on
an extended multi-armed bandit approach, which takes into account how the demanded
data could be stored in each SPARQL endpoint by the results obtained from a series of
test queries.
∗http://sparqles.okfn.org/
the situation, the problem seeks how an algorithm can choose the best arm to obtain
maximal revenues as well as the cost for estimating the revenue distributions of arms.</p>
      <p>
        The Budget-Limited Multi-Armed Bandits(BLMAB) model[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ][
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] extends the MAB
model to handle limited budgets and costs for pulling arms to estimate revenue
distributions. Here, each arm i has its own cost ci to pull the arm, and the payment will be
withdrawn from the total budget B.
      </p>
      <p>The Extended BLMAB model for the end-point search problem(ESP) is an extended
BLMAB model, which introduces a two-dimension cost constraint, i.e., the cost for
network load and the cost for query execution time. In our extended BLMAB model,
the player can choose an arbitrary number of arms from the total K arms to pull in each
step. Each arm i has its cost ci to obtain the revenue based on an unknown distribution
µi. The player has budget B so that the cost to be payed should be in the budget, and also
the maximum steps is limited to be T ∈ N. The objective of the player is maximizing
its revenue by choosing arm i with the limited budget B within the limited T steps.
Therefore, we extend the original BLMAB constraints to the following:
(1)
(2)
3</p>
    </sec>
    <sec id="sec-2">
      <title>Algorithm for Extended BLMAB</title>
      <p>On the end-point search problem, we can assign each endpoint as each arm in the
extended BLMAB model. Here, often each endpoint has a limited number of variations in
its effective exploration queries. Therefore, the extended BLMAB model should satisfy
that, let Q be the number of variations in its exploration queries, an arm i should satisfy
NiA(B) ≤ Q.</p>
      <p>While the main objective of MAB is to maximize the total revenue, the main
objective in end-point search problem is to find the best arm to be used for the final (but not
exploration) query. Therefore, let i(A) be the best arm obtained from A, i(A∗) be the
theoretically best arm, we re-define the regression function for our extended BLMAB
model as follows:</p>
      <p>P
! K
" NiA(B)ci ≤ B ∩ t ≤ T
#</p>
      <p>= 1
i</p>
      <p>
        Here, as described in [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], we consider NiA(B) be the random variable that represents
the number of pulls of arm i by A, with respect to the budget limit B. Note that, as we
mentioned, the goal is to discover the best arm to be used, rather than total revenues
obtained by exploration queries.
      </p>
      <p>R(A) = i(A∗) − i(A)
4</p>
    </sec>
    <sec id="sec-3">
      <title>Implementation and Preliminary Evaluation</title>
      <p>difference of objectives, since our model is based on an extended BLMAB Model. In
Figure 4, the vertical axis shows the regression value defined in Equation 2. Notice that,
less regression value is better in this context.</p>
      <p>Figure 1 shows the overview of our prototype system, called SPARQL-MAB. We are
implementing the two algorithms in the system ∗. Further evaluations and refinements
about the models and algorithms are our future work.</p>
      <p>
        For the preparation of the target query to be examined, an abstract query[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] is used
which can include some unknown nodes, denoted by “&lt;&lt;” and “&gt;&gt;”. Figure 2 shows
an example of abstract query.
      </p>
      <p>In Figure 3, two example queries are shown. Those queries were generated from the
user’s query shown in Figure 2.</p>
      <p>∗A demonstration is available at http://whitebear.cs.inf.shizuoka.ac.jp/sparql mab/</p>
      <p>In this evaluation, we prepared a hundred of pseudo endpoints each of which stores
10 thousands of triples. Each triple is composed from a randomly selected concepts
generated from a hundred kind of nouns or concepts in the given ontologies.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>T.</given-names>
            <surname>Fujino</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fukuta</surname>
          </string-name>
          ,
          <source>Utilizing Weighted Ontology Mappings on Federated SPARQL Querying, In, Proc. the 3rd Joint International Semantic Technology Conference</source>
          , pp.
          <fpage>331</fpage>
          -
          <lpage>347</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>H.</given-names>
            <surname>Robbins</surname>
          </string-name>
          ,
          <article-title>Some aspects of the sequential design of experiments</article-title>
          ,
          <source>Bulletin of the AMS</source>
          <volume>55</volume>
          :
          <fpage>527</fpage>
          -
          <lpage>535</lpage>
          ,
          <year>1952</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3. L.
          <string-name>
            <surname>Tran-Thanh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chapman</surname>
          </string-name>
          ,
          <string-name>
            <surname>J. Munoz De Cote Flores Luna</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          , Rogers and
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Jennings Epsilon-First Policies for Budget-Limited Multi-Armed Bandits</article-title>
          ,
          <source>In, Proc. Twenty-Fourth AAAI Conference on Artificial Intelligence(AAAI-10)</source>
          , pp.
          <fpage>1211</fpage>
          -
          <lpage>1216</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4. L.
          <string-name>
            <surname>Tran-Thanh</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          , Rogers and
          <string-name>
            <surname>N.</surname>
          </string-name>
          <article-title>Jennings Knapsack based optimal policies for budget-limited multi-armed bandits</article-title>
          ,
          <source>In, Twenty-Sixth AAAI Conference on Artificial Intelligence (AAAI-12)</source>
          , pp.
          <fpage>1134</fpage>
          -
          <lpage>1140</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>H.</given-names>
            <surname>Noguchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Fujino</surname>
          </string-name>
          , and
          <string-name>
            <given-names>N.</given-names>
            <surname>Fukuta</surname>
          </string-name>
          ,
          <article-title>On Implementing SPARQLoid and its Query Coding Support Framework - Querying with Weighted Ontology Mappings</article-title>
          ,
          <source>In, Proc. the 3rd Joint International Semantic Technology Conference (JIST2013)</source>
          ,
          <year>2013</year>
          .(demonstration)
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>