<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Building Service Composition based on Statistics of the Services Use</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Fedorov Roman</string-name>
          <email>fedorov@icc.ru</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Matrosov Institute for System Dynamics and Control Theory of Siberian Branch of Russian Academy of Sciences</institution>
          ,
          <addr-line>Irkutsk 664033</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>A distributed computing environment, used mainly for processing spatial data, is being formed within the geoportal of IDSTU SB RAS. Processing services that implement the WPS standard, data storage services, a catalog of WPS services, and systems for executing services and their compositions have been developed. The service execution system collects statistics on the service calls made by users. These statistics are analyzed to identify data transfer between service calls and to build service compositions.</p>
      </abstract>
      <kwd-group>
        <kwd>SOA</kwd>
        <kwd>WPS</kwd>
        <kwd>spatial data processing</kwd>
        <kwd>semantic network</kwd>
        <kwd>composition of services</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        …tion, programming languages are actively used to define service compositions. A
number of workflow systems allow users to create service compositions and have
achieved significant results in various subject areas: Pegasus [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], Kepler [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ],
Swift [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], KNIME [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], Taverna [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], Galaxy [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], Trident [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], Triana [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ],
Everest (Mathcloud) [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], and CLAVIRE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. For processing spatial data, Geo-Processing
Workflows are actively used [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. The generally accepted way to define a service composition is a
directed acyclic graph (DAG) [16, 17], in which the vertices are service calls and the
arcs are data dependencies between the services. Creating a service composition is a
complex process that requires programming skills from the user. The authors propose a
method that automates the creation of service compositions based on statistics of the
services used by the user. The method connects individual service calls to each other
and determines the data dependencies between them.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Statistics collection</title>
      <p>A distributed computing environment, used mainly for processing spatial data, is
being formed within the geoportal of IDSTU SB RAS. Processing services that implement
the WPS standard, data storage services, a catalog of WPS services, and systems for
executing services have been developed. Interaction between processing services and
data storage services has been implemented, which simplifies the use of services. The
system for executing services and their compositions collects statistics on service
calls. The statistics include the name and address of the service, the values of its input
and output parameters, the execution time, whether the execution succeeded, execution
errors, etc.</p>
    </sec>
    <sec id="sec-3">
      <title>Definitions</title>
      <p>We introduce the following notation, which is needed to describe the
method.</p>
      <p>s = &lt;name, I, O&gt; is a service, where «name» is the name of the service, I is the
set of input parameters, and O is the set of output parameters of the service. Below,
s.I and s.O denote the parameters that belong to a particular service s. A user's call of a
service has input parameter values specified by the user and output parameter values
produced by the service.
t = &lt;VI, VO&gt; is a call of service s with values VI and VO of the parameters s.I and s.O, respectively.</p>
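      <p>The two definitions above can be mirrored as plain data types. The sketch below is illustrative only: the class names Service and Call, their field names, and the example service clip with its example.org URLs are hypothetical, not taken from the author's system.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, FrozenSet


@dataclass(frozen=True)
class Service:
    # s = (name, I, O): the service name plus its input and output parameter names
    name: str
    inputs: FrozenSet[str]   # s.I
    outputs: FrozenSet[str]  # s.O


@dataclass
class Call:
    # t = (VI, VO): one call of `service` with concrete parameter values
    service: Service
    vi: Dict[str, Any] = field(default_factory=dict)  # values of s.I
    vo: Dict[str, Any] = field(default_factory=dict)  # values of s.O

    def __post_init__(self) -> None:
        # values may only be given for parameters that the service declares
        if not set(self.vi).issubset(self.service.inputs):
            raise ValueError("unknown input parameter")
        if not set(self.vo).issubset(self.service.outputs):
            raise ValueError("unknown output parameter")


# hypothetical clipping service and one user call of it
clip = Service("clip", frozenset({"raster", "bbox"}), frozenset({"result"}))
t = Call(clip,
         vi={"raster": "http://example.org/dem.tif", "bbox": "100,50,110,55"},
         vo={"result": "http://example.org/dem_clip.tif"})
```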
      <p>Log is the set of successful service calls made by users. A service call may fail,
for example, because of incorrect parameter values; therefore, the statistics are filtered
and only successful calls are kept. Based on the Log data, a set of service compositions
in the form of DAGs must be formed.</p>
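      <p>A minimal sketch of forming Log from the collected statistics, assuming each statistics entry is a record with the fields listed in the statistics collection section. The StatRecord class, its field names, the ndvi service and the example.org addresses are hypothetical.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class StatRecord:
    # one entry of the call statistics; field names are illustrative
    service: str                 # service name
    address: str                 # service address
    inputs: Dict[str, Any] = field(default_factory=dict)
    outputs: Dict[str, Any] = field(default_factory=dict)
    exec_time: float = 0.0       # execution time, seconds
    success: bool = True         # whether the call completed successfully
    error: Optional[str] = None  # error message of a failed call


def build_log(stats: List[StatRecord]) -> List[StatRecord]:
    # Log keeps only successful calls: a failure may stem from wrong
    # parameter values, which must not be propagated into compositions
    return [r for r in stats if r.success]


stats = [
    StatRecord("ndvi", "http://example.org/wps",
               {"img": "http://example.org/s2.tif"},
               {"out": "http://example.org/ndvi.tif"}, exec_time=3.1),
    StatRecord("ndvi", "http://example.org/wps", {"img": "s2.tif"},
               success=False, error="cannot read input"),
]
log = build_log(stats)
```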
    </sec>
    <sec id="sec-4">
      <title>Defining data transfer between two service calls</title>
      <p>To recognize a service composition, it is necessary to determine whether data are
transferred between two service calls. This relationship can be established by
analyzing parameter values. The parameters of a service call are divided into input and
output parameters. It is assumed that if the value of an output parameter of one call
matches the value of an input parameter of another call, then data may have been
transferred between these calls. Parameter values can be strings, numbers, or resource
URIs; the latter are usually URLs of files, data services, etc. Each URI used to retrieve
data uniquely identifies the transmitted data, so URI-valued parameters make it possible
to determine that data were transferred from one service call to another. String and
numerical parameters are harder to analyze, because their values may match by
coincidence. Currently, string and numerical parameters are not considered.</p>
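      <p>The restriction to URI-valued parameters can be illustrated with the standard urllib.parse module. This is a sketch under an assumption: a value is treated as a data reference when it parses as a URI with one of the schemes in DATA_SCHEMES; the set of schemes and the helper names are hypothetical.</p>

```python
from typing import Any, Dict
from urllib.parse import urlparse

# URI schemes treated as references to data (an assumption for this sketch)
DATA_SCHEMES = {"http", "https", "ftp", "file"}


def is_data_uri(value: Any) -> bool:
    # a value counts only if it is a string that parses as a URI with a known scheme
    if not isinstance(value, str):
        return False
    parsed = urlparse(value)
    return parsed.scheme in DATA_SCHEMES and bool(parsed.netloc or parsed.path)


def uri_params(values: Dict[str, Any]) -> Dict[str, str]:
    # keep URI-valued parameters; strings and numbers are skipped because
    # equal values there may match by coincidence
    return {name: v for name, v in values.items() if is_data_uri(v)}


vo = {"result": "http://example.org/data/ndvi.tif", "pixels": 1024, "mode": "fast"}
print(uri_params(vo))  # {'result': 'http://example.org/data/ndvi.tif'}
```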
      <p>The following algorithm checks whether data are transferred between two service calls.</p>
      <preformat>Input: ti, tj - two service calls
Output: flag - whether the calls are connected; the connecting parameters
        are recorded in vn.linkedwith
function islinked(ti, tj)
    flag &lt;- False
    for each vm in ti.VO:
        for each vn in tj.VI:
            if typeof vm = file and typeof vn = file and vm = vn then
                flag &lt;- True
                vn.linkedwith &lt;- vm
            end if
        end for
    end for
    return flag
end function</preformat>
      <p>This algorithm compares the output parameters of one service call with the input
parameters of another. If the URL values of a pair of parameters match, a connection
between the service calls is established, and the parameters through which the calls are
connected are recorded.</p>
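      <p>A runnable sketch of islinked. The Call class and the is_file helper are hypothetical; in particular, the test "typeof v = file" is approximated by checking that the value is a URL with a known scheme, which may differ from the check used in the actual system.</p>

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Tuple
from urllib.parse import urlparse


@dataclass(eq=False)
class Call:
    name: str
    vi: Dict[str, Any]  # input parameter values (VI)
    vo: Dict[str, Any]  # output parameter values (VO)
    # linked_with[input name] = (source call, source output name), like vn.linkedwith
    linked_with: Dict[str, Tuple["Call", str]] = field(default_factory=dict)


def is_file(value: Any) -> bool:
    # stand-in for "typeof v = file": a string holding a URL with a known scheme
    return isinstance(value, str) and urlparse(value).scheme in {"http", "https", "ftp", "file"}


def islinked(ti: Call, tj: Call) -> bool:
    # compare every output value of ti with every input value of tj
    flag = False
    for out_name, vm in ti.vo.items():
        for in_name, vn in tj.vi.items():
            if is_file(vm) and is_file(vn) and vm == vn:
                flag = True
                tj.linked_with[in_name] = (ti, out_name)  # remember the matching parameters
    return flag


a = Call("ndvi", {"img": "http://example.org/s2.tif"}, {"out": "http://example.org/ndvi.tif"})
b = Call("publish", {"layer": "http://example.org/ndvi.tif", "title": "NDVI"}, {})
print(islinked(a, b), islinked(b, a))  # True False
print(b.linked_with["layer"][1])       # out
```

      <p>Unlike the pseudocode, which attaches the link to the parameter value itself, the sketch stores it in a dictionary of the receiving call, keyed by the input parameter name.</p>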
    </sec>
    <sec id="sec-5">
      <title>Building service compositions based on statistics</title>
      <p>To analyze the set Log of successful service calls, the following algorithm is used.
Its output is a general directed acyclic graph DAGgen in which the service calls ti are
connected by data. In addition, we obtain the values of the service parameters entered
by the user.</p>
      <preformat>Input: Log - the set of successful service calls
Output: list - data links between service calls (the arcs of DAGgen)
for each ti in Log:
    for each tj in Log:
        if ti ≠ tj and islinked(ti, tj) then
            add (ti, tj) into list
        end if
    end for
end for</preformat>
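      <p>The pairwise scan can be written as a function that takes the linking test as a parameter, so any implementation of islinked can be plugged in. Calls are identified by their index in Log; the function name build_links and the toy data are hypothetical.</p>

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

C = TypeVar("C")  # any representation of a service call


def build_links(log: Sequence[C], linked: Callable[[C, C], bool]) -> List[Tuple[int, int]]:
    # compare every ordered pair of calls from Log; an arc (i, j) means that
    # data were passed from call i to call j. A call is not compared with
    # itself, so DAGgen gets no self-loops.
    edges = []
    for i, ti in enumerate(log):
        for j, tj in enumerate(log):
            if i != j and linked(ti, tj):
                edges.append((i, j))
    return edges


# toy calls as (input URIs, output URIs); set intersection stands in for islinked
log = [({"u:dem"}, {"u:slope"}), ({"u:slope"}, {"u:map"}), ({"u:other"}, {"u:x"})]
print(build_links(log, lambda a, b: bool(a[1].intersection(b[0]))))  # [(0, 1)]
```

      <p>The scan evaluates the linking test |Log|·(|Log| − 1) times. For large logs, one possible optimization (not part of the described method) is to index calls by the URIs of their input values and compare only calls that share a URI.</p>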
      <p>A user can perform a sequence of service calls connected by data. Such a sequence
forms a connected subgraph Di (a connected component) of DAGgen that is not connected
to any other subgraph. Each such subgraph Di is a particular instance of a service
composition. The subgraphs Di are found with the breadth-first search algorithm, whose
complexity is linear: it is proportional to the sum of the numbers of vertices and edges
of DAGgen.</p>
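      <p>A sketch of the breadth-first search for the subgraphs Di, assuming DAGgen is given as a number of vertices and a list of arcs between vertex indices. Arcs are traversed in both directions, so the search finds weakly connected components; the function name components is hypothetical.</p>

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Set, Tuple


def components(n: int, edges: Iterable[Tuple[int, int]]) -> List[Set[int]]:
    # arcs are followed in both directions, so each result is a weakly
    # connected component D_i of DAGgen; total work is O(V + E)
    adj: Dict[int, List[int]] = defaultdict(list)
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    seen = [False] * n
    result = []
    for start in range(n):
        if seen[start]:
            continue
        seen[start] = True
        comp = {start}
        queue = deque([start])
        while queue:  # breadth-first traversal from `start`
            u = queue.popleft()
            for w in adj[u]:
                if not seen[w]:
                    seen[w] = True
                    comp.add(w)
                    queue.append(w)
        result.append(comp)
    return result


print(components(5, [(0, 1), (1, 2), (3, 4)]))  # [{0, 1, 2}, {3, 4}]
```

      <p>Single calls without data links come out as one-vertex components; such components are discarded later by the filtering step.</p>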
      <p>A user often repeats the same sequence of service calls. This produces several
subgraphs Di in which the same services are called but with different parameter values,
i.e., subgraphs Di that are isomorphic when only the services and the data transfer
between them are considered. Isomorphism of the graphs is checked by traversing the
directed subgraphs.</p>
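      <p>The isomorphism check compares subgraphs by their services and arcs only, ignoring parameter values. The paper checks isomorphism by traversing the subgraphs; the sketch below uses a plainly different, brute-force technique instead: it tries every vertex bijection that preserves service names. Each subgraph is represented as a hypothetical pair (mapping from vertex to service name, set of arcs).</p>

```python
from itertools import permutations
from typing import Dict, FrozenSet, Tuple

Graph = Tuple[Dict[int, str], FrozenSet[Tuple[int, int]]]  # (vertex -> service name, arcs)


def isomorphic(g1: Graph, g2: Graph) -> bool:
    # exact test: try every bijection of vertices that preserves service names
    # and check that it maps the arcs of g1 onto the arcs of g2. Exponential in
    # general, acceptable for the few calls in one user session.
    labels1, arcs1 = g1
    labels2, arcs2 = g2
    if len(labels1) != len(labels2) or len(arcs1) != len(arcs2):
        return False
    if sorted(labels1.values()) != sorted(labels2.values()):
        return False
    v1 = list(labels1)
    for image in permutations(labels2):
        mapping = dict(zip(v1, image))
        if any(labels1[v] != labels2[mapping[v]] for v in v1):
            continue  # this bijection mixes up services
        if {(mapping[a], mapping[b]) for a, b in arcs1} == set(arcs2):
            return True
    return False


run1 = ({0: "ndvi", 1: "classify", 2: "publish"}, frozenset({(0, 1), (1, 2)}))
run2 = ({7: "classify", 8: "publish", 9: "ndvi"}, frozenset({(9, 7), (7, 8)}))
run3 = ({0: "ndvi", 1: "classify", 2: "publish"}, frozenset({(0, 1), (0, 2)}))
print(isomorphic(run1, run2), isomorphic(run1, run3))  # True False
```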
      <p>As a result of the algorithm, we obtain several sets IDAGi of isomorphic subgraphs.
The number of subgraphs Dj in IDAGi estimates how often the corresponding service
composition is used. Next, the sets IDAGi are filtered: only those whose subgraphs have
more than one vertex and contain at least one call to a data publishing service are kept.
The resulting service composition is shown in Fig. 1.</p>
      <p>The proposed method, in contrast to existing approaches, uses statistics to build
service compositions. To build a composition, the user only needs to run the required
services one after another. The method automatically connects individual service calls
to each other and determines the data dependencies between them. The method is aimed
at automating frequently repeated user actions.</p>
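      <p>The grouping of subgraphs into sets IDAGi, the frequency estimate and the filtering described above can be sketched with collections.Counter. The sketch assumes that no service is called twice within one subgraph; under that assumption the sorted service names and service-level arcs identify the isomorphism class exactly, and otherwise a full isomorphism test is required. The publishing service is recognized by a caller-supplied predicate, here a hypothetical service named publish.</p>

```python
from collections import Counter
from typing import Callable, Dict, FrozenSet, List, Tuple

Graph = Tuple[Dict[int, str], FrozenSet[Tuple[int, int]]]  # (vertex -> service name, arcs)
Key = Tuple[Tuple[str, ...], Tuple[Tuple[str, str], ...]]


def shape(g: Graph) -> Key:
    # class key that is exact only if no service occurs twice in one subgraph:
    # the subgraph is then determined by its service names and service-level arcs
    labels, arcs = g
    return (tuple(sorted(labels.values())),
            tuple(sorted((labels[a], labels[b]) for a, b in arcs)))


def frequent_compositions(subgraphs: List[Graph],
                          is_publisher: Callable[[str], bool]) -> Counter:
    # group subgraphs D_j into classes IDAG_i; the class size estimates how often
    # the composition is used. Keep classes with more than one vertex and at
    # least one call to a data publishing service.
    counts = Counter(shape(g) for g in subgraphs)
    return Counter({key: n for key, n in counts.items()
                    if len(key[0]) > 1 and any(is_publisher(s) for s in key[0])})


d1 = ({0: "ndvi", 1: "publish"}, frozenset({(0, 1)}))
d2 = ({5: "publish", 6: "ndvi"}, frozenset({(6, 5)}))
d3 = ({2: "ndvi"}, frozenset())
result = frequent_compositions([d1, d2, d3], lambda name: name == "publish")
print(result)
```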
      <p>This work was supported in part by the Russian Foundation for Basic Research (grants
18-07-00758-a, 17-57-44006-mong-a, 17-47-380007-r), Program No. 2 of the Presidium of
the Russian Academy of Sciences, the program “Information and telecommunication
platform for digital monitoring of Lake Baikal, based on end-to-end technologies”, and
the Shared Equipment Centers of ISDCT SB RAS.</p>
      <p>16. Kwok, Y.-K. Static scheduling algorithms for allocating directed task graphs to
multiprocessors / Y.-K. Kwok, I. Ahmad // ACM Computing Surveys. 1999. Vol. 31, № 4.
P. 406–471.</p>
      <p>17. Xie, G. A High-Performance DAG Task Scheduling Algorithm for Heterogeneous
Networked Embedded Systems / G. Xie, R. Li, X. Xiao, Y. Chen // Proc. of IEEE 28th
International Conference on Advanced Information Networking and Applications. 2014.
P. 1011–1016.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>1. <string-name><surname>Grimm</surname>, <given-names>S.</given-names></string-name> <article-title>Ontologies and the Semantic Web</article-title> / S. Grimm, A. Abecker, J. Völker, R. Studer // <source>Handbook of Semantic Web Technologies: Foundations and Technologies, Vols. 1 and 2</source>. <year>2011</year>. P. <fpage>507</fpage>–<lpage>579</lpage>.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>2. <string-name><surname>Schut</surname>, <given-names>P.</given-names></string-name> <article-title>OpenGIS® Web Processing Service</article-title> / P. Schut // <source>Open Geospatial Consortium</source>. <year>2007</year>. № <issue>6</issue>. P. <fpage>1</fpage>–<lpage>3</lpage>.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>3. <string-name><surname>Pautasso</surname>, <given-names>C.</given-names></string-name> <article-title>RESTful Web service composition with BPEL for REST</article-title> / C. Pautasso // <source>Data &amp; Knowledge Engineering</source>. <year>2009</year>. Vol. <volume>68</volume>, № <issue>9</issue>. P. <fpage>851</fpage>–<lpage>866</lpage>.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>4. <string-name><surname>Hoffmann</surname>, <given-names>J.</given-names></string-name> <article-title>Web Service Composition</article-title> / J. Hoffmann, I. Weber // <source>Encyclopedia of Social Network Analysis and Mining</source>. Springer-Verlag, <year>2014</year>.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>5. <string-name><surname>Deelman</surname>, <given-names>E.</given-names></string-name> <article-title>Pegasus, a workflow management system for science automation</article-title> / E. Deelman, K. Vahi, G. Juve // <source>Future Generation Computer Systems</source>. Vol. <volume>46</volume>. P. <fpage>17</fpage>–<lpage>35</lpage>.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>6. <string-name><surname>Ludäscher</surname>, <given-names>B.</given-names></string-name> <article-title>Scientific Workflow Management and the Kepler System</article-title> / B. Ludäscher, Altintas, C. Berkley, D. Higgins, E. Jaeger-Frank, M. Jones, E. Lee, J. Tao, Y. Zhao // Special Issue: Workflow in Grid Systems. <source>Concurrency and Computation: Practice &amp; Experience</source>. <year>2006</year>. Vol. <volume>18</volume>(<issue>10</issue>). P. <fpage>1039</fpage>–<lpage>1065</lpage>.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>7. <string-name><surname>Wilde</surname>, <given-names>M.</given-names></string-name> <article-title>Swift: A language for distributed parallel scripting</article-title> / M. Wilde, M. Hategan, J.M. Wozniak // <source>Parallel Computing</source>. <year>2011</year>. Vol. <volume>37</volume>(<issue>9</issue>). P. <fpage>633</fpage>–<lpage>652</lpage>.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>8. <string-name><surname>Berthold</surname>, <given-names>M.R.</given-names></string-name> <article-title>The Konstanz Information Miner</article-title> / M.R. Berthold, N. Cebron, F. Dill // <source>SIGKDD Explorations</source>. <year>2009</year>. Vol. <volume>11</volume>. P. <fpage>26</fpage>–<lpage>31</lpage>.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>9. <string-name><surname>Wolstencroft</surname>, <given-names>K.</given-names></string-name> <article-title>The Taverna workflow suite: designing and executing workflows of Web Services on the desktop, web or in the cloud</article-title> / K. Wolstencroft, R. Haines, D. Fellows // <source>Nucleic Acids Research</source>. <year>2013</year>. Vol. <volume>41</volume>(<issue>W1</issue>). P. <fpage>557</fpage>–<lpage>561</lpage>.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>10. <string-name><surname>Blankenberg</surname>, <given-names>D.</given-names></string-name> <article-title>Galaxy: A Web-Based Genome Analysis Tool for Experimentalists</article-title> / D. Blankenberg, G.V. Kuster, N. Coraor. <publisher-name>Wiley</publisher-name>, <year>2010</year>.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>11. <string-name><surname>Simmhan</surname>, <given-names>Y.</given-names></string-name> <article-title>Building the Trident scientific workflow workbench for data management in the cloud</article-title> / Y. Simmhan, R. Barga, C. Ingen // <source>Advanced Engineering Computing and Applications in Sciences (ADVCOMP)</source>. <year>2009</year>.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>12. <string-name><surname>Churches</surname>, <given-names>D.</given-names></string-name> <article-title>Programming scientific and distributed workflow with Triana services: Research articles</article-title> / D. Churches, G. Gombas, A. Harrison // <source>Concurrency and Computation: Practice and Experience</source>. Vol. <volume>18</volume>(<issue>10</issue>). P. <fpage>1021</fpage>–<lpage>1037</lpage>.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>13. <string-name><surname>Smirnov</surname>, <given-names>S.</given-names></string-name> <article-title>Integration and Combined Use of Distributed Computing Resources with Everest</article-title> / S. Smirnov, O. Sukhoroslov, S. Volkov // <source>Procedia Computer Science</source>. <year>2016</year>. Vol. <volume>101</volume>. P. <fpage>359</fpage>–<lpage>368</lpage>.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>14. <string-name><surname>Boukhanovsky</surname>, <given-names>A.V.</given-names></string-name> <article-title>CLAVIRE: Perspective Technology for Second Generation Cloud Computing</article-title> / A.V. Boukhanovsky, V.N. Vasilev, V.N. Vinogradov, D.Y. Smirnov, S.A. Sukhorukov, T.G. Yapparov // <source>Scientific and technical journal «Priborostroenie»</source>. <year>2011</year>. Vol. <volume>54</volume>.</mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>15. <string-name><surname>Chen</surname>, <given-names>N.C.</given-names></string-name> <article-title>Geo-processing workflow driven wildfire hot pixel detection under sensor web environment</article-title> / N.C. Chen, L.P. Di, G.N. Yu, J.Y. Gong // <source>Computers &amp; Geosciences</source>. <year>2010</year>. Vol. <volume>36</volume>, № <issue>3</issue>. P. <fpage>362</fpage>–<lpage>372</lpage>.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>