=Paper=
{{Paper
|id=Vol-2037/paper35
|storemode=property
|title=Collective Intelligence Support for Data Service Exploration and Retrieval
|pdfUrl=https://ceur-ws.org/Vol-2037/paper_35.pdf
|volume=Vol-2037
|authors=Devis Bianchini,Valeria De Antonellis,Michele Melchiori
|dblpUrl=https://dblp.org/rec/conf/sebd/BianchiniAM17
}}
==Collective Intelligence Support for Data Service Exploration and Retrieval==
<pdf width="1500px">https://ceur-ws.org/Vol-2037/paper_35.pdf</pdf>
<pre>
    Collective Intelligence support for data service
               exploration and retrieval
                   (discussion paper)

           Devis Bianchini, Valeria De Antonellis, Michele Melchiori

              Dept. of Information Engineering University of Brescia
                      Via Branze, 38 - 25123 Brescia (Italy)
     {devis.bianchini|valeria.deantonellis|michele.melchiori}@unibs.it


      Abstract. Agile design of data-intensive web applications can rely on
      existing, well-tested third-party data services as components that
      provide access methods to valuable data sources. Producers of real data
      services increasingly publish their services in repositories according
      to lightweight semantic profiles with simple tag-based descriptions. In
      this paper, we discuss techniques for data service exploration and
      retrieval that consider service co-usage in existing applications,
      patterns of tag co-usage, and ratings declared by developers who used a
      data service in their own development experiences.

      Keywords: data service, exploratory search, collective intelligence, web-
      oriented architecture


1    Introduction
In recent years, research efforts on explorative techniques and methods have
grown, in order to deal with the increasing volume and heterogeneity
of information made available over the Web [1]. Building web applications that
integrate information available over the Web increasingly requires frameworks to
support the discovery of available services providing access to web data sources.
    In the literature, service recommendation approaches have been proposed that
consider service co-occurrence in existing applications as an implicit measure
of relatedness of services [2], or that suggest services chosen by similar users in a
social network based on their usage information [3]. Considering this scenario,
in this paper we discuss techniques for data service exploration and retrieval
based on selecting and organizing suitable collective intelligence made available
in developer communities. In particular, we consider service co-usage in existing
applications, patterns of tag co-usage, and ratings declared by developers who
used a data service in their own development experiences. Our contribution with
respect to existing service recommendation approaches, including our previous
work in [4], is that here we adopt an explorative viewpoint, enabling developers to
iteratively discover services of interest and progressively increase their knowledge
of available services.
    The paper is organized as follows. Section 2 provides some comparison
with related work to underline the specific features of our approach. Section 3
describes a multi-perspective data service model. In Section 4 we describe the
data service exploration process. Section 5 discusses some preliminary validation.
Finally, Section 6 closes the paper with some final remarks.

[Figure 1: diagram of the Experience Framework and the Data Service Framework;
the latter comprises the terminological (tag), aggregation and service
perspectives. Other labels in the diagram: "creates", "developer’s credibility",
"aggregation-contextual rating".]
            Fig. 1. Overview of the multi-perspective data service model.

2   Related work
There are several approaches related to our work. In particular, we mention
recent Web service recommendation approaches that rely on the lightweight
descriptions of services found in public repositories (e.g.,
ProgrammableWeb or Mashape.com), such as categories, tags or semantic tags.
These approaches apply advanced IR techniques to enhance topic-based service
recommendation [5], or exploit natural language API descriptions [6], the
number of times a service has been used in the past and the co-occurrence of
services in existing applications [2], or latent factors (e.g., related to the
perceived QoS) that affect users’ service selection, identified mainly using
matrix factorization techniques [7]. In this context, approaches like [8]
consider factors that estimate past experiences of data service usage, such as
votes/ratings assigned by users to services. These approaches overcome the
complexity of traditional state-of-the-art approaches to service discovery (see
[9] for a recent survey), which are hampered by their dependence on the
availability of complex, structured service descriptions (e.g.,
WSDL, WADL and semantic web service formalisms [10]). Compared to these
contributions, our aim is to define an explorative approach to service
recommendation that exploits specific collective intelligence.
    Our proposal supports a conversation between the system, used to search for
services, and the developer, who is designing a new web application through the
selection and aggregation of services.

3   Multi-perspective data service model
We model data services and organize collective intelligence on them through two
interconnected frameworks, namely a Data Service Framework and an
Experience Framework, which are illustrated in Figure 1 and separately detailed in the
next sections. Each framework focuses on specific elements, further enriched with
cross-framework relationships. The elements considered in this multi-perspective
data service model are the ones included in the lightweight descriptions available
within the most popular repositories (e.g., Mashape.com, ProgrammableWeb.com).

        Service  Service name   Technical features                    Tags
        s1       HotWire        F_s^DataFormat = {XML, JSON}          {City, Star, Hotel, Travel}
                                F_s^Protocol   = {RSS, Atom, REST}
        s2       EasyToBook     F_s^DataFormat = {XML}                {City, Hotel, Travel}
                                F_s^Protocol   = {SOAP}
        s3       MyAgentDeals   F_s^DataFormat = {XML, JSON}          {City, Star, Near, Hotel, Travel}
                                F_s^Protocol   = {HTTP}

             Fig. 2. Data service descriptions used in the running example.

3.1     Data Service Framework

The Data Service Framework is organized according to three perspectives as
shown in Figure 1. Each perspective considers specific elements, namely ser-
vices, aggregations and terms used to describe them, further described with
proper features and relationships between elements.

Service Perspective. This perspective focuses on data services, according to the
following definition.

Definition 1. We define a data service s (hereafter, service) as an
operation/method/query to access data of a web source, whose underlying data schema might
be unknown to those who use the service. Within the scope of this paper, we
model a service s as ⟨n_s, F_s, T_s⟩, where: n_s is the service name; F_s is an array
of elements, where each element F_s^X represents the technical feature X (e.g.,
protocols, data formats, authentication mechanisms) and is modeled as a set of
allowed values for that feature (e.g., XML or JSON among data formats); T_s = {t_s}
is a set of tags. We denote with S the overall set of available services.

A tag in T_s may be: (a) a category, taken from a top-down classification imposed
within the repository where the data service is stored and advertised (see
footnote 1); (b) a user tag, that is, a term assigned by developers, aimed at
classifying the data service in a folksonomy-like style; (c) a keyword, that is,
a recurrent term extracted from the service name and textual description using
common IR techniques. Figure 2 lists the data services taken from
ProgrammableWeb.com for the running example.
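
As an illustration of Definition 1, the following minimal Python sketch encodes
the three services of Figure 2. The class name DataService, the field names and
the helper services_supporting are hypothetical and introduced only for this
example; they are not part of the framework described in the paper.

```python
from dataclasses import dataclass


@dataclass
class DataService:
    """A service <n_s, F_s, T_s> (illustrative encoding of Definition 1)."""
    name: str                      # n_s, the service name
    features: dict[str, set[str]]  # F_s: feature X -> allowed values F_s^X
    tags: set[str]                 # T_s, the set of tags


# Service descriptions of the running example (Figure 2).
SERVICES = {
    "s1": DataService(
        "HotWire",
        {"DataFormat": {"XML", "JSON"}, "Protocol": {"RSS", "Atom", "REST"}},
        {"City", "Star", "Hotel", "Travel"},
    ),
    "s2": DataService(
        "EasyToBook",
        {"DataFormat": {"XML"}, "Protocol": {"SOAP"}},
        {"City", "Hotel", "Travel"},
    ),
    "s3": DataService(
        "MyAgentDeals",
        {"DataFormat": {"XML", "JSON"}, "Protocol": {"HTTP"}},
        {"City", "Star", "Near", "Hotel", "Travel"},
    ),
}


def services_supporting(services, feature, value):
    """Return the ids of the services that allow `value` for `feature`."""
    return sorted(
        sid for sid, s in services.items()
        if value in s.features.get(feature, set())
    )


if __name__ == "__main__":
    print(services_supporting(SERVICES, "DataFormat", "JSON"))
```

Modeling each technical feature as a set of allowed values keeps feature
filtering a simple membership test.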

Aggregation Perspective. In modern application development, to
implement a web application starting from available data services, a developer has to
explore the set of available services, select the most suitable ones, and integrate and
compose them in order to deploy the final application. Within the scope of this
paper, we focus on the first step, i.e., service exploration for selection purposes,
and we talk about service aggregations instead of web applications, which are the
final product of the development process. We model aggregations according to
the following definition.

Definition 2. A service aggregation represents a set of services that can be
composed to deploy a Web application. An aggregation g is modeled as a triple
⟨n_g, S(g), d⟩, where: n_g is the aggregation name; S(g) = {s1, ..., sn} is the set of
data services used in g; d∈D is the developer who designed the web application by
composing services in g. We denote with G the overall set of service aggregations,
that is, g∈G, and with G(s) the set of aggregations where s has been included.

Fictitious examples of aggregations are listed in the following.
    g1 = ⟨TravelPlan, S(g1) = {s1, s3}, d_g1⟩
    g2 = ⟨Stay&Fun, S(g2) = {s2, s3}, d_g2⟩

Footnote 1. See, for instance, the list of ProgrammableWeb.com categories at
    http://www.programmableweb.com/category-api.

[Figure 3: term graph with five nodes. Each node is labelled with its tags, its
intra-service co-occurrence degree (in parentheses) and the annotated services:
    v1 = {City, Hotel, Travel} (3): s1, s2, s3
    v2 = {City, Star, Hotel, Travel} (2): s1, s3
    v3 = {City, Near, Hotel, Travel} (1): s3
    v4 = {Cuisine, City, Restaurant, Tourism} (1): s4
    v5 = {City, Near, Museum, Entertainment} (1): s5
Edges are labelled with the intra-aggregation co-occurrence degree.]
Fig. 3. A portion of the term graph used in the running example; for each node, data
services annotated with the node terms are also specified.

Terminological Perspective. The Terminological Perspective collects and
organises the terminological items used to describe data services, in order to
support service exploration. The aim is to provide a term graph, described in this
section, as the starting point for data service exploration (see Section 4).
We explain the structure of the graph with the help of the example shown in
Figure 3. Formally, the graph is represented as ⟨V, E⟩, where V is the set of nodes
and E is the set of edges. In particular, each node v_i∈V is formally described
as v_i = ⟨T_vi, cooc_vi⟩, where T_vi is a set of tags jointly used to describe a
number cooc_vi of data services (intra-service term co-occurrence degree). Each edge
e_ij∈E⊆V×V×N is formally described as e_ij = ⟨v_i, v_j, cooc_eij⟩, where v_i and v_j are
nodes (with corresponding sets of tags T_vi and T_vj, respectively), such that tags
in T_vi∪T_vj have been jointly used within a number cooc_eij of aggregations
(intra-aggregation term co-occurrence degree). For example, in Figure 3, the tags in {City,
Hotel, Travel} have been used to describe three data services (namely, s1, s2
and s3). The same tags, together with {Cuisine, City, Restaurant, Tourism},
are associated with two aggregations (namely, TravelPlan and Stay&Fun). Tags
aim at grouping services that are close from the data viewpoint.
    With reference to the same figure, T_v1 and T_v5 can be used to suggest that
developers aggregate services s1, s2 and s3 with s5. The framework first suggests sets
corresponding to an existing aggregation already deployed and tested (let us
suppose, {s1, s5}). Other solutions are suggested as well, although they do not
correspond to existing aggregations (for example, {s2, s5} and {s3, s5}).
Therefore, the intra-aggregation term co-occurrence enables developers to explore
services that have not been aggregated yet, but can be considered for aggregation
because they are tagged with tags forming patterns used in some existing aggregation.
This enables a greater coverage of proposed solutions, at the cost of a lower
precision, which however can be acceptable in an explorative process. The term
graph can be built and maintained in a fully automatic way.
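
The intra-service co-occurrence degree of a node can be sketched in a few lines
of Python. The sketch assumes one plausible reading of the definition, namely
that cooc_v counts the services whose tag set contains every tag of the node;
the function names are hypothetical, and the paper does not state how the tag
sets that become nodes are chosen, so they are given as input here.

```python
# Tags of the services in the running example (Figure 2).
SERVICE_TAGS = {
    "s1": {"City", "Star", "Hotel", "Travel"},
    "s2": {"City", "Hotel", "Travel"},
    "s3": {"City", "Star", "Near", "Hotel", "Travel"},
}


def annotated_services(node_tags, service_tags):
    """Ids of the services whose tag set contains all the tags of the node."""
    return sorted(
        sid for sid, tags in service_tags.items() if node_tags <= tags
    )


def intra_service_cooc(node_tags, service_tags):
    """Number of services jointly described by the tags of the node."""
    return len(annotated_services(node_tags, service_tags))


if __name__ == "__main__":
    v1 = {"City", "Hotel", "Travel"}
    v2 = {"City", "Star", "Hotel", "Travel"}
    v3 = {"City", "Near", "Hotel", "Travel"}
    for name, tags in (("v1", v1), ("v2", v2), ("v3", v3)):
        print(name, intra_service_cooc(tags, SERVICE_TAGS),
              annotated_services(tags, SERVICE_TAGS))
```

Under this reading, the degrees computed for v1, v2 and v3 agree with the
values reported in Figure 3 (3, 2 and 1, respectively).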

3.2    Experience Framework

We integrate the Data Service Framework with methods and techniques designed
to exploit the experience of developers in selecting data services, thus enabling
their ranked recommendation. Such a framework, which in this paper we refer to
as the Experience Framework (EF), has been investigated in our previous work [4,
11]. For the sake of completeness, we report here only a few details that are useful
to understand the rest of this paper.
    The Experience Framework is focused on the set D of developers. Given a
data service s∈S, we denote with µ(s, g, d)∈[0, 1] the vote assigned to s by a
developer d∈D with reference to the aggregation g∈G in which s has been used
(aggregation-contextual rating). Votes are assigned according to the NHLBI
9-point Scoring System (see footnote 2). Furthermore, in [11] we included
credibility assessment techniques, inspired by the ones defined in [8], with
respect to which we introduced the notion of aggregation-contextual rating. In
the Experience Framework, a developer is defined as follows.

Definition 3. A developer represents an actor that is in charge of exploring
services and using them to design new aggregations. A developer might also
assign aggregation-contextual votes to services. A developer d is modeled as
⟨n_d, c(d)⟩, where: (i) n_d is the developer nickname in the considered repository;
(ii) c(d)∈[0, 1] is the estimated developer’s credibility.


4     Data service exploration

We envision the service exploration process as a sequence of exploration steps
between the developer and the system used to search for services. The
developer starts the exploration by specifying: (i) the set T^r of terms used within the
search request, which provide some initial hints about the developer’s interests; (ii)
the set F^r of required technical features, for further refining the requester’s search
constraints. Sets T^r and F^r compose the service request R, which is completed
with the set g^r of services, representing the current composition of the
aggregation that is being designed. The framework is equipped with proper wizards
(described in [4]) that guide the developer in formulating the request. The
system suggests services by applying similarity, filtering and ranking techniques
such as the ones introduced in [11] and summarised in the following.
Service similarity evaluation and ranking. A set of similarity metrics
has been designed to compare each service description s∈S extracted from
ProgrammableWeb.com with the corresponding elements of the request R. The
rationale behind these metrics is that the more the tag set T_s of s is similar to
T^r, the more the technical features F_s of s are similar to F^r and the more the
aggregations where s has been used are similar to the aggregation g^r that is being
designed, the better the description of service s∈S fits the request R. The building
blocks are the term similarity (TermSim()∈[0, 1]), the technical feature
similarity (TechSim()∈[0, 1]) and the aggregation similarity (AggSim()∈[0, 1]) metrics,
which are combined into an overall similarity Sim(R, s)∈[0, 1]. We denote with
S^e⊆S the set of search results such that S^e = {s_i∈S | Sim(R, s_i)≥γ}, where γ∈[0, 1] is
a threshold set by the developer.

Footnote 2. http://www.nhlbi.nih.gov/funding/policies/nine point scoring system and
    program project review.htm.

     A ranking function over the set of services in S^e, denoted with ρ : S^e → [0, 1],
is defined as follows:

    \rho(s_i) = \frac{1}{N} \sum_{N} \left[ \mu(s_i, g_k, d_i) \cdot c(d_i) \cdot AggSim(g^r, g_k) \right]    (1)

where N votes have been assigned to s_i, each vote µ(s_i, g_k, d_i) is weighted with
the credibility c(d_i)∈[0, 1] of d_i∈D as computed in [11] and with the
aggregation similarity AggSim(·) of g_k with respect to the aggregation g^r that is being
developed. The rationale behind Equation (1) is that a service s_i∈S is ranked
better if it received better votes by more credible designers in the context of
more similar aggregations. The system reacts to the developer’s actions by
supporting exploration according to the following three modalities.
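
Before turning to the three modalities, Equation (1) can be sketched in Python
as follows. The Vote class and the numeric values are hypothetical and serve
only as an illustration; the credibility and aggregation similarity values are
assumed to be already computed as in [11], and returning 0 for a service
without votes is an assumption of this sketch, not something stated in the
paper.

```python
from dataclasses import dataclass


@dataclass
class Vote:
    rating: float       # mu(s_i, g_k, d_i) in [0, 1]
    credibility: float  # c(d_i) in [0, 1], assumed precomputed
    agg_sim: float      # AggSim(g^r, g_k) in [0, 1], assumed precomputed


def rank(votes):
    """Average of the votes, each weighted by credibility and AggSim."""
    if not votes:
        return 0.0  # assumption: services without votes get the lowest rank
    weighted = (v.rating * v.credibility * v.agg_sim for v in votes)
    return sum(weighted) / len(votes)


if __name__ == "__main__":
    # Two hypothetical votes for the same service.
    votes = [Vote(0.8, 0.9, 1.0), Vote(0.6, 0.5, 0.5)]
    print(rank(votes))  # (0.72 + 0.15) / 2, i.e. about 0.435
```

Since every factor lies in [0, 1], the resulting value also lies in [0, 1], in
agreement with the codomain of ρ.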
Exploration by simple search. The system also looks for nodes v_i∈V such
that T^r⊆T_vi. If multiple nodes are found, for each such node v_i the system suggests
to the developer additional terms to be included within the set T^r, taken from
the set T_vi\T^r. A suggestion is given for each v_i, ranked in decreasing order
with respect to the cooc_vi value. The developer can explore these suggestions in
order to consider services alternative to S^e and to formulate a different request.
For instance, with reference to Figure 3, if T^r = {City, Hotel, Travel}, the
system might also suggest as an additional terminological item the term {Star}
first (cooc_v2 = 2), and {Near} as a second option (cooc_v3 = 1). In this way, the
developer might realize that hotels can be searched either based on the number
of stars or based on the proximity to a given location, and he/she might refine
the request by choosing one of the two options.
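
The suggestion step just described can be sketched as follows, using the node
data of Figure 3. The function name suggest_terms and the data layout are
hypothetical; nodes whose tag set adds nothing to the request are skipped,
which is an assumption of this sketch.

```python
# Nodes of the term graph in Figure 3: node id -> (T_v, cooc_v).
TERM_NODES = {
    "v1": ({"City", "Hotel", "Travel"}, 3),
    "v2": ({"City", "Star", "Hotel", "Travel"}, 2),
    "v3": ({"City", "Near", "Hotel", "Travel"}, 1),
    "v4": ({"Cuisine", "City", "Restaurant", "Tourism"}, 1),
    "v5": ({"City", "Near", "Museum", "Entertainment"}, 1),
}


def suggest_terms(request_terms, nodes):
    """Suggest T_v \\ T^r for each node with T^r contained in T_v,
    ranked by decreasing intra-service co-occurrence degree."""
    suggestions = []
    for tags, cooc in nodes.values():
        extra = tags - request_terms
        if request_terms <= tags and extra:
            suggestions.append((sorted(extra), cooc))
    suggestions.sort(key=lambda item: -item[1])
    return suggestions


if __name__ == "__main__":
    print(suggest_terms({"City", "Hotel", "Travel"}, TERM_NODES))
    # [(['Star'], 2), (['Near'], 1)]
```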
Exploration by proactive completion. The developer selects a subset of S^e
containing the services he/she is interested in. The system suggests services that
could be used together with services in g^r, by updating the set S^e according to the
intra-aggregation co-occurrence. Let us consider the example shown in Figure 4.
After performing a search based on T^r = {City, Hotel, Travel}, thus obtaining
S^e = {s1, s2, s3} as results, the developer chooses s1 to be included in g^r.
With reference to Figure 3, s1 is associated with nodes v1 and v2. Considering
node v1, the other nodes connected to v1 by graph edges are v4 (associated with
s4, cooc_e14 = 2) and v5 (associated with s5, cooc_e15 = 1). Similarly, considering
node v2, cooc_e24 = 1 and cooc_e25 = 1. Therefore, the system ranks
service s4 better than s5, since cooc_e14 + cooc_e24 > cooc_e15 + cooc_e25. The developer can
accept one of these results. If more than one service is included in g^r, the step
of retrieving services is repeated for each service in g^r.
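
The completion step of this example can be sketched in Python as follows. The
data structures and the function name complete are hypothetical; the edge
weights are the ones quoted above, and summing the weights of all edges that
leave the nodes of the selected service follows the comparison made in the
text.

```python
from collections import defaultdict

# Services annotated with each node of Figure 3.
NODE_SERVICES = {
    "v1": {"s1", "s2", "s3"},
    "v2": {"s1", "s3"},
    "v3": {"s3"},
    "v4": {"s4"},
    "v5": {"s5"},
}

# Intra-aggregation co-occurrence degrees quoted in the text.
EDGES = {
    ("v1", "v4"): 2,
    ("v1", "v5"): 1,
    ("v2", "v4"): 1,
    ("v2", "v5"): 1,
}


def complete(selected, node_services, edges):
    """Rank candidate services by the summed weights of the edges that
    connect a node of `selected` to a node of the candidate."""
    scores = defaultdict(int)
    for (a, b), cooc in edges.items():
        for src, dst in ((a, b), (b, a)):  # edges are undirected
            if selected in node_services[src]:
                for candidate in node_services[dst]:
                    if candidate != selected:
                        scores[candidate] += cooc
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(complete("s1", NODE_SERVICES, EDGES))  # [('s4', 3), ('s5', 2)]
```

When g^r contains several services, the same function would be called once per
service, as stated at the end of the paragraph above.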

Exploration by hybrid completion. This explorative modality is a
combination of proactive completion and simple search. After S^e has been updated, the
developer selects a subset of S^e containing the services he/she is interested in and
also specifies a new set T^r of terms. The system suggests services that could
be used together with services in g^r, by updating the set S^e. In order to obtain
this set, a proactive completion step on g^r retrieves some services as explained
before.

[Figure 4: exploration steps shown as boxes with columns T^e, S^e and G^e. The
initial step has T^e = {City, Hotel, Travel}, S^e = {s1, s2, s3}, G^e = {s1}; it
leads to two alternative steps with T^e = {}, G^e = {s1} and either S^e = {s4}
(cooc_e14 + cooc_e24 = 3) or S^e = {s5} (cooc_e15 + cooc_e25 = 2).]
Fig. 4. Example of exploration by completion (see also Figure 3 for the
intra-aggregation co-occurrences).


5   Preliminary validation of the framework

The exploration process described in Section 4 calls for a quantitative evaluation
of the scalability of the exploration activities and the execution of experiments
with developers, to test the effectiveness of the approach in supporting data ser-
vice exploration. In this section, we present preliminary experiments on scalabil-
ity, performed on a dataset of 1317 services extracted from ProgrammableWeb.
    We initially considered a set of service pairs ⟨s1, s2⟩ in the dataset and we
manually compared them according to tag, technical feature and aggregation
similarity, considering the aggregations where they have been used. We ran the
similarity evaluator to compute Sim(s1, s2) by varying the threshold γ from 0.0 to
1.0 and we chose the value of γ that maximized the F-measure.
    Then, experiments were performed ten times using different requests.
For this purpose, we randomly retrieved aggregations from the repository and we
considered as relevant the services included in the aggregations. We then issued
the requests using the features of the services in the selected aggregations and
we calculated the precision and recall of the search results given by our system. The
aim of these preliminary experiments is to confirm the advantages for service
search brought by our approach, which considers the elements from the multiple
perspectives described in the model. For these reasons, we compared our
approach against: (a) the keyword-based search facilities made available within
the ProgrammableWeb repository; (b) a partial implementation of our system,
where we excluded the aggregation similarity from the Sim(R, s) computation.
Table 1 shows the precision, recall and F-measure results, together with the
standard deviation and the variance of the F-measure, for the compared systems.
As expected, the complete implementation of the system presents the best
F-measure value. Although the exploitation of term and technical feature
similarity brings significant improvements compared to the basic search
facilities of the ProgrammableWeb repository, it is evident that the highest
F-measure value is obtained when aggregation similarity, which mainly relies on
service co-occurrence, is also integrated.
        System              F-measure  Precision  Recall   Std deviation  Variance
        ProgrammableWeb     0.0414     0.03105    0.0621   0.0363         0.0013
        WISeR (no AggSim)   0.4247     0.5880     0.3324   0.3562         0.1269
        WISeR               0.5993     0.7451     0.5012   0.2159         0.0466

                  Table 1. Results of the experimental evaluation.
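
As a consistency check on Table 1, the F-measure can be recomputed from the
reported precision and recall. The sketch below assumes the usual definition of
the F-measure as the harmonic mean of precision and recall; the paper does not
state the formula explicitly, and the function name f_measure is hypothetical.

```python
def f_measure(precision, recall):
    """Harmonic mean of precision and recall (balanced F-measure)."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported F-measure) taken from Table 1.
TABLE_1 = {
    "ProgrammableWeb": (0.03105, 0.0621, 0.0414),
    "WISeR (no AggSim)": (0.5880, 0.3324, 0.4247),
    "WISeR": (0.7451, 0.5012, 0.5993),
}

if __name__ == "__main__":
    for system, (p, r, reported) in TABLE_1.items():
        print(f"{system}: computed {f_measure(p, r):.4f}, reported {reported}")
```

Under this assumption, the recomputed values agree with the reported ones to
four decimal places.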

6    Concluding remarks
In this paper, we proposed an approach for data service explorative search, based
on a multi-perspective model for data services and specific collective intelligence
on them. The approach includes proactive search facilities and enables developers
to iteratively increase their knowledge of available web data services. Future
work will be devoted to the study of techniques for including latent factors
(e.g., related to the perceived QoS) in the exploration process. Further open
research includes the definition of a visualization interface to further improve
the exploration experience of developers.


References
 1. S. Idreos, O. Papaemmanouil, S. Chaudhuri, Overview of data exploration tech-
    niques, in: ACM Conference on Management of Data (SIGMOD), 2015.
 2. W. Gao, L. Chen, J. Wu, A. Bouguettaya, Joint Modeling Users, Services, Mashups
    and Topics for Service Recommendation, in: Proc. of 23rd International Conference
    on Web Services (ICWS 2016), 2016.
 3. R. Hu, J. Liu, Y. Wen, Y. Mao, USER: A usage-based service recommendation
    approach, in: Proc. of 23rd International Conference on Web Services (ICWS 2016),
    2016.
 4. D. Bianchini, V. De Antonellis, M. Melchiori, A Multi-perspective Framework for
    Web API Search in Enterprise Mashup Design (Best Paper), in: Proc. of 25th Int.
    Conference on Advanced Information Systems Engineering (CAiSE), Vol. LNCS
    7908, 2013, pp. 353–368.
 5. B. Cao, X. Liu, B. Li, J. Liu, M. Tang, T. Zhang, Mashup Service Clustering Based
    on an Integration of Service Content and Network via Exploiting a Two-level Topic
    Model, in: Proc. of 23rd International Conference on Web Services (ICWS 2016),
    2016.
 6. W. Xiong, Z. Wu, B. Li, Q. Gu, L. Yuan, B. Hang, Inferring service
    recommendation from natural language API description, in: Proc. of 23rd
    International Conference on Web Services (ICWS 2016), 2016.
 7. X. Liu, I. Fulia, Incorporating User, Topic, and Service Related Latent Factors
    into Web Service Recommendation, in: Proc. of IEEE International Conference on
    Web Services (ICWS 2015), 2015, pp. 185–192.
 8. Z. Malik, A. Bouguettaya, RATEWeb: Reputation Assessment for Trust
    Establishment among Web Services, VLDB Journal 18 (2009) 885–911.
 9. S. Pakari, E. Kheirkhah, M. Jalali, Web Service Discovery Methods and Tech-
    niques: A Review, Int. Journal of Computer Science, Engineering and Information
    Technology 4 (1) (2014) 1–14.
10. H. Wang, N. Gibbins, T. Payne, A. Patelli, Y. Wang, A survey of Semantic Web
    Services formalisms, Concurrency and Computation Practice and Experience.
11. D. Bianchini, V. De Antonellis, M. Melchiori, Capitalizing the Designers’ Experi-
    ence for Improving Web API Selection, in: On the Move to Meaningful Internet
    Systems: OTM 2014 Conferences, Vol. LNCS 8841, 2014, pp. 364–381.

</pre>