A Social Network-based Framework for Data Services Selection in Modern Web Application Design

                                           Devis Bianchini, Valeria De Antonellis, Michele Melchiori

                                              Dept. of Information Engineering University of Brescia
                                                      Via Branze, 38 - 25123 Brescia (Italy)
                                     {devis.bianchini|valeria.deantonellis|michele.melchiori}@unibs.it


Abstract. In recent years, the design of enterprise Web applications has
increasingly relied on the integration of resources delivered through
data services from outside the organization boundaries. Searching for and
composing existing data services offers many advantages, namely the
availability of widespread solutions in the form of services and data
shared over the Web, and reduced development costs. In this scenario,
new methods for speeding up the design process are emerging; in
particular, developers’ social networks have been established, where
developers follow other developers to learn from their choices in selecting
suitable services. In this paper, we propose a framework to support data
service selection for modern Web application design that also considers
the developers’ social network. The network of social relationships,
properly weighted with the developers’ credibility, is used to compute
the developers’ rank. This rank qualifies developers’ experience in selecting
data services.


                                1    Introduction

Modern enterprise Web application design increasingly relies on the selection and
composition of external services that provide access to resources from outside the
organization boundaries. The availability of data services that meet these
requirements is becoming more and more important, as also witnessed by the growth
of public Web API repositories (e.g., ProgrammableWeb.com, Mashape.com). In
particular, the lightweight description of RESTful services (often referred to as
Web APIs), compared to the standard description of SOAP-based service
capabilities (e.g., WSDL), is the main reason for their increasing success.
                                    Starting from RESTful lightweight description, recommendation of the most
                                suitable services to be adopted within a Web application development process
                                combines several factors, beyond functional and non-functional requirements [1,
                                2]. Increasing research effort is being devoted to: (i) the application of collabora-
                                tive filtering techniques to recommend services based on users’ ratings [3, 4]; (ii)
                                the exploitation of API popularity (i.e., number of times a Web API has been


Copyright © by the paper’s authors. Copying permitted only for private and academic
                                purposes.
                                In: S. España, M. Ivanović, M. Savić (eds.): Proceedings of the CAiSE’16 Forum at
                                the 28th International Conference on Advanced Information Systems Engineering,
                                Ljubljana, Slovenia, 13-17.6.2016, published at http://ceur-ws.org



used) to weight API relevance [5]; (iii) the computation of API co-occurrence
in existing mashups to rank APIs for selection purposes [6]; (iv) the exploitation
of users’ social relationships to suggest RESTful components to a user
u by considering other users who are similar to u in the social network [7]. These
efforts demonstrate how the choice among different alternatives might be
inspired by the experiences of other developers in using them, such as developers’
ratings, comments, and similar applications where APIs have been included. Indeed,
a developer frequently seeks advice based on the design experiences
of other known developers, rather than relying only on the votes or choices of
generic users over the Web [8].
     In this paper, we describe an extension to WISeR [9], our Web API discovery
and selection framework for modern Web application design. This extension
exploits the developers’ social network for improved service selection. We currently
rely on the “follower-of” relationships of Mashape.com, which is the largest repository
where developers can follow each other for Web API discovery and selection
purposes. In particular, given a set of data services that are candidates for selection,
services are sorted based also on a ranking of the developers in the social network
who used these services in the past to develop their own applications. The
proposed developers’ rank metric is computed by considering: (i) the topology of
social relationships between developers; (ii) the number of Web applications
developed by each developer; (iii) the developers’ credibility, estimated from
their evaluations of services. Existing service recommendation techniques
remain valid and can be integrated within the framework as well.
     The paper is organized as follows: Section 2 describes the social network of
developers, underlying the framework, and the developers’ credibility assessment;
Section 3 explains techniques behind developers’ ranking for improving data
service selection process and discusses a preliminary validation of the framework;
finally, Section 4 closes the paper.


2    Social network-based model for data service selection
To enable social network-based selection of data services, we extended an existing
framework, WISeR (Web apI Search and Ranking) [9], with the developers’
rank functionalities. WISeR is based on a multi-layered model, developed over
different perspectives:

 – a component perspective, focused on data services;
 – an application perspective, focused on aggregations of data services;
 – an experience perspective, focused on developers who used and voted on data
   services to build their own aggregations.

Formally, data services and aggregations are defined as follows.

Definition 1. We define a data service s (hereafter, service) as an
operation/method/query to access data of a web source, whose underlying data schema might
be unknown to those who use the service. Within the scope of this paper, we
model a service s as $\langle n_s, \{t_s\} \rangle$, where: $n_s$ is the service name; $\{t_s\}$ is a set of
tags. We denote with S the overall set of available services.

Definition 2. A service aggregation represents a set of services usable to deploy
a Web application. An aggregation g is modeled as a triple $\langle n_g, S(g), d \rangle$, where:
$n_g$ is the aggregation name; $S(g) = \{s_1, \ldots, s_n\}$ is the set of data services used in
g; $d \in D$ is the developer who designed the web application by composing services
in g. We denote with G the overall set of service aggregations, that is, $g \in G$, and
with G(s) the set of aggregations where s has been included.

According to this vision, the best representatives of data services are
resource-oriented services (i.e., RESTful ones). For services that present a structured,
operation-oriented description (e.g., WSDL), we consider only the data they work
on, represented through tags.
    Developers, who used services to design their applications, are organized in
a social network, defined as follows.

Definition 3. The social network of developers is a pair $SN = \langle D, E \rangle$, where:
(a) D is the set of developers; (b) E is a set of follower-of relationships between
developers, defined as $E = \{d_i \xrightarrow{f} d_j \mid d_i, d_j \in D\}$, where $d_i \xrightarrow{f} d_j$ indicates that $d_i$
explicitly declares to be inclined to learn from the choices made in the past by $d_j$
for web application design purposes.

    Each developer $d_i \in D$ is modeled as $\langle G(d_i), D^* \rangle$, where $G(d_i) \subseteq G$ is the set of
aggregations designed by $d_i$ in the past, and $D^* \subseteq D$ is the set of other developers
whom $d_i$ declares to be inclined to learn from in order to design web applications,
that is, $D^* = \{d_k \mid d_i \xrightarrow{f} d_k \in E\}$. The organization of the follower-of relationships
determines the network structure as extracted from the Mashape repository. The
developers’ social network can be represented as one or more directed graphs,
as shown in Figure 1.
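
The model of Definitions 1 to 3 can be sketched as plain data structures. This is only an illustrative reading of the definitions, not the WISeR implementation: the class names, the field names and the use of string identifiers for developers are assumptions introduced here.

```python
from dataclasses import dataclass, field
from typing import FrozenSet, List, Set


@dataclass(frozen=True)
class Service:
    name: str                      # n_s
    tags: FrozenSet[str]           # {t_s}


@dataclass(frozen=True)
class Aggregation:
    name: str                      # n_g
    services: FrozenSet[Service]   # S(g)
    developer: str                 # d, identified by a string id (assumption)


@dataclass
class Developer:
    ident: str
    aggregations: List[Aggregation] = field(default_factory=list)  # G(d_i)
    follows: Set[str] = field(default_factory=set)                 # D*


def aggregations_using(service, all_aggregations):
    """G(s): the aggregations in which the service has been included."""
    return [g for g in all_aggregations if service in g.services]


def follower_edges(developers):
    """E: the set of (d_i, d_j) pairs such that d_i follows d_j."""
    return {(d.ident, target) for d in developers for target in d.follows}
```

Keeping the follower-of relationships on the follower side mirrors the set $D^*$ attached to each developer; the edge set E is then derived rather than stored.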


[Figure 1: directed graphs over developers dev1 to dev14, in two panels (a) and (b).]

Fig. 1. Sample social networks of developers, with a hierarchical structure (a) and a
peer-based structure (b).

2.1   Developers’ credibility
In our approach, a developer may assign votes to the services used in his/her
applications. Since developers exchange their experiences in using services,
votes are an enabling feature for this purpose; for example,
all the most popular Web API repositories include a rating system. Our
approach introduces an important distinction for service rating compared to
existing systems, because it takes into account the aggregation in which services
have been evaluated (aggregation-contextual rating), according to the following
definition.
Definition 4. Given a service sj ∈S, we denote with v(sj , gk , di )∈[0, 1] the vote
assigned to the service sj by a developer di ∈D with reference to the aggregation
gk ∈G in which sj has been used.
Aggregation-contextual rating helps in properly weighting the votes assigned to
services. When a developer is looking for the average of the votes assigned to a service,
the relevant votes are those that have been assigned with reference
to aggregations similar to the one being developed, according
to the aggregation similarity, AggSim(), introduced in [9]. Moreover, we include
credibility evaluation techniques in the approach, inspired by the ones defined
in [10], which we extended with the notion of aggregation-contextual
rating. The basic idea is that, if the reported vote does not agree with the
majority opinion, the developer’s credibility is decreased; otherwise it is increased.
Suppose the developer $d_i$ assigned some votes to the service $s_j$ with reference to
the aggregations $g_1, g_2, \ldots, g_t$, respectively. For each $g_m$ among these aggregations, we
consider the set $A_{g_m}$ of aggregations $g_o \in G$ whose similarity $AggSim(g_o, g_m)$
is above a given threshold, set by $d_i$. The votes assigned to $s_j$ within these
aggregations are grouped into clusters, and the majority opinion on $s_j$ is
represented by the most densely populated cluster, whose centroid is considered as
the majority rating:

$$M(s_j) = \mathrm{centroid}\left(\max_{i=1}^{k}(C_i)\right) \qquad (1)$$
where $C_i$ is the i-th cluster, $k = |\{C_i\}|$ is the total number of clusters, max()
returns the cluster with the largest membership and centroid() computes the
centroid of the cluster. The number k of desired clusters is set to $\lceil \sqrt{N/2} \rceil$, where
N is the number of considered votes and $\lceil x \rceil$ is the smallest integer not less than
x. Therefore, considering a developer $d_i$, having a credibility $c_n(d_i)$ after he/she
already assigned n votes, and given a new vote $v(s_j, g_m, d_i)$ assigned by $d_i$ to a
service $s_j$ when used within an aggregation $g_m$, the new credibility value for the
developer is computed as follows:

$$c_{n+1}(d_i) = \frac{c_n(d_i) \cdot n + \left(1 - |M(s_j) - v(s_j, g_m, d_i)|\right)}{n+1} \in [0, 1] \qquad (2)$$
According to Equation (2), if the vote $v(s_j, g_m, d_i) \in [0, 1]$ differs markedly from the
centroid $M(s_j) \in [0, 1]$, then the term $1 - |M(s_j) - v(s_j, g_m, d_i)|$ tends to zero, therefore
$c_{n+1}(d_i) < c_n(d_i)$ (the decrement is controlled by the denominator n + 1, to avoid
the case in which a developer loses his/her credibility too quickly because of a few
votes that are not aligned with the majority opinion). Vice versa, if the
vote $v(s_j, g_m, d_i)$ is close to $M(s_j)$, then the term $1 - |M(s_j) - v(s_j, g_m, d_i)|$ tends to
1 and $c_{n+1}(d_i) > c_n(d_i)$ until $c_{n+1}(d_i)$ reaches 1 (maximum credibility). Initial values
$c_0(d_i)$ are set to 0.5. Note that the credibility of a developer with a high value of n
who assigns a vote different from the ones expressed by the majority
is reduced only by a small amount. In fact, this type of vote does not necessarily
denote an incoherent behavior of the developer and could be the result of a recent
change in the service conditions or in the quality perceived by the voter. The rationale
for clustering votes can be explained with the help of an example: if a service
receives votes 1, 1, 1, 2, 9, 9, 8 and we adopt an average-based model, we obtain an
overall rating of 4.4. This rating is not representative of the depicted
situation, where the majority of voters give a low vote.
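
A minimal sketch of Equations (1) and (2) follows. The text does not name the clustering algorithm, so a plain one-dimensional k-means with a deterministic start is assumed here; the function names `majority_rating` and `update_credibility` are hypothetical, and the votes are assumed to be already restricted to similar aggregations and rescaled to [0, 1].

```python
import math


def majority_rating(votes, iterations=50):
    """Centroid of the most populated cluster, as in Equation (1).

    Assumption: clusters come from a simple 1-D k-means with
    k = ceil(sqrt(N / 2)); the source does not fix the algorithm.
    """
    values = sorted(votes)
    if not values:
        raise ValueError("at least one vote is required")
    n = len(values)
    k = max(1, math.ceil(math.sqrt(n / 2)))
    # Deterministic start: centroids spread evenly over the sorted votes.
    centroids = [values[(2 * i + 1) * n // (2 * k)] for i in range(k)]
    clusters = [values]
    for _ in range(iterations):
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        updated = [sum(c) / len(c) if c else centroids[i]
                   for i, c in enumerate(clusters)]
        if updated == centroids:
            break
        centroids = updated
    largest = max(clusters, key=len)
    return sum(largest) / len(largest)


def update_credibility(c_n, n, vote, majority):
    """Equation (2): running average of the agreement with the majority."""
    return (c_n * n + (1 - abs(majority - vote))) / (n + 1)
```

With the example votes rescaled to 0.1, 0.1, 0.1, 0.2, 0.9, 0.9, 0.8, the largest cluster under these assumptions is the low one, so the majority rating stays close to the low votes instead of the misleading average.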


3    Ranking of developers

Let us consider a developer $d_r$ who is searching for a service $s_r$. Consider two
candidate services $s_1$ and $s_2$, used by two developers $d_1$ and $d_2$, respectively, in
aggregations that are similar to the one being developed. If $s_1$ and $s_2$ are
equally relevant for the developer $d_r$ according to the overall similarity function
$Sim(s_1, s_2)$ (defined in [9]), $s_1$ will be ranked better than $s_2$ if the experience of
$d_1$, who used $s_1$, is ranked better than the experience of $d_2$, who used $s_2$. The
problem is therefore how to rank the experience of developers $d_1$ and $d_2$.
     The rank of a developer $d_i \in D$ is computed as the product of two different
rankings, according to the following formula:
$$dr(d_i) = \rho_{rel}^{d_r}(d_i) \cdot \rho_{abs}(d_i) \in [0, 1] \qquad (3)$$
where: (a) a relative ranking $\rho_{rel}^{d_r}(d_i) \in [0, 1]$ ranks developer $d_i$ based on the
follower-of relationships between $d_i$ and $d_r$ (this rank is introduced to take into
account the viewpoint of $d_r$, who explicitly declared to learn from other
developers to select the right service); (b) an absolute ranking $\rho_{abs}(d_i)$ is based on
the overall network of developers, to take into account the centrality of $d_i$ in the
network independently of the developer $d_r$ who issued the request.
    In particular, the relative ranking $\rho_{rel}^{d_r}(d_i)$ is inversely proportional to the
distance $\ell(d_r, d_i)$ between $d_r$ and $d_i$, in terms of follower-of relationships, that
is:

$$\rho_{rel}^{d_r}(d_i) = \frac{1}{\ell(d_r, d_i)} \in [0, 1] \qquad (4)$$
If there is no path from $d_r$ to $d_i$, $\ell(d_r, d_i)$ is set to the length of the longest path
of follower-of relationships that relate $d_r$ to the other developers, incremented
by 1, to denote that $d_i$ is farther from $d_r$ than all the developers within the
$d_r$ sub-network. Consider for example the network topology shown in Figure 1,
where the developer dev3 is the requester and has to choose among services
that have been used in the past by the developers dev4, dev5, dev6, dev8 and
dev11, whose follower-of relationships are depicted in the figure. In the
example, $\ell$(dev3, dev4) = $\ell$(dev3, dev8) = 1, $\ell$(dev3, dev5) = 2, and
$\ell$(dev3, dev6) = $\ell$(dev3, dev11) = 2 + 1 = 3.
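
The relative ranking of Equation (4) can be sketched with a breadth-first visit of the follower-of graph. Two assumptions are made here: the distance is read as the shortest follower-of path, and the "longest path" used for unreachable developers is read as the largest such distance from the requester. The function names and the edge list in the usage note are hypothetical; the edges are chosen only to reproduce the distances quoted above.

```python
from collections import deque


def distances_from(requester, follows):
    """Shortest follower-of distances from the requester (breadth-first)."""
    dist = {requester: 0}
    queue = deque([requester])
    while queue:
        current = queue.popleft()
        for followed in follows.get(current, ()):
            if followed not in dist:
                dist[followed] = dist[current] + 1
                queue.append(followed)
    return dist


def relative_rank(requester, candidate, follows):
    """Equation (4): 1 / l(d_r, d_i), with the fallback for unreachable d_i."""
    if candidate == requester:
        raise ValueError("the requester is not ranked against itself")
    dist = distances_from(requester, follows)
    if candidate in dist:
        length = dist[candidate]
    else:
        # No path: farthest reachable distance, incremented by 1.
        length = max(dist.values()) + 1
    return 1.0 / length
```

For instance, with the hypothetical edges `{"dev3": ["dev4", "dev8"], "dev4": ["dev5"], "dev6": ["dev4"]}`, dev4 and dev8 are at distance 1 from dev3, dev5 at distance 2, and the unreachable dev6 and dev11 are both assigned distance 3.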
    The absolute ranking $\rho_{abs}(d_i) \in [0, 1]$ is evaluated regardless of the viewpoint
of the requester $d_r$. This ranking is composed of two parts. The first one
depends on the number of aggregations designed by $d_i$; the second one depends
on the topology of the network of other developers who declared their interest
in the past experiences of $d_i$, that is:
$$\rho_{abs}(d_i) = \frac{1-\alpha}{|D|} \cdot |G(d_i)| + \alpha \cdot \sum_{j=1}^{n} \frac{c(d_j) \cdot \rho_{abs}(d_j)}{F(d_j)} \qquad (5)$$

This expression is an adaptation of the PageRank metric to the context we
are considering. The value $\rho_{abs}(d_i)$ represents the probability that a developer
will consider the example given by $d_i$ in using a service for designing a Web
application. Therefore, $\sum_i \rho_{abs}(d_i) = 1$. Initially, all developers are assigned
the same probability, that is, $\rho_{abs}(d_i) = 1/|D|$. Furthermore, at each iteration
of the absolute ranking computation, the absolute rank of a developer $d_j$ such
that $d_j \xrightarrow{f} d_i$ is “transferred” to $d_i$ according to the following criteria: (i) if $d_j$
follows more than one developer, his/her rank is distributed over all these developers,
properly weighted considering the credibility $c(d_j)$ of $d_j$ (see the second term in
Equation (5), where $F(d_j)$ is the number of developers followed by $d_j$); (ii) a
contribution to $\rho_{abs}(d_i)$ is given by the experience of $d_i$ and is therefore proportional
to the number of aggregations designed by $d_i$ (see the first term in Equation (5)).
A damping factor $\alpha \in [0, 1]$ is used to balance the contributions explained in (i) and
(ii). At each step, a normalization procedure is applied in order to ensure that
$\sum_i \rho_{abs}(d_i) = 1$.
     The computation algorithm used for Equation (5) is similar to the one
applied for PageRank. In particular, denoting with $\rho_{abs}(d_i, \tau_N)$ the N-th
iteration in computing $\rho_{abs}(d_i)$, and with $DR(\tau_N)$ the column vector whose elements are
$\rho_{abs}(d_i, \tau_N)$, we have:
$$DR(\tau_{N+1}) = \frac{1-\alpha}{|D|} \cdot \begin{pmatrix} |G(d_1)| \\ |G(d_2)| \\ \vdots \\ |G(d_n)| \end{pmatrix} + \alpha \cdot M \cdot DR(\tau_N) \qquad (6)$$
where M denotes the adjacency matrix properly modified to consider credibility,
that is, $M_{ij} = \frac{c(d_j)}{F(d_j)}$ if $d_j \xrightarrow{f} d_i$, zero otherwise. As demonstrated for PageRank,
the computation formulated in Equation (6) reaches a high degree of accuracy within
only a few iterations.
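
The iteration of Equation (6), with the normalization step mentioned in the text, can be sketched in plain Python as follows. This is an illustration under assumptions, not the authors' implementation: the function name `absolute_rank`, the dictionary-based inputs and the fixed number of iterations are choices made here.

```python
def absolute_rank(follows, aggregation_count, credibility,
                  alpha=0.6, iterations=50):
    """Iterate Equation (6) and normalize so that the ranks sum to 1.

    follows[d_j]           -> developers followed by d_j
    aggregation_count[d_i] -> |G(d_i)|
    credibility[d_j]       -> c(d_j)
    """
    devs = sorted(aggregation_count)
    size = len(devs)
    rank = {d: 1.0 / size for d in devs}      # initial value 1/|D|
    for _ in range(iterations):
        updated = {}
        for d_i in devs:
            # First term: experience of d_i, proportional to |G(d_i)|.
            experience = (1 - alpha) / size * aggregation_count[d_i]
            # Second term: rank transferred by each follower d_j of d_i,
            # weighted by c(d_j) and split over the F(d_j) followed developers.
            transferred = sum(
                credibility[d_j] * rank[d_j] / len(follows[d_j])
                for d_j in devs
                if d_i in follows.get(d_j, ())
            )
            updated[d_i] = experience + alpha * transferred
        total = sum(updated.values()) or 1.0
        rank = {d: value / total for d, value in updated.items()}
    return rank
```

In a toy network where two developers both follow a third one and all three designed one aggregation, the followed developer ends up with the highest absolute rank, and lowering the followers' credibility reduces the share it receives.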

Framework validation. Since there are no benchmarks to compare our
approach with similar efforts, we built a dataset to perform a validation of the
framework. We used wrappers to extract from the Mashape repository service
descriptions and the developers who follow/consume those services, including the
network of their follower-of relationships. We completed the dataset
construction by adding the number of developed aggregations and developers’ credibility
values in order to obtain the following classes of developers: (i) developers with
a high number of followers, who in turn have several followers as well, with high
credibility ($0.7 \le c(d_i) \le 1.0$) and several designed aggregations ($|G(d_i)| \ge 3$), like
dev5 in Figure 1; (ii) developers who present few followers, medium
credibility ($0.4 \le c(d_i) < 0.7$) and several designed aggregations ($|G(d_i)| \ge 3$); (iii)
developers who present few followers, medium credibility and few designed
aggregations ($|G(d_i)| < 3$); (iv) developers who present few followers, low credibility
($0 \le c(d_i) < 0.4$) and who designed few aggregations. Intuitively, the above
mentioned classes are listed in decreasing order of developers’ rank, with
developers in class (i) top ranked. This rank is taken as the reference
for the setup of the damping factor α and for the validation of the approach.
    We ran the system by randomly selecting a subset of developers as requesters,
computed the Kendall tau distance k for each run with respect to the reference
developers’ rank described above, and computed the average value of k.
We also repeated the same experiment considering two different configurations
of the approach, namely: (a) an optimistic configuration, where all developers
are considered as having maximum credibility (i.e., $c(d_i) = 1.0, \forall i$); (b) a
configuration biased toward the requester, where only the relative ranking $\rho_{rel}^{d_r}(d_i)$ has
been considered. We kept the value α = 0.6 for the damping factor. The results
of this validation are shown in Table 1. The average value of the Kendall tau
distance shows the accuracy of our ranking solution compared to the intuitive
sorting of developers into the four classes (i)-(iv) described above.
Validation results also demonstrate the impact of considering developers’
credibility and the topology of social relationships between developers through
the computation of the absolute ranking $\rho_{abs}(d_i)$. Indeed, if we do not consider
different levels of credibility (optimistic case), the quality of ranking decreases
slightly (average distance 0.1795 against 0.0360). The decrement of ranking quality
is even more evident if we do not consider the absolute ranking (biased case,
0.6026), thus demonstrating that the relative viewpoint of the requester alone is
not able to correctly identify the centrality of other developers in the social network.


                  Credibility c(d_i)       Damping factor α   Kendall tau distance k ∈ [0, 1]
Biased case       -                        0.6                0.6026
Optimistic case   c(d_i) = 1, ∀d_i         0.6                0.1795
Our approach      c(d_i) ∈ [0, 1], ∀d_i    0.6                0.0360

   Table 1. Average Kendall tau distances computed for the approach validation
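
For reference, a normalized Kendall tau distance between two rankings can be computed as sketched below. The paper does not state how ties among developers of the same class are treated, so this sketch assumes two strict rankings of the same items; the function name `kendall_tau_distance` is hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of item pairs that the two rankings order differently.

    Returns 0 for identical rankings and 1 for reversed ones.
    Assumption: both rankings are strict (no ties) and list the same items.
    """
    if set(ranking_a) != set(ranking_b):
        raise ValueError("rankings must contain the same items")
    pos_a = {item: i for i, item in enumerate(ranking_a)}
    pos_b = {item: i for i, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))
    if not pairs:
        return 0.0
    discordant = sum(
        1 for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)
```

Under this reading, a value such as 0.0360 means that fewer than 4% of the developer pairs are ordered differently from the reference rank.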

4    Conclusions
In this paper, we discussed how developers’ social relationships, as well as their
credibility, can be properly exploited to support data service selection. In par-
ticular, we proposed a framework for ranking developers by considering both a
relative and an absolute perspective. The framework interacts with the Mashape
repository, the largest service repository where developers can follow each other
in a social network. Other developers’ social networks, such as GitHub, are not
specifically meant for data service selection, while other highly populated Web
API repositories (e.g., ProgrammableWeb) do not present social relationships
between developers. Further studies are ongoing to extend the social network
model: specifically, other aspects such as the maturity of the use of data
services (estimated through their publishing data and the number and quality
of aggregations including the services) and the specificity of the searched services
(i.e., general purpose or domain-specific) may be investigated with respect to
their possible influence on the search and ranking process.

References
 1. W. Xu, J. Cao, L. Hu, J. Wang, M. Li, A social-aware service recommendation
    approach for mashup creation, in: IEEE International Conference on Web Services,
    2013.
 2. L. Yao, S. Zheng, A. Segev, J. Yu, Recommending web services via combining col-
    laborative filtering with content-based features, in: IEEE International Conference
    on Web Services, 2013.
 3. B. Cao, M. Tang, X. Huang, Cscf: A mashup service recommendation approach
    based on content similarity and collaborative filtering, International Journal of
    Grid and Distributed Computing 7 (2) (2014) 163–172.
 4. R. Balakrishnan, S. Kambhampati, J. Manishkumar, Assessing Relevance and
    Trust of the Deep Web Sources and Results Based on Inter-Source Agreement,
    ACM Transactions on the Web 7 (2) (2013) 32 pages.
 5. C. Li, R. Z. Z. Huai, H. Sun, A novel approach for api recommendation in mashup
    development, in: Proc. of Int. Conference on Web Services (ICWS), 2014, pp. 289–
    296.
 6. B. Cao, J. Liu, M. Tang, Z. Zheng, G. Wang, Mashup Service Recommendation
    based on User Interest and Social Network, in: Proc. of Int. Conference on Web
    Services (ICWS), 2013.
 7. A. Maaradji, H. Hacid, R. Skraba, A. Lateef, J. Daigremont, N. Crespi, Social-
    based Web Services Discovery and Composition for Step-by-Step Mashup Com-
    pletion, in: Proc. of Int. Conference on Web Services (ICWS), 2011.
 8. A. Fuxman, P. Giorgini, M. Kolp, J. Mylopoulos, Information Systems as Social
    Structures, Formal Ontology in Information Systems (2001) 12–21.
 9. D. Bianchini, V. De Antonellis, M. Melchiori, A Multi-perspective Framework for
    Web API Search in Enterprise Mashup Design (Best Paper), in: Proc. of 25th Int.
    Conference on Advanced Information Systems Engineering (CAiSE), Vol. LNCS
    7908, 2013, pp. 353–368.
10. Z. Malik, A. Bouguettaya, RATEWeb: Reputation Assessment for Trust
    Establishment among Web Services, VLDB Journal 18 (2009) 885–911.