A Social Network-based Framework for Data Services Selection in Modern Web Application Design

Devis Bianchini, Valeria De Antonellis, Michele Melchiori
Dept. of Information Engineering, University of Brescia
Via Branze, 38 - 25123 Brescia (Italy)
{devis.bianchini|valeria.deantonellis|michele.melchiori}@unibs.it

Abstract. In recent years, the design of enterprise Web applications has increasingly been based on the integration of resources delivered through data services from outside the organization boundaries. Searching and composing existing data services offers many advantages, namely, the availability of widespread solutions in the form of services and data shared over the Web, and reduced development costs. In this scenario, new methods for speeding up the design process are emerging and, in particular, developers’ social networks have been established, where developers follow other developers to learn from their choices in selecting suitable services. In this paper, we propose a framework to support data service selection for modern Web application design that also considers the developers’ social network. The network of social relationships, properly weighted with the developers’ credibility, is used to compute developers’ rank. This rank qualifies developers’ experience in selecting data services.

1 Introduction

Modern enterprise Web application design increasingly relies on the selection and composition of external services, which provide access to resources from outside the organization boundaries. The availability of data services that meet these requirements is becoming more and more important, as also witnessed by the growth of public Web API repositories (e.g., ProgrammableWeb.com, Mashape.com). In particular, the lightweight description of RESTful services (often referred to as Web APIs), compared to the standard description of SOAP-based service capabilities (e.g., WSDL), is the main reason for their increasing success.
Starting from RESTful lightweight descriptions, the recommendation of the most suitable services to be adopted within a Web application development process combines several factors, beyond functional and non-functional requirements [1, 2]. Increasing research effort is being devoted to: (i) the application of collaborative filtering techniques to recommend services based on users’ ratings [3, 4]; (ii) the exploitation of API popularity (i.e., the number of times a Web API has been used) to weight API relevance [5]; (iii) the computation of API co-occurrence in existing mashups to rank APIs for selection purposes [6]; (iv) the exploitation of users’ social relationships to suggest RESTful components to the user u considering other users who are similar to u in the social network [7]. These efforts demonstrate how the choice among different alternatives might be inspired by the experiences of other developers in using them, such as developers’ ratings, comments, and similar applications where APIs have been included. Indeed, a developer frequently seeks advice based on the design experiences of other known developers, rather than relying only on votes/choices of generic users over the Web [8].

Copyright © by the paper’s authors. Copying permitted only for private and academic purposes. In: S. España, M. Ivanović, M. Savić (eds.): Proceedings of the CAiSE’16 Forum at the 28th International Conference on Advanced Information Systems Engineering, Ljubljana, Slovenia, 13-17.6.2016, published at http://ceur-ws.org (CEUR Workshop Proceedings, ISSN 1613-0073).

In this paper, we describe an extension to WISeR [9], our Web API discovery and selection framework for modern Web application design. This extension exploits the developers’ social network for improved service selection.
We currently rely on the “follower-of” relationships of Mashape.com, which is the largest repository where developers can follow each other for Web API discovery and selection purposes. In particular, given a set of data services that are candidates for selection, services are sorted based also on a ranking of the developers in the social network who used these services in the past to develop their own applications. The proposed developers’ rank metric is computed by considering: (i) the topology of social relationships between developers; (ii) the number of Web applications developed by each developer; (iii) the developers’ credibility, estimated starting from their evaluations of services. Existing service recommendation techniques remain valid and can be integrated within the framework as well.

The paper is organized as follows: Section 2 describes the social network of developers underlying the framework and the developers’ credibility assessment; Section 3 explains the techniques behind developers’ ranking for improving the data service selection process and discusses a preliminary validation of the framework; finally, Section 4 closes the paper.

2 Social network-based model for data service selection

To enable social network-based selection of data services, we extended an existing framework, WISeR (Web apI Search and Ranking) [9], with the developers’ rank functionalities. WISeR is based on a multi-layered model, developed over different perspectives:

– a component perspective, focused on data services;
– an application perspective, focused on aggregations of data services;
– an experience perspective, focused on developers who used and voted data services to build their own aggregations.

Formally, data services and aggregations are defined as follows.

Definition 1. We define a data service s (hereafter, service) as an operation/method/query to access data of a web source, whose underlying data schema might be unknown to those who use the service.
Within the scope of this paper, we model a service $s$ as $\langle n_s, \{t_s\} \rangle$, where: $n_s$ is the service name; $\{t_s\}$ is a set of tags. We denote with $S$ the overall set of available services.

Definition 2. A service aggregation represents a set of services usable to deploy a Web application. An aggregation $g$ is modeled as a triple $\langle n_g, S(g), d \rangle$, where: $n_g$ is the aggregation name; $S(g) = \{s_1, \ldots, s_n\}$ is the set of data services used in $g$; $d \in D$ is the developer who designed the web application by composing services in $g$. We denote with $G$ the overall set of service aggregations, that is, $g \in G$, and with $G(s)$ the set of aggregations where $s$ has been included.

According to this vision, best representatives of data services are resource-oriented services (i.e., RESTful ones). For services that present a structured, operation-oriented description (e.g., WSDL), we consider only data they work on, represented through tags. Developers, who used services to design their applications, are organized in a social network, defined as follows.

Definition 3. The social network of developers is a pair $SN = \langle D, E \rangle$, where: (a) $D$ is the set of developers; (b) $E$ is a set of follower-of relationships between developers, defined as $E = \{d_i \xrightarrow{f} d_j \mid d_i, d_j \in D\}$, where $d_i \xrightarrow{f} d_j$ indicates that $d_i$ explicitly declares to be inclined to learn from the choices made in the past by $d_j$ for web application design purposes.

Each developer $d_i \in D$ is modeled as $\langle G(d_i), D^* \rangle$, where $G(d_i) \subseteq G$ is the set of aggregations designed by $d_i$ in the past, $D^* \subseteq D$ is the set of other developers, whom $d_i$ declares to be inclined to learn from, in order to design web applications, that is, $D^* = \{d_k \mid d_i \xrightarrow{f} d_k \in E\}$. The organization of the follower-of relationships determines the network structure as extracted from Mashape repository. The developers’ social network can be represented as one or more directed graphs, as shown in Figure 1.
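As a minimal sketch, Definitions 1-3 map onto a few record types. The Python names used here (Service, Aggregation, SocialNetwork, followed_by, aggregations_using) and the sample data are illustrative assumptions and are not part of WISeR.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Service:
    """Definition 1: a service is a name plus a set of tags."""
    name: str
    tags: frozenset


@dataclass(frozen=True)
class Aggregation:
    """Definition 2: a name, the services used, and the designing developer."""
    name: str
    services: frozenset
    developer: str


@dataclass
class SocialNetwork:
    """Definition 3: developers D and follower-of edges E as (d_i, d_j) pairs."""
    developers: set = field(default_factory=set)
    follows: set = field(default_factory=set)  # (d_i, d_j) means d_i follows d_j

    def followed_by(self, d_i):
        """D*: the developers that d_i declares to learn from."""
        return {d_j for (src, d_j) in self.follows if src == d_i}


def aggregations_using(service, aggregations):
    """G(s): the aggregations in which the service has been included."""
    return {g for g in aggregations if service in g.services}


# Hypothetical sample data, for illustration only.
weather = Service("weather", frozenset({"forecast", "geo"}))
maps = Service("maps", frozenset({"geo"}))
g1 = Aggregation("trip-planner", frozenset({weather, maps}), "d2")
g2 = Aggregation("city-guide", frozenset({maps}), "d3")
network = SocialNetwork({"d1", "d2", "d3"}, {("d1", "d2"), ("d2", "d3")})
```

Frozen dataclasses are used so that services and aggregations are hashable and can be stored in sets, mirroring the set-based definitions.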
Fig. 1. Sample social networks of developers (nodes dev1 to dev14), presenting a hierarchical (a) and a peer-based structure (b).

2.1 Developers’ credibility

In our approach, a developer may assign votes to the services used in his/her applications. Since developers exchange their experiences in using services, votes are an enabling feature for this purpose. In line with this vision, all the most popular Web API repositories include a rating system. Our approach introduces an important distinction for service rating, compared to existing systems, because it takes into account the aggregation in which services have been evaluated (aggregation-contextual rating), according to the following definition.

Definition 4. Given a service $s_j \in S$, we denote with $v(s_j, g_k, d_i) \in [0, 1]$ the vote assigned to the service $s_j$ by a developer $d_i \in D$ with reference to the aggregation $g_k \in G$ in which $s_j$ has been used.

Aggregation-contextual rating helps in properly weighting the votes assigned to services. When a developer is looking for the average of the votes assigned to a service, the relevant votes are those that have been assigned with reference to aggregations similar to the one being developed, according to the aggregation similarity, AggSim(), introduced in [9]. Moreover, we include credibility evaluation techniques in the approach, inspired by the ones defined in [10], which we extended with the notion of aggregation-contextual rating. The basic idea is that, if the reported vote does not agree with the majority opinion, the developer’s credibility is decreased, otherwise it is increased. Suppose the developer $d_i$ assigned some votes to the service $s_j$ with reference to the aggregations $g_1, g_2, \cdots, g_t$, respectively.
For each $g_m$ in these aggregations, we consider the set $A_{g_m}$ of aggregations $g_o \in G$ whose similarity $AggSim(g_o, g_m)$ is above a given threshold, set by $d_i$. The votes assigned to $s_j$ with reference to these aggregations are clustered, and the majority opinion on $s_j$ is represented by the most densely populated cluster, whose centroid is taken as the majority rating:

$$M(s_j) = \mathrm{centroid}\left(\max_{i=1}^{k}(C_i)\right) \tag{1}$$

where $C_i$ is the $i$-th cluster, $k = |\{C_i\}|$ is the total number of clusters, max() returns the cluster with the largest membership and centroid() computes the centroid of the cluster. The number $k$ of desired clusters is set to $\lceil \sqrt{N/2} \rceil$, where $N$ is the number of considered votes and $\lceil x \rceil$ is the smallest integer not less than $x$.

Therefore, considering a developer $d_i$, having a credibility $c_n(d_i)$ after he/she already assigned $n$ votes, and given a new vote $v(s_j, g_m, d_i)$ assigned by $d_i$ to a service $s_j$ when used within an aggregation $g_m$, the new credibility value for the developer is computed as follows:

$$c_{n+1}(d_i) = \frac{c_n(d_i) \cdot n + (1 - |M(s_j) - v(s_j, g_m, d_i)|)}{n + 1} \in [0, 1] \tag{2}$$

According to Equation (2), the more the vote $v(s_j, g_m, d_i) \in [0, 1]$ differs from the centroid $M(s_j) \in [0, 1]$, the closer the term $1 - |M(s_j) - v(s_j, g_m, d_i)|$ gets to zero, therefore $c_{n+1}(d_i) < c_n(d_i)$ (the decrement is controlled by the denominator $n + 1$, to avoid the case in which a developer loses his/her credibility too quickly because of a few assigned votes that are not aligned with the majority opinion). Vice versa, if the vote $v(s_j, g_m, d_i)$ is close to $M(s_j)$, then the term $1 - |M(s_j) - v(s_j, g_m, d_i)|$ tends to 1 and $c_{n+1}(d_i) > c_n(d_i)$ until $c_{n+1}(d_i)$ reaches 1 (maximum credibility). Initial values $c_0(d_i)$ are set to 0.5. Note that the credibility of a developer who has a high value of $n$ and who assigns a vote different from the ones expressed by the majority is reduced by only a small amount.
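The following Python sketch illustrates Equations (1) and (2) under stated assumptions: the votes are taken as already clustered (the clustering algorithm itself is not specified here), and the function names and sample votes are hypothetical.

```python
import math


def num_clusters(n_votes):
    """Number of desired clusters: the ceiling of sqrt(N / 2)."""
    return math.ceil(math.sqrt(n_votes / 2))


def majority_rating(clusters):
    """Equation (1): centroid (mean) of the cluster with the most votes."""
    largest = max(clusters, key=len)
    return sum(largest) / len(largest)


def update_credibility(c_n, n, majority, vote):
    """Equation (2): running average of the agreement 1 - |M(s_j) - v|."""
    agreement = 1.0 - abs(majority - vote)
    return (c_n * n + agreement) / (n + 1)


# Hypothetical votes in [0, 1], assumed to be already grouped into k = 2 clusters.
clusters = [[0.1, 0.1, 0.1, 0.2], [0.8, 0.9, 0.9]]
majority = majority_rating(clusters)  # centroid of the four low votes

# The same dissenting vote (0.9) cast by developers with different histories.
few_votes = update_credibility(c_n=0.8, n=4, majority=majority, vote=0.9)
many_votes = update_credibility(c_n=0.8, n=99, majority=majority, vote=0.9)
```

With these sample values, a credibility of 0.8 drops to about 0.69 for the developer with n = 4 previous votes, but only to about 0.794 for the developer with n = 99.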
In fact, this type of vote does not necessarily indicate incoherent behavior by the developer and could be the result of a recent change in the service conditions or in the quality perceived by the voter. The rationale for clustering votes can be explained with the help of an example: if a service receives votes 1, 1, 1, 2, 9, 9, 8 and we adopt an average-based model, we obtain an overall rating of about 4.4. This rating is not representative of the depicted situation, where the majority of voters give a low vote.

3 Ranking of developers

Let us consider a developer $d_r$, who is searching for a service $s_r$. Consider two candidate services $s_1$ and $s_2$, used by two developers $d_1$ and $d_2$, respectively, in aggregations that are similar to the one that is being developed. If $s_1$ and $s_2$ are equally relevant for the developer $d_r$ according to the overall similarity function $Sim(s_1, s_2)$ (defined in [9]), $s_1$ will be ranked better than $s_2$ if the experience of $d_1$, who used $s_1$, is ranked better than the experience of $d_2$, who used $s_2$. The point is therefore how to rank the experience of developers $d_1$ and $d_2$.

The rank of a developer $d_i \in D$ is computed as the product of two different rankings, according to the following formula:

$$dr(d_i) = \rho^{d_r}_{rel}(d_i) \cdot \rho_{abs}(d_i) \in [0, 1] \tag{3}$$

where: (a) a relative ranking $\rho^{d_r}_{rel}(d_i) \in [0, 1]$ ranks developer $d_i$ based on the follower-of relationships between $d_i$ and $d_r$ (this rank is introduced to take into account the viewpoint of $d_r$, who explicitly declared to learn from other developers to select the right service); (b) an absolute ranking $\rho_{abs}(d_i)$ is based on the overall network of developers, to take into account the centrality of $d_i$ in the network independently of the developer $d_r$, who issued the request.
In particular, the relative ranking $\rho^{d_r}_{rel}(d_i)$ is inversely proportional to the distance $\ell(d_r, d_i)$ between $d_r$ and $d_i$, in terms of follower-of relationships, that is:

$$\rho^{d_r}_{rel}(d_i) = \frac{1}{\ell(d_r, d_i)} \in [0, 1] \tag{4}$$

If there is no path from $d_r$ to $d_i$, $\ell(d_r, d_i)$ is set to the length of the longest path of follower-of relationships that relate $d_r$ to the other developers, incremented by 1, to denote that $d_i$ is farther from $d_r$ than all the developers within the $d_r$ sub-network. Consider for example the network topology shown in Figure 1, where the developer dev3 is the requester and has to choose among services that have been used in the past by the developers dev4, dev5, dev6, dev8 and dev11, whose follower-of relationships are depicted in the figure. In the example, $\ell$(dev3, dev4) = $\ell$(dev3, dev8) = 1, $\ell$(dev3, dev5) = 2, and, since dev6 and dev11 are not reachable from dev3 and the longest path starting from dev3 has length 2, $\ell$(dev3, dev6) = $\ell$(dev3, dev11) = 2 + 1 = 3.

The absolute ranking $\rho_{abs}(d_i) \in [0, 1]$ is evaluated regardless of the viewpoint of the requester $d_r$. This ranking is composed of two different parts. The first one depends on the number of aggregations designed by $d_i$, the second one depends on the topology of the network of other developers who declared their interest in the past experiences of $d_i$, that is:

$$\rho_{abs}(d_i) = \frac{1 - \alpha}{|D|} \cdot |G(d_i)| + \alpha \cdot \sum_{j=1}^{n} \frac{c(d_j) \cdot \rho_{abs}(d_j)}{F(d_j)} \tag{5}$$

This expression is an adaptation of the PageRank metric to the context we are considering. The value $\rho_{abs}(d_i)$ represents the probability that a developer will consider the example given by $d_i$ in using a service for designing a Web application. Therefore, $\sum_i \rho_{abs}(d_i) = 1$. Initially, all developers are assigned the same probability, that is, $\rho_{abs}(d_i) = 1/|D|$.
Furthermore, at each iteration of the absolute ranking computation, the absolute rank of a developer $d_j$, such that $d_j \xrightarrow{f} d_i$, is “transferred” to $d_i$ according to the following criteria: (i) if $d_j$ follows several developers, his/her rank is distributed over all these developers, properly weighted considering the credibility $c(d_j)$ of $d_j$ (see the second term in Equation (5), where $F(d_j)$ is the number of developers followed by $d_j$); (ii) a contribution to $\rho_{abs}(d_i)$ is given by the experience of $d_i$ and is therefore proportional to the number of aggregations designed by $d_i$ (see the first term in Equation (5)). A damping factor $\alpha \in [0, 1]$ is used to balance the contributions explained in (i) and in (ii). At each step, a normalization procedure is applied in order to ensure that $\sum_i \rho_{abs}(d_i) = 1$.

The computation algorithm used for Equation (5) is similar to the one applied for PageRank. In particular, denoting with $\rho_{abs}(d_i, \tau_N)$ the $N$-th iteration in computing $\rho_{abs}(d_i)$, and with $DR(\tau_N)$ the column vector whose elements are $\rho_{abs}(d_i, \tau_N)$, we have:

$$DR(\tau_{N+1}) = \frac{1 - \alpha}{|D|} \cdot \begin{bmatrix} |G(d_1)| \\ |G(d_2)| \\ \vdots \\ |G(d_n)| \end{bmatrix} + \alpha \cdot M \cdot DR(\tau_N) \tag{6}$$

where $M$ denotes the adjacency matrix properly modified to consider credibility, that is, $M_{ij} = \frac{c(d_j)}{F(d_j)}$ if $d_j \xrightarrow{f} d_i$, zero otherwise. As shown for PageRank, the computation formulated in Equation (6) reaches a high degree of accuracy within only a few iterations.

Framework validation. Since there are no benchmarks to compare our approach with similar efforts, we built a dataset to perform a validation of the framework. We used wrappers to extract from the Mashape repository the service descriptions and the developers who follow/consume those services, including the network of their follower-of relationships.
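Equations (3)-(6) can be made concrete with a small Python sketch. It rests on several assumptions: distances are shortest follower-of paths, the network, aggregation counts and credibility values are invented for illustration (they are not the network of Figure 1), and all function and variable names are hypothetical.

```python
from collections import deque


def relative_rank(follows, developers, requester):
    """Equation (4): 1 / l(d_r, d_i), with shortest follower-of paths (assumption)."""
    dist = {requester: 0}
    queue = deque([requester])
    while queue:  # breadth-first search along follower-of edges
        current = queue.popleft()
        for followed in follows.get(current, ()):
            if followed not in dist:
                dist[followed] = dist[current] + 1
                queue.append(followed)
    # Unreachable developers are placed one step beyond the farthest reachable one.
    fallback = max(dist.values()) + 1
    return {d: 1.0 / dist.get(d, fallback) for d in developers if d != requester}


def absolute_rank(follows, n_aggregations, credibility, alpha=0.6, iterations=50):
    """Equations (5)-(6): PageRank-style iteration with credibility-weighted transfer."""
    developers = list(n_aggregations)
    size = len(developers)
    rank = {d: 1.0 / size for d in developers}  # uniform initial probability
    for _ in range(iterations):
        # First term: experience, proportional to |G(d_i)|.
        new_rank = {d: (1 - alpha) / size * n_aggregations[d] for d in developers}
        # Second term: each follower d_j passes c(d_j) * rank(d_j) / F(d_j) to d_i.
        for d_j, followed in follows.items():
            for d_i in followed:
                new_rank[d_i] += alpha * credibility[d_j] * rank[d_j] / len(followed)
        total = sum(new_rank.values())  # assumed > 0: someone designed an aggregation
        rank = {d: value / total for d, value in new_rank.items()}  # normalization
    return rank


# Hypothetical network: r follows a, a follows b, c follows b.
follows = {"r": ["a"], "a": ["b"], "c": ["b"]}
n_aggregations = {"r": 2, "a": 2, "b": 2, "c": 2}
credibility = {"r": 0.8, "a": 0.8, "b": 0.8, "c": 0.8}

rel = relative_rank(follows, n_aggregations, requester="r")
abs_rank = absolute_rank(follows, n_aggregations, credibility)
developer_rank = {d: rel[d] * abs_rank[d] for d in rel}  # Equation (3)
```

In this toy network, c is not reachable from r, so its distance falls back to 2 + 1 = 3, while b, which is followed by two developers, obtains the highest absolute rank.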
We completed the dataset construction by adding the number of developed aggregations and developers’ credibility values in order to obtain the following classes of developers: (i) developers with a high number of followers, who in turn have several followers as well, with high credibility ($0.7 \le c(d_i) \le 1.0$) and several designed aggregations ($|G(d_i)| \ge 3$), like dev5 in Figure 1; (ii) developers who present few followers, medium credibility ($0.4 \le c(d_i) < 0.7$) and several designed aggregations ($|G(d_i)| \ge 3$); (iii) developers who present few followers, medium credibility and few designed aggregations ($|G(d_i)| < 3$); (iv) developers who present few followers, low credibility ($0 \le c(d_i) < 0.4$) and who designed few aggregations. Intuitively, the classes above are listed from the highest-ranked developers to the lowest-ranked ones, with developers in class (i) at the top. This rank is used as the reference for the setup of the damping factor $\alpha$ and for the validation of the approach.

We ran the system randomly selecting a subset of developers as requesters, computed the Kendall tau distance $k$ for each run with respect to the reference developers’ rank described above, and computed the average value of $k$. We also repeated the same experiment considering two different configurations of the approach, namely: (a) an optimistic configuration, where we considered all developers as having maximum credibility (i.e., $c(d_i) = 1.0$, $\forall i$); (b) a configuration biased toward the requester, where only the relative ranking $\rho^{d_r}_{rel}(d_i)$ has been considered. We kept the value $\alpha = 0.6$ for the damping factor. The results of this validation are shown in Table 1. The average value of the Kendall tau distance shows the accuracy of our ranking solution compared to the intuitive sorting of developers described above, based on the four classes of developers (i)-(iv).
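For reference, a normalized Kendall tau distance in [0, 1] can be sketched in Python as the fraction of developer pairs that two rankings order differently. This is a simplified illustration that assumes strict rankings without ties over the same developers; the function name and the sample rankings are hypothetical.

```python
from itertools import combinations


def kendall_tau_distance(ranking_a, ranking_b):
    """Fraction of pairs ordered differently by the two rankings (0 = identical)."""
    pos_a = {item: index for index, item in enumerate(ranking_a)}
    pos_b = {item: index for index, item in enumerate(ranking_b)}
    pairs = list(combinations(ranking_a, 2))  # needs at least two items
    discordant = sum(
        1
        for x, y in pairs
        if (pos_a[x] - pos_a[y]) * (pos_b[x] - pos_b[y]) < 0
    )
    return discordant / len(pairs)


# Hypothetical reference and computed rankings over four developers.
reference = ["d1", "d2", "d3", "d4"]
computed = ["d1", "d3", "d2", "d4"]  # one adjacent pair is swapped
distance = kendall_tau_distance(reference, computed)
```

A distance close to 0 means that the computed ranking almost coincides with the reference one, while 1 corresponds to the fully reversed order.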
Validation results also demonstrate the impact of considering developers’ credibility and the topology of social relationships between developers through the computation of the absolute ranking $\rho_{abs}(d_i)$. Indeed, if we do not consider different levels of credibility (optimistic case), the quality of ranking slightly decreases. However, the decrease in ranking quality is even more evident if we do not consider the absolute ranking (biased case), thus demonstrating that the relative viewpoint of the requester is not able to correctly identify the centrality of other developers in the social network.

| Configuration | Credibility $c(d_i)$ | Damping factor $\alpha$ | Kendall tau distance $k \in [0, 1]$ |
|---|---|---|---|
| Biased case | - | $\alpha = 0.6$ | 0.6026 |
| Optimistic case | $c(d_i) = 1, \forall d_i$ | $\alpha = 0.6$ | 0.1795 |
| Our approach | $c(d_i) \in [0, 1], \forall d_i$ | $\alpha = 0.6$ | 0.0360 |

Table 1. Average Kendall tau distances computed for the approach validation

4 Conclusions

In this paper, we discussed how developers’ social relationships, as well as their credibility, can be properly exploited to support data service selection. In particular, we proposed a framework for ranking developers by considering both a relative and an absolute perspective. The framework interacts with the Mashape repository, the largest service repository where developers can follow each other in a social network. Other developers’ social networks, such as GitHub, are not specifically meant for data service selection, while other highly populated Web API repositories (e.g., ProgrammableWeb) do not present social relationships between developers.
Further studies are ongoing to extend the social network model: specifically, other aspects such as the maturity of the use of data services (estimated through their publishing data and the number and quality of aggregations including the services) and the specificity of the searched services (i.e., general purpose or domain-specific) may be investigated with respect to a possible influence on the search and ranking process.

References

1. W. Xu, J. Cao, L. Hu, J. Wang, M. Li, A social-aware service recommendation approach for mashup creation, in: IEEE International Conference on Web Services, 2013.
2. L. Yao, S. Zheng, A. Segev, J. Yu, Recommending web services via combining collaborative filtering with content-based features, in: IEEE International Conference on Web Services, 2013.
3. B. Cao, M. Tang, X. Huang, Cscf: A mashup service recommendation approach based on content similarity and collaborative filtering, International Journal of Grid and Distributed Computing 7 (2) (2014) 163–172.
4. R. Balakrishnan, S. Kambhampati, J. Manishkumar, Assessing Relevance and Trust of the Deep Web Sources and Results Based on Inter-Source Agreement, ACM Transactions on the Web 7 (2) (2013) 32 pages.
5. C. Li, R. Z. Z. Huai, H. Sun, A novel approach for api recommendation in mashup development, in: Proc. of Int. Conference on Web Services (ICWS), 2014, pp. 289–296.
6. B. Cao, J. Liu, M. Tang, Z. Zheng, G. Wang, Mashup Service Recommendation based on User Interest and Social Network, in: Proc. of Int. Conference on Web Services (ICWS), 2013.
7. A. Maaradji, H. Hacid, R. Skraba, A. Lateef, J. Daigremont, N. Crespi, Social-based Web Services Discovery and Composition for Step-by-Step Mashup Completion, in: Proc. of Int. Conference on Web Services (ICWS), 2011.
8. A. Fuxman, P. Giorgini, M. Kolp, J. Mylopoulos, Information Systems as Social Structures, Formal Ontology in Information Systems (2001) 12–21.
9. D. Bianchini, V. De Antonellis, M. Melchiori, A Multi-perspective Framework for Web API Search in Enterprise Mashup Design (Best Paper), in: Proc. of 25th Int. Conference on Advanced Information Systems Engineering (CAiSE), Vol. LNCS 7908, 2013, pp. 353–368.
10. Z. Malik, A. Bouguettaya, RATEWeb: Reputation Assessment for Trust Establishment among Web Services, VLDB Journal 18 (2009) 885–911.