=Paper= {{Paper |id=Vol-2140/paper9 |storemode=property |title=Mining Software Repositories to Support OSS Developers: A Recommender Systems Approach |pdfUrl=https://ceur-ws.org/Vol-2140/paper9.pdf |volume=Vol-2140 |authors=Phuong T. Nguyen,Juri Di Rocco,Davide Di Ruscio |dblpUrl=https://dblp.org/rec/conf/iir/NguyenRR18 }} ==Mining Software Repositories to Support OSS Developers: A Recommender Systems Approach== https://ceur-ws.org/Vol-2140/paper9.pdf
           Mining Software Repositories to Support OSS
          Developers: A Recommender Systems Approach

                    Phuong T. Nguyen, Juri Di Rocco, Davide Di Ruscio

           Department of Information Engineering, Computer Science and Mathematics
                               Università degli Studi dell’Aquila
                              Via Vetoio 2 – 67100 L’Aquila, Italy
          {phuong.nguyen,juri.dirocco,davide.diruscio}@univaq.it



          Abstract. To facilitate development activities, software developers frequently
          consult external sources for related information. Looking up data available in
          open source software (OSS) repositories can be considered part of their daily
          routine. Nonetheless, the heterogeneity of resources and their corresponding
          dependencies are the main obstacles to the effective mining and exploitation of
          the data. In this context, manually searching every single resource to find the
          most suitable ones is a daunting and inefficient task. Thus, equipping developers
          with techniques and tools that accelerate the search process and improve the
          search results will help them work more efficiently. Within the scope of the
          EU-funded CROSSMINER project, advanced techniques and tools are being
          conceived to provide open source software developers with innovative features
          aimed at improvements in terms of development effort, cost savings, developer
          productivity, etc. To this end, cutting-edge technologies such as information
          retrieval and recommender systems are applied to solve the problem of mining
          the rich metadata available in OSS repositories to support software developers.
          In this paper, we present the main research problems as well as the proposed
          approach, together with some preliminary results.


1      Introduction
During the development phase, software programmers need to tackle various issues,
such as mastering different programming languages, reusing source code, or choosing
suitable external third-party libraries. By exploiting existing well-defined artifacts from
open source software (OSS) repositories, such as code snippets and API usage patterns,
one can avoid coding from scratch. Nevertheless, as illustrated in Fig. 1, the available
information is huge and heterogeneous as it comes from different sources, e.g. source
code, Q&A systems, or API documentation. Given the circumstances, even experienced
and skilled developers might face difficulties in searching for suitable resources.
    Over the last years, considerable effort has been devoted to data mining and knowl-
edge inference techniques to provide automated assistance to developers in navigating
large information spaces and giving recommendations [24]: for instance, API documentation
recommendation [27], mining Q&A systems [21], API usage recommendation
[17,30], and third-party library recommendation [26], just to mention a few.
    IIR 2018, May 28-30, 2018, Rome, Italy. Copyright held by the author(s).
    CROSSMINER¹ [1] is a research project funded by the EU Horizon 2020 Research
and Innovation Programme. It aims at extending the EU OSSMETER FP7 project [9]
by supporting the development of complex software systems and by facilitating the
comparison and adoption of already existing open source software components. To this
end, CROSSMINER is conceiving techniques for knowledge extraction from large open
source software repositories. Equipped with such cutting-edge technologies, developers
can make use of existing similar modules instead of reimplementing them. Many
traditional development tasks thus become semi- or fully automated by means of
meaningful recommendations, and the job of developers is expected to become more
effective and efficient.

[Fig. 1. Developers are overwhelmed by huge and miscellaneous sources of supporting
materials: source code, Q&A systems, bug reports, API documentation, tutorials, and
configuration management systems.]
    The work in CROSSMINER differs from other existing studies in the sense that
it brings in a completely new paradigm for the representation of OSS artifacts so as
to pave the way for various computations. Beyond extending state-of-the-art approaches
in the field of automated analysis and measurement of open source software,
we develop advanced techniques to investigate relationships among different OSS
projects and properly organize them in a dedicated knowledge base. The knowledge
base fosters the deployment of recommender systems to present users with interesting
items previously unknown to them.
    In this paper, we describe our ongoing work with the focus on the mining of cross
relationships among OSS projects to provide developers with helpful recommenda-
tions. To this end, the paper is organized as follows: an overview of the CROSSMINER
project and of the envisioned recommendations is given in Section 2. Section 3 presents
the main constituting elements of the proposed approach to realize such recommenda-
tions. Section 4 recalls popular metrics used for evaluating recommendation outcomes.
Section 5 introduces some preliminary results and, finally, Section 6 concludes the paper.



2     Overview of the envisioned CROSSMINER recommendations

CROSSMINER aims at supporting software developers by means of an advanced Ecli-
pse-based IDE providing intelligent recommendations that go far beyond the current
“code completion-oriented” practice. To this end, data retrieved from different sources
has to be collected and processed so as to properly feed the recommendation component.
In particular, as shown in Fig. 2, four high-level modules compose the CROSSMINER
platform.

 1
     https://www.crossminer.org
    The Data Preprocessing module contains tools that extract metadata from OSS
repositories. Data can be of different types, such as source code, configuration, or
cross-project relationships. Natural language processing (NLP) tools are also deployed
to analyze developer forums and discussions. The collected data is used to populate a
knowledge base which serves as the core for the mining functionalities. By capturing
developers’ activities (see Capturing Context), the IDE is able to generate
and display recommendations (see the modules Producing Recommendations
and Presenting Recommendations).

[Fig. 2. A high-level view of CROSSMINER: mining and analysis tools (source code
miner, NLP miner, configuration miner, cross-project analysis) mine OSS forges and
populate a knowledge base, which is used to produce and present real-time
recommendations to the developer in the IDE.]
    To provide developers with useful support, we concentrate on working with the use
cases depicted in Fig. 3. The knowledge base takes the developer context as input and
returns recommendations as discussed below:




                                         Fig. 3. Types of recommendations

  – GetProjectAlternatives: we implement novel clustering and similarity mechanisms
    able to suggest OSS projects that can be used as alternatives to OSS
    components which have been previously selected and integrated in the software
    being developed. Based on designated similarity functions, we are able to detect
    projects that are similar because of: Provided APIs (GetProjectAlternativesWith-
    SimilarAPIs) [15]; Size (GetProjectAlternativesWithSimilarSize); Application do-
    main (GetProjectAlternativesWithSimilarTopics); and Comparable quality (GetPro-
    jectAlternativesWithSimilarQuality);
  – GetProjectsByUsedComponents: depending on the used components, the knowl-
    edge base is able to identify and suggest further components that, according to what
      other developers have done in the past, should also be included in the system being
      implemented. Two prominent examples are recommendation of third-party libraries
      [26] and code snippets [16];
    – GetAPIUsageSupport: The knowledge base provides developers with recommen-
      dations on how to use a given API and to manage the migration of the system in
      case of deprecated methods. This use case consists of:
        • GetAPIUsageDiscussions: given an API the developer has already included,
           it is possible to retrieve messages from communication channels (like forums,
           bug reports, and Stack Overflow posts) that are useful for understanding how
           to properly use it [21];
        • GetAPIUsagePatterns: in case of deprecated API methods, the knowledge base
           recommends code examples that can be considered as a reference for migrating
           the system and to make it work with the new version of the used API [20,30].
    – GetRecommendedDeps: starting from a given configuration and by considering
      similar projects developed by other developers, the knowledge base recommends
      other additional third-party libraries that should be further included [26];
    – GetRecommendedDocs: by considering the documentation examined by other de-
      velopers that used similar APIs and frameworks, the knowledge base suggests ad-
      ditional sources of information, e.g. technical documents, tutorials, etc., that are
      useful for solving the development problem at hand [27];
    – GetAPIBreakingUpdates: the knowledge base implements the notion of API evo-
      lution with the aim of identifying backward compatibility problems affecting source
      code that uses evolving APIs;
    – GetRequiredChanges: given a changed API and a project using it, the knowledge
      base provides an overview of the impact that the changes have on the depending
      project. Communication channel items discussing such API changes will also be
      shown.

     It is worth noting that the recommendations previously summarized have been iden-
tified during the first 6 months of the CROSSMINER project to satisfy the requirements
of the industrial partners that work in the domains of IoT, multi-sector IT services, API
co-evolution, software analytics, software quality assurance, and OSS forges [1].


3     Proposed recommendation approach
Our approach is built on the notion of recommender systems [24]. In the context
of mining software repositories, these are systems that can provide recommendations to
developers with regard to their development context. For recommender systems in general,
the ability to measure the similarity between items plays an important role in obtaining
relevant recommendations [10]. Intuitively, for software mining recommender
systems, the measurement of similarities between artifacts, e.g. projects, dependencies,
code snippets, or even developers, is also a critical factor. Nevertheless, the computation
of similarities between software systems, and open source projects in particular, has
been identified as a thorny issue [15]. Furthermore, considering the heterogeneity
of artifacts in OSS repositories, similarity computation becomes even more complicated as
many artifacts and several cross relationships prevail.
[Fig. 4. The main components underpinning the CROSSMINER recommendation system:
a layered architecture in which the OSS ecosystem representer and the similarity
computator mine OSS forges, and the recommender systems feed and use a knowledge
base.]


   Fig. 4 depicts a layered architecture consisting of the core elements which underpin
the realization of the recommendations summarized in the previous section. An
overview of such elements, which mine OSS forges and manage the content of a knowledge
base, is given in the next sub-sections.

3.1    OSS ecosystem representer
To enable both the representation of different OSS projects and the calculation of their
similarity, a graph-based model has been conceived [18]. We consider the community
of developers together with OSS projects and other artifacts as an ecosystem. Graphs
are then used for representing different types of relationships in the OSS ecosystem.
    The adoption of the graph-based representation allows for the transformation of the
relationships among various artifacts in the OSS ecosystem into a mathematically com-
putable format. The following relationships are used to construct graphs representing
the OSS ecosystem and eventually to calculate similarity using graph algorithms.
  – includes ⊆ Dependency × Project: according to [15,26], the similarity between two
     projects relies on the dependencies they have in common, because shared
     dependencies suggest that the projects aim at implementing similar functionalities;
  – develops ⊆ Developer × Project: we assume that there exists a certain level of
     similarity between two projects if they are built by the same developers [5];
  – stars ⊆ User × Project: this relationship models the star event to represent GitHub
     projects that a given user has starred;
  – develops ⊆ User × Project: this relationship is used to represent the projects that a
     given user contributes to in terms of source code development;
  – implements ⊆ File × File: it depicts a specific relation that can occur between the
     source code given in two different files, e.g. a class specified in one file implementing
     an interface given in another file;
  – hasSourceCode ⊆ Project × File: it represents the source files in an OSS project.
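To make the graph construction concrete, the sketch below encodes a few of the relationships above as labelled edges in plain Python. The class, the edge directions, and the sample triples are illustrative only and are not part of the CROSSMINER implementation.

```python
# Minimal edge-labelled graph for an OSS ecosystem (illustrative sketch).
# Each edge is a (source, relationship, target) triple, mirroring the relations
# listed above: includes, develops, stars, implements, hasSourceCode.
from collections import defaultdict

class EcosystemGraph:
    def __init__(self):
        self.edges = set()           # all (src, rel, dst) triples
        self.out = defaultdict(set)  # (src, rel) -> set of targets

    def add(self, src, rel, dst):
        self.edges.add((src, rel, dst))
        self.out[(src, rel)].add(dst)

    def targets(self, src, rel):
        return self.out[(src, rel)]

g = EcosystemGraph()
g.add("project#1", "includes", "lib#1")
g.add("project#2", "includes", "lib#1")
g.add("dev#1", "develops", "project#1")
g.add("HttpSocket.java", "implements", "Socket.java")
g.add("project#1", "hasSourceCode", "HttpSocket.java")

# Libraries shared by two projects: the basis of dependency-based similarity
shared = g.targets("project#1", "includes") & g.targets("project#2", "includes")
```

Once the ecosystem is encoded this way, graph algorithms can traverse the labelled edges to compute similarities between any pair of artifacts.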
    Fig. 5 depicts an excerpt of a graph representing an explanatory example with two
OSS projects project#1 and project#2. The former contains HttpSocket.java
and the latter contains FtpSocket.java. Both files implement the interface
Socket.java, as marked by the implements relationship.
3.2 Similarity computator
Nodes, links, and the mutual relationships among them allow for the computation of
similarity [4]. To the best of our knowledge, there are several techniques for computing
similarity in graphs [8]. SimRank is among the most notable algorithms for computing
graph similarity [12]. Given two nodes, SimRank computes their similarity by considering
their neighbours: the more shared nodes point to them, the more similar the two
nodes are. Besides SimRank, there are many other algorithms for calculating similarities
in graphs that cannot be recalled in this paper due to space limitations [11,14].

[Fig. 5. Representation of OSS projects and their corresponding developers and
artifacts: projects project#1 and project#2, source files HttpSocket.java and
FtpSocket.java (both implementing Socket.java), libraries lib#1 and lib#2, and
developers dev#1, dev#2, and dev#3, connected by develops, stars, implements,
hasSourceCode, and includes edges. Both projects include lib#1, project#1 also
includes lib#2, and dev#3 has starred project#2.]
    In Fig. 5, we can compute the similarity between project#1 and project#2
using related semantic paths, e.g. the two-hop path hasSourceCode and
implements, or the one-hop path includes. This relies on the fact that the
projects implement common functionalities by using common libraries [15,26].
The graph also allows us to compute the similarity between two developers, e.g. dev#1
and dev#2, as they are indirectly connected by develops and implements edges. In
summary, by transforming various OSS artifacts into a graph and using different similarity
algorithms, we are able to perform similarity calculations for different artifacts, which
then serve as a base for other computations.
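As an illustration of how such a graph enables similarity computation, the following is a minimal sketch of the SimRank iteration described above. The decay factor C, the iteration count, and the toy graph are illustrative choices, not CROSSMINER settings.

```python
# Iterative SimRank on a directed graph given as {node: set of in-neighbours}.
# Two nodes are similar when the nodes pointing to them are similar;
# s(a, a) = 1 by definition. Illustrative sketch.

def simrank(in_nbrs, C=0.8, iters=10):
    nodes = list(in_nbrs)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iters):
        new = {}
        for a in nodes:
            for b in nodes:
                if a == b:
                    new[(a, b)] = 1.0
                    continue
                Ia, Ib = in_nbrs[a], in_nbrs[b]
                if not Ia or not Ib:
                    new[(a, b)] = 0.0
                    continue
                total = sum(sim[(i, j)] for i in Ia for j in Ib)
                new[(a, b)] = C * total / (len(Ia) * len(Ib))
        sim = new
    return sim

# Toy example: two projects both pointed to by the same library node
in_nbrs = {
    "project#1": {"lib#1"},
    "project#2": {"lib#1", "lib#2"},
    "lib#1": set(),
    "lib#2": set(),
}
sim = simrank(in_nbrs)
```

Because both projects are pointed to by lib#1, they obtain a non-zero SimRank score; a larger overlap of neighbours would raise it further.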

3.3 Recommender system
We derive recommendation techniques from the mechanisms implemented for e-commerce
systems [13]. There, given a customer, products that have been purchased by similar
customers are recommended to her [25]. Similarly, given a software project, we
recommend artifacts that exist in projects similar to it. A content-based recommender
system works by recommending to a developer various artifacts, e.g. code snippets,
API method invocations, or external libraries, found in projects that are similar to the
ones being developed. In contrast, a collaborative-filtering recommender system gives a
developer recommendations based on the artifacts used by developers with similar
behaviors [25].
    By referring to the example shown in Fig. 5, we see that project#1 is similar to
project#2 in terms of functionalities [15,26]: they contain the classes HttpSocket.java
and FtpSocket.java, which implement the same interface Socket.java.
Furthermore, both projects share the third-party library lib#1. Thus, it is sensible
to recommend lib#2 to project#2, since lib#2 is being used by project#1.
With this recommendation, the developers of project#2 are able to save the time
spent on manually searching for lib#2. Analogously, since the two projects are similar,
it is also worthwhile to suggest that developer dev#3 star project#1, since she
already starred project#2². In practice, the recommendation of OSS artifacts is much
better defined with the incorporation of several similar projects instead of only one.
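The neighbourhood-based reasoning above can be sketched as follows. This is a simplified collaborative-filtering example using Jaccard similarity over dependency sets; the function names and parameters are hypothetical and this is not the actual CrossRec implementation.

```python
# Collaborative-filtering sketch for library recommendation: neighbours are
# the k projects with the most similar dependency sets (Jaccard similarity),
# and their libraries, weighted by neighbour similarity, form the ranked
# suggestions. Illustrative only.

def jaccard(a, b):
    return len(a & b) / len(a | b) if a | b else 0.0

def recommend_libs(target, projects, k=2, top_n=5):
    # projects: {project name: set of libraries}; target: project name
    sims = sorted(((jaccard(projects[target], libs), name)
                   for name, libs in projects.items() if name != target),
                  reverse=True)[:k]
    scores = {}
    for s, name in sims:
        for lib in projects[name] - projects[target]:
            scores[lib] = scores.get(lib, 0.0) + s
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

projects = {
    "project#1": {"lib#1", "lib#2"},
    "project#2": {"lib#1"},
    "project#3": {"lib#1", "lib#3"},
}
recs = recommend_libs("project#2", projects)
```

Here project#2 receives lib#2 and lib#3 because its neighbours already include them, mirroring the "libraries come from similar projects" mechanism.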
     By following this design, we implemented a first prototype of a recommender
system able to recommend similar projects and third-party libraries. Before presenting
some initial results in Section 5, we recall popular metrics for evaluating
recommendation outcomes in Section 4.

4      Evaluation metrics
Given a query, the outcome of the recommendation process is a ranked list of items
that are considered to be relevant for the query. For instance, a system that recommends
third-party libraries for a given project returns a list in descending order of the
real-valued similarity scores of the libraries [26]. To validate the performance of a
recommender system, we perform cross validation using a training and a testing dataset
[6]. Training data is used to build the model, whereas testing data is used to validate the
outcome. Given a project that needs library recommendations, the graph model is
used to compute similarities and then to find the k most similar projects. The outcome of
the recommendation is a ranked list of libraries. Normally, a developer pays attention
only to the top-N items. We use k and N as parameters for the evaluation later on.
    We recall the following metrics that can be used to evaluate the performance of a
recommender system in the context of mining software repositories given the presence
of training and testing datasets. First, for a clear presentation of the metrics considered
during the outcome evaluation, the following notations are defined:
  – N is the cut-off value for the list of recommended items and k is the number of
     neighbour projects considered for the recommendation process;
  – For a testing project p, the ground-truth dataset is named GT(p);
  – REC(p) is the list of top-N items recommended to p. It is ranked in descending
     order of real-valued scores, with REC_r(p) being the library at position r;
  – If a recommended item i ∈ REC(p) for a testing project p is found in the ground
     truth of p (i.e., GT(p)), hereafter we call this a library match or hit.

    Using these notations, the metrics utilized to measure the recommendation outcomes
are explained in the following. Among others, we consider success rate [26],
accuracy [15], sales diversity, and novelty [19] to be the most suitable metrics for
evaluating a recommender system in mining OSS repositories [24].

4.1     Success rate

Given a set P of testing projects, this metric measures the rate at which a recommender
system returns at least one match among the top-N recommended items for every project
p ∈ P [26]. The metric is formally defined as follows:

    success\ rate@N = \frac{\mathrm{count}_{p \in P}\left( \left| GT(p) \cap \left( \bigcup_{r=1}^{N} REC_r(p) \right) \right| > 0 \right)}{|P|}    (1)

where the function count() counts the number of times that the boolean expression
specified in its parameter is true.

 2
   Starring is used by GitHub developers as a means to bookmark an OSS repository and to thank
   its contributors. The GitHub star has nothing to do with ratings as by TripAdvisor or YouTube
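A direct reading of Equation 1 in code might look as follows; the dictionary-based data layout is an assumption made for the sketch.

```python
# Success rate@N (Equation 1): the fraction of test projects for which at
# least one ground-truth item appears among the top-N recommendations.

def success_rate(ground_truth, recommendations, n):
    # ground_truth: {project: set of items}
    # recommendations: {project: ranked list of items}
    hits = sum(1 for p, gt in ground_truth.items()
               if gt & set(recommendations[p][:n]))
    return hits / len(ground_truth)

gt = {"p1": {"libA"}, "p2": {"libB"}}
rec = {"p1": ["libX", "libA", "libY"], "p2": ["libC", "libD", "libE"]}
rate = success_rate(gt, rec, n=3)  # p1 has a hit, p2 does not
```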
4.2   Accuracy
Given a list of top-N items, precision@N, recall@N, and normalized discounted
cumulative gain (nDCG) are utilized to measure the accuracy of the recommendation
results.
Precision@N is the ratio of the top-N recommended items belonging to the ground-truth
dataset:

    precision@N(p) = \frac{\sum_{r=1}^{N} \left| GT(p) \cap REC_r(p) \right|}{N}    (2)

Recall@N is the ratio of the ground-truth items appearing in the top-N items [7,8,19]:

    recall@N(p) = \frac{\sum_{r=1}^{N} \left| GT(p) \cap REC_r(p) \right|}{|GT(p)|}    (3)

Normalized Discounted Cumulative Gain. Precision and recall reflect accuracy well;
however, they neglect ranking sensitivity [3]. nDCG is an effective way to measure
whether a system can present highly relevant items at the top of the list:

    nDCG@N(p) = \frac{1}{iDCG} \cdot \sum_{i=1}^{N} \frac{2^{rel(p,i)}}{\log_2(i+1)}    (4)

where iDCG is used to normalize the metric to 1 when an ideal ranking is reached.
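The three accuracy metrics can be computed, for instance, as in the following sketch, which follows Equations 2-4 (including the 2^rel gain in nDCG as written above). The data layout is an assumption.

```python
import math

# Accuracy metrics from Equations 2-4: precision@N, recall@N, and nDCG@N
# with binary relevance. Illustrative sketch.

def precision_at(gt, rec, n):
    return sum(1 for r in rec[:n] if r in gt) / n

def recall_at(gt, rec, n):
    return sum(1 for r in rec[:n] if r in gt) / len(gt)

def ndcg_at(gt, rec, n):
    rels = [1 if r in gt else 0 for r in rec[:n]]
    # position i (0-based) contributes 2^rel / log2(i + 2)
    dcg = sum((2 ** rel) / math.log2(i + 2) for i, rel in enumerate(rels))
    ideal = sorted(rels, reverse=True)  # relevant items ranked first
    idcg = sum((2 ** rel) / math.log2(i + 2) for i, rel in enumerate(ideal))
    return dcg / idcg if idcg else 0.0

gt = {"libA", "libB"}
rec = ["libA", "libX", "libB", "libY"]
p = precision_at(gt, rec, 4)  # 2 hits out of 4
r = recall_at(gt, rec, 4)     # both ground-truth items found
nd = ndcg_at(gt, rec, 4)      # < 1 because libB is not ranked second
```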
4.3   Sales Diversity
In e-commerce systems, sales diversity is the ability to improve the coverage as well as
the distribution of products across customers [19,29]. In the context of mining software
repositories, sales diversity means the ability of the system to suggest as many items,
e.g. libraries or code snippets, to projects as possible, as well as to disperse the
concentration among all of them, instead of focusing only on a specific set of items [24].
Catalog coverage measures the percentage of items recommended to projects:

    coverage@N = \frac{\left| \bigcup_{p \in P} \bigcup_{r=1}^{N} REC_r(p) \right|}{|I|}    (5)

Entropy evaluates whether the recommendations are concentrated on only a small set of
items or spread across a wide range [22]:

    entropy = - \sum_{i \in I} \left( \frac{\#rec(i)}{total} \right) \ln \left( \frac{\#rec(i)}{total} \right)    (6)

where I is the set of all items available for recommendation; \#rec(i) is the number of
projects getting i as a recommendation, i.e., \#rec(i) = \mathrm{count}_{p \in P}\left( \left( \bigcup_{r=1}^{N} REC_r(p) \right) \ni i \right),
i ∈ I; and total denotes the total number of recommended items across all projects.
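A possible implementation of Equations 5 and 6 is sketched below; the dictionary-based data layout is an assumption made for illustration.

```python
import math

# Sales-diversity metrics from Equations 5-6: catalog coverage and the
# entropy of the recommendation distribution. Illustrative sketch.

def coverage_at(recommendations, catalog, n):
    recommended = set()
    for rec in recommendations.values():
        recommended.update(rec[:n])
    return len(recommended) / len(catalog)

def entropy(recommendations, n):
    counts = {}  # item -> number of projects receiving it
    for rec in recommendations.values():
        for item in rec[:n]:
            counts[item] = counts.get(item, 0) + 1
    total = sum(counts.values())
    return -sum((c / total) * math.log(c / total) for c in counts.values())

recs = {"p1": ["libA", "libB"], "p2": ["libA", "libC"]}
cov = coverage_at(recs, catalog={"libA", "libB", "libC", "libD"}, n=2)
ent = entropy(recs, n=2)
```

A higher entropy indicates that recommendations are spread more evenly across the catalog rather than concentrated on a few popular items.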
4.4     Novelty

Novelty measures whether a system is able to expose novel, less popular items to
projects. Expected popularity complement (EPC) is utilized to measure novelty and is
defined as follows [28,29]:

    EPC@N = \frac{\sum_{p \in P} \sum_{r=1}^{N} \frac{rel(p,r) \cdot [1 - pop(REC_r(p))]}{\log_2(r+1)}}{\sum_{p \in P} \sum_{r=1}^{N} \frac{rel(p,r)}{\log_2(r+1)}}    (7)

where rel(p, r) = |GT(p) ∩ REC_r(p)| represents the relevance to project p of the item at
position r of the top-N list, and pop(REC_r(p)) is the popularity of the item at
position r in the top-N recommended list, computed as the ratio between the number of
projects that receive REC_r(p) as a recommendation and the number of projects that
receive the most recommended item. Equation 7 implies that the more unpopular the
items a system recommends, the higher the EPC value it obtains, and vice versa.

5      Preliminary results
For explanatory purposes, we introduce an example of how the CROSSMINER recommender
system can be applied to assist OSS developers in a specific context. During
the software development phase, among other tasks, programmers regularly search for
and reuse third-party libraries [2]. A third-party library is an interface to reusable
source code and can be embedded in external software projects independently of the
environment code [23]. To help developers quickly locate suitable dependencies, in the
context of the CROSSMINER project we implemented CrossRec, a framework that
exploits Cross Projects Relationships in Open Source Software Repositories to build
a Recommender System. CrossRec employs a collaborative-filtering technique based
on a model similar to that applied in e-commerce systems [13]. Instead of recommending
products to customers, we recommend third-party libraries to projects using exactly the
same mechanism: “given a project, libraries come from similar projects.” To be precise,
in this section we address the use case GetRecommendedDeps outlined in Section 2.
    To evaluate CrossRec, we considered a well-established tool as baseline. In partic-
ular, to the best of our knowledge, LibRec [26] is one of the most advanced techniques
for library recommendation. Based on the set of third-party libraries that a project has
already included, LibRec searches for relevant libraries with a high success rate (see
Section 4.1). Using the available implementation3, we conducted an evaluation of Li-
bRec and CrossRec on the same dataset consisting of 5,200 GitHub Java projects to
see how well they can retrieve suitable third-party libraries.
    Since success rate was used as the only evaluation metric for LibRec [26], we used
it to directly compare the performance of the two systems. Table 1 shows success rate
for N={5, 10} and k={5, 10, 15, 20, 25}. As can be seen, the success rates obtained
by CrossRec are always superior to those of LibRec. The maximum success rate@5 of
LibRec is 0.8780, whereas CrossRec obtains success rates@5 of at least 0.9073 across
all configurations, with 0.9286 being the maximum value. For
 3
     We would like to thank Ferdian Thung and David Lo at the School of Information Systems,
     Singapore Management University for providing us with the original LibRec implementation.
    Table 1. Success rate for N={5,10}, k={5,10,15,20,25}

             N=5                 N=10
     k   LibRec  CrossRec   LibRec  CrossRec
     5   0.8576   0.9073    0.9143   0.9421
    10   0.8757   0.9230    0.9332   0.9526
    15   0.8767   0.9269    0.9313   0.9550
    20   0.8780   0.9286    0.9334   0.9557
    25   0.8769   0.9284    0.9334   0.9532

    Table 2. Success rate for N={1,3,5,7,10}, k={10,20}

             k=10                k=20
     N   LibRec  CrossRec   LibRec  CrossRec
     1   0.6248   0.7482    0.6565   0.7650
     3   0.8192   0.8892    0.8228   0.8951
     5   0.8757   0.9230    0.8780   0.9286
     7   0.9078   0.9386    0.9055   0.9442
    10   0.9332   0.9526    0.9334   0.9557

N = 10, both LibRec and CrossRec achieve a considerably higher success rate than
with N = 5. In all test configurations, CrossRec performs better than LibRec.
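Under the common definition used by LibRec [26], success rate@N is the fraction of projects that receive at least one ground-truth library in their top-N recommendations. A minimal sketch, with the same hypothetical data structures as above:

```python
def success_rate_at_n(recommendations, ground_truth, n=5):
    """Fraction of projects for which at least one of the top-N
    recommended items appears in the project's ground-truth set.

    recommendations: dict mapping project -> ranked list of items
    ground_truth:    dict mapping project -> set of relevant items
    """
    if not recommendations:
        return 0.0
    hits = sum(1 for project, recs in recommendations.items()
               if set(recs[:n]) & ground_truth[project])
    return hits / len(recommendations)
```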
     Next, we investigate success rate with respect to the length of the recommendation
list N, i.e., N = {1, 3, 5, 7, 10}. In the first experiment, k is fixed at 10 and the outcomes
are depicted in Table 2. For N = 1, LibRec gets a success rate of 0.6248, which is
much lower than 0.7482, the corresponding value obtained by CrossRec. The value of
0.7482 shows that CrossRec is able to provide relevant recommendations to the developer
at an encouraging match rate, even when she expects only an extremely brief list. From
N = 5 onward, CrossRec's performance increases quickly, with success rates always
greater than 0.92.
     In the second experiment, k is increased to 20; both systems show a slight increase
in their success rates, but the change is marginal. By conducting further experiments
with increasing k, we noticed that incorporating more similar projects for recommen-
dation does not improve success rate (the outcomes of these experiments are omitted
from the paper due to space limitations).
     In summary, by considering the results depicted in Table 1 and Table 2, we conclude
that CrossRec obtains a better success rate than LibRec in all cases. This confirms our
hypothesis that collaborative filtering is a suitable technique for recommending third-
party libraries, and consequently it deserves to be further investigated in the context of
the CROSSMINER project.

6        Conclusions

We presented our proposed framework to assist software developers in mining OSS
repositories. We exploit a graph model to represent the semantic relationships within
the OSS ecosystem. On top of it, we populated a knowledge base with metadata curated
from OSS repositories to serve various mining techniques. We built a first prototype
of a recommender system using a collaborative-filtering technique. A preliminary
evaluation on a dataset of 5,200 GitHub Java projects shows that our system for
recommending third-party libraries outperforms a well-known baseline. As future work,
we will address all the recommendations mentioned in Section 2 following the model
proposed in this paper.
Acknowledgments

The research described in this paper has been carried out as part of the CROSSMINER
Project, EU Horizon 2020 Research and Innovation Programme, grant agreement No.
732223.

References

 1. A. Bagnato et al. Developer-centric knowledge mining from large open-source software
    repositories (crossminer). In Software Technologies: Applications and Foundations, pages
    375–384. Springer International Publishing, 2018.
 2. V. Bauer, L. Heinemann, and F. Deissenboeck. A structured approach to assess third-party
    library usage. In Proceedings of the 2012 IEEE International Conference on Software Main-
    tenance (ICSM), ICSM ’12, pages 483–492. IEEE Computer Society, 2012.
 3. A. Bellogín, I. Cantador, and P. Castells. A comparative study of heterogeneous item
    recommendations in social systems. Inf. Sci., 221:142–169, Feb. 2013.
 4. V. D. Blondel, A. Gajardo, M. Heymans, P. Senellart, and P. V. Dooren. A measure of
    similarity between graph vertices: Applications to synonym extraction and web searching.
    SIAM Rev., 46(4):647–666, Apr. 2004.
 5. N. Chen, S. C. Hoi, S. Li, and X. Xiao. Simapp: A framework for detecting similar mobile
    applications by online kernel learning. In Proceedings of the Eighth ACM International
    Conference on Web Search and Data Mining, WSDM ’15, pages 305–314. ACM, 2015.
 6. P. Cremonesi, R. Turrin, E. Lentini, and M. Matteucci. An evaluation methodology for
    collaborative recommender systems. In Proceedings of the 2008 International Conference on
    Automated Solutions for Cross Media Content and Multi-channel Distribution, AXMEDIS
    ’08, pages 224–231. IEEE Computer Society, 2008.
 7. J. Davis and M. Goadrich. The relationship between precision-recall and roc curves. In
    Proceedings of the 23rd International Conference on Machine Learning, ICML ’06, pages
    233–240. ACM, 2006.
 8. T. Di Noia, R. Mirizzi, V. C. Ostuni, D. Romito, and M. Zanker. Linked open data to support
    content-based recommender systems. In Proceedings of the 8th International Conference on
    Semantic Systems, I-SEMANTICS ’12, pages 1–8. ACM, 2012.
 9. D. Di Ruscio, D. S. Kolovos, I. Korkontzelos, N. Matragkas, and J. J. Vinju. Ossmeter: A
    software measurement platform for automatically analysing open source software projects.
    In Proceedings of the 2015 10th Joint Meeting on Foundations of Software Engineering,
    ESEC/FSE 2015, pages 970–973. ACM, 2015.
10. G. Guo, J. Zhang, and N. Yorke-Smith. A novel bayesian similarity measure for recom-
    mender systems. In Proceedings of the Twenty-Third International Joint Conference on
    Artificial Intelligence, IJCAI ’13, pages 2619–2625. AAAI Press, 2013.
11. T. H. Haveliwala. Topic-sensitive pagerank. In Proceedings of the 11th International Con-
    ference on World Wide Web, WWW ’02, pages 517–526. ACM, 2002.
12. G. Jeh and J. Widom. Simrank: A measure of structural-context similarity. In Proceedings
    of the Eighth ACM SIGKDD International Conference on Knowledge Discovery and Data
    Mining, KDD ’02, pages 538–543. ACM, 2002.
13. G. Linden, B. Smith, and J. York. Amazon.com recommendations: Item-to-item collabora-
    tive filtering. IEEE Internet Computing, 7(1):76–80, Jan. 2003.
14. W. Lu, J. Janssen, E. Milios, N. Japkowicz, and Y. Zhang. Node similarity in the citation
    graph. Knowledge and Information Systems, 11(1):105–129, Jan 2007.
15. C. McMillan, M. Grechanik, and D. Poshyvanyk. Detecting similar software applications. In
    Proceedings of the 34th International Conference on Software Engineering, ICSE ’12, pages
    364–374. IEEE Press, 2012.
16. C. McMillan, D. Poshyvanyk, and M. Grechanik. Recommending source code examples via
    api call usages and documentation. In Proceedings of the 2nd International Workshop on
    Recommendation Systems for Software Engineering, RSSE ’10, pages 21–25. ACM, 2010.
17. L. Moreno, G. Bavota, M. Di Penta, R. Oliveto, and A. Marcus. How can i use this method?
    In Proceedings of the 37th International Conference on Software Engineering - Volume 1,
    ICSE ’15, pages 880–890. IEEE Press, 2015.
18. P. T. Nguyen, J. Di Rocco, R. Rubei, and D. Di Ruscio. CrossSim: exploiting mutual
    relationships to detect similar OSS projects. In Proceedings of the 44th Euromicro Conference
    on Software Engineering and Advanced Applications (SEAA) - to appear, 2018.
19. P. T. Nguyen, P. Tomeo, T. Di Noia, and E. Di Sciascio. Content-based recommendations
    via dbpedia and freebase: A case study in the music domain. In Proceedings of the 14th
    International Conference on The Semantic Web - ISWC 2015 - Volume 9366, pages 605–
    621. Springer-Verlag New York, Inc., 2015.
20. H. Niu, I. Keivanloo, and Y. Zou. Api usage pattern recommendation for software develop-
    ment. J. Syst. Softw., 129(C):127–139, July 2017.
21. L. Ponzanelli, G. Bavota, M. Di Penta, R. Oliveto, and M. Lanza. Mining stackoverflow to
    turn the ide into a self-confident programming prompter. In Proceedings of the 11th Working
    Conference on Mining Software Repositories, MSR 2014, pages 102–111. ACM, 2014.
22. A. Ragone, P. Tomeo, C. Magarelli, T. Di Noia, M. Palmonari, A. Maurino, and E. Di Scias-
    cio. Schema-summarization in linked-data-based feature selection for recommender systems.
    In Proceedings of the Symposium on Applied Computing, SAC ’17, pages 330–335. ACM,
    2017.
23. M. P. Robillard, E. Bodden, D. Kawrykow, M. Mezini, and T. Ratchford. Automated api
    property inference techniques. IEEE Trans. Softw. Eng., 39(5):613–637, May 2013.
24. M. P. Robillard, W. Maalej, R. J. Walker, and T. Zimmermann, editors. Recommendation
    Systems in Software Engineering. Springer Berlin Heidelberg, 2014. DOI: 10.1007/978-3-
    642-45135-5.
25. B. Sarwar, G. Karypis, J. Konstan, and J. Riedl. Item-based collaborative filtering recom-
    mendation algorithms. In Proceedings of the 10th International Conference on World Wide
    Web, WWW ’01, pages 285–295. ACM, 2001.
26. F. Thung, D. Lo, and J. Lawall. Automated library recommendation. In 2013 20th Working
    Conference on Reverse Engineering (WCRE), pages 182–191, Oct 2013.
27. F. Thung, S. Wang, D. Lo, and J. Lawall. Automatic recommendation of api methods from
    feature requests. In Proceedings of the 28th IEEE/ACM International Conference on Auto-
    mated Software Engineering, ASE’13, pages 290–300. IEEE Press, 2013.
28. S. Vargas and P. Castells. Rank and relevance in novelty and diversity metrics for recom-
    mender systems. In Proceedings of the Fifth ACM Conference on Recommender Systems,
    RecSys ’11, pages 109–116. ACM, 2011.
29. S. Vargas and P. Castells. Improving sales diversity by recommending users to items. In
    Eighth ACM Conference on Recommender Systems, RecSys ’14, Foster City, Silicon Valley,
    CA, USA - October 06 - 10, 2014, pages 145–152, 2014.
30. H. Zhong, T. Xie, L. Zhang, J. Pei, and H. Mei. Mapo: Mining and recommending api usage
    patterns. In S. Drossopoulou, editor, ECOOP 2009 – Object-Oriented Programming, pages
    318–343. Springer Berlin Heidelberg, 2009.