<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GH4RE: Repository Recommendation on GitHub for Requirements Elicitation Reuse</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Roxana Lisette Quintanilla Portugal</string-name>
          <email>rportugal@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Antonio Casanova</string-name>
          <email>casanova@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tong Li</string-name>
          <email>litong@bjut.edu.cn</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Julio Cesar Sampaio do Prado Leite</string-name>
          <email>julio@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Beijing University of Technology</institution>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Departamento de Informática</institution>
          ,
          <addr-line>PUC-Rio, Rio de Janeiro, Brasil CEP 22451-900</addr-line>
        </aff>
      </contrib-group>
      <fpage>113</fpage>
      <lpage>120</lpage>
      <abstract>
        <p>One of the challenges for requirements engineers is to understand domain issues and elicit requirements effectively. One possible strategy is to manually inspect similar projects to quickly grasp the domain concepts underlying them. However, this task is time-consuming and limited to the projects at hand. To ensure comprehensive elicitation using more widely available systems, we propose to use GitHub projects as information sources. To handle the large amount of data and facilitate access to suitable sources, we propose the creation of project profiles with attributes useful for requirements engineering, thereby achieving a meaningful recommendation of projects. In this paper, we describe the GitHub assets to be mined, the implementation of our approach, and its assessment using a corpus of readmes related to Real Estate projects.</p>
      </abstract>
      <kwd-group>
        <kwd>Requirements Elicitation</kwd>
        <kwd>Recommendation Systems</kwd>
        <kwd>Open Source Repositories</kwd>
        <kwd>GitHub</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>When dealing with elicitation tasks, one important source of information for
requirements engineers is a set of similar projects, because they allow engineers to learn about
the domain, its features and contexts, and thus to be better prepared for more focused
requirements elicitation tasks, e.g., interviews or meetings with stakeholders.</p>
      <p>
        GitHub, a repository of open software projects, may help in this scenario. GitHub
stores a vast number of projects that describe an application domain through its
different perspectives (e.g., readmes, issues, comments, source code, and release notes).
These perspectives, if mined properly, can provide relevant knowledge. However, we
cannot rely on manual inspection of projects, since it is time-consuming and unfeasible,
given the plethora of projects in GitHub. On the other hand, automated techniques to
find similar projects must deal with heterogeneous data in each perspective, and this
falls into the same problem cited by Castro-Herrera and Cleland-Huang [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], “techniques
that work well for recommending carefully categorized books, or movies with ample
ratings, will not necessarily work well for recommending discussion topics existing in
forums or wikis”. Therefore, given that we are using the readme perspective, techniques
for finding meaningful assets in texts must be carefully designed, bearing
in mind the needs of the requirements engineer. This may allow the creation of a meaningful
set of recommended GitHub projects for requirements elicitation purposes.
      </p>
      <p>
        A common approach in Recommendation Systems is content-based filtering,
which extracts features from items (movies, songs, or web pages) and matches them
against user profiles built from the users’ preferences [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In this regard, two issues arise. The first
issue is that a readme, unlike a book or a movie, is not categorized, since it contains free
text with no defined purpose, i.e., a readme may contain information about features
as well as installation instructions. The second issue is that users rate GitHub
projects by their end function and not by each perspective separately.
      </p>
      <p>
        We propose a recommendation approach for GitHub projects that discovers
meaningful assets in the perspectives of GitHub projects, using the readme perspective
as a sample. To compensate for the lack of user preferences, we propose to: (1) reveal
frequent terms that may prompt an interaction with the user; and (2) based on the user’s
choices, cluster and rank projects for recommendation. Our approach relies on an
NLP-based approach for Recommendation Systems [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] that specifically filters nouns and
proper nouns to reveal important entities that allow for user browsing. As a seed,
we begin by using the user query “real estate” to retrieve readmes from GitHub projects,
ordered according to GitHub relevance [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. With this base, the extraction of
meaningful information is performed.
      </p>
      <p>
        We believe this approach can support requirements elicitation tasks, as GitHub can
be used as a source of information, providing projects that act as viewpoints of
a domain that can be considered before and during software construction [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ][
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
Unlike proposals that exploit GitHub during software construction, our approach
aims at supporting requirements elicitation, contributing to better project
recommendation across GitHub perspectives. We adopt the real estate domain as a running example
to illustrate our approach.
      </p>
      <p>This paper is organized as follows. Section 2 reviews related work. Section 3
describes the implementation to extract meaningful assets from readme data. Section 4
presents the assessment based on the quality of projects recommended to support
requirements elicitation tasks. Finally, Section 5 describes the limitations and
opportunities of this work.
      </p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        Recommendation Systems in Requirements Engineering (RSREs) have attracted growing
interest in the field of Requirements Engineering (RE), as surveyed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. The authors
referenced 23 papers, two of which were highlighted for providing an overview of
the potential of RSREs. In particular, one stresses that RE
usually deals with domain knowledge, which is often vast and evolving [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Furthermore,
it mentions the need for stakeholders to process huge amounts of information,
including what users need, why they need it, what competitors
offer, what technological advances exist, and which features are feasible [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Although not
made explicit by the authors in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], it is important to note that much interesting information
about users’ needs can be found in documents from similar projects [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], and GitHub
provides an opportunity to discover latent related information about user needs,
competitors’ offerings, as well as technological assets being used.
      </p>
      <p>
        From the works surveyed in [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], five address requirements elicitation
[10-14], where [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10-13</xref>
        ] as well as [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are extensions of [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. To our knowledge, there
are only a few studies that focus on requirements elicitation using recommendation systems
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13 ref14 ref15">10-15</xref>
        ]. Such works [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10-13</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] recommend forums to stakeholders by first eliciting
stakeholders’ needs, performing mining of themes on these texts, clustering, and finally
making the recommendations. These works [
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10-13</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] are similar to ours in the sense
that they use an open-source repository (SugarCRM) as a source of information and the
feature requests perspective to evaluate their approach. The difference from our
approach is that they know in advance which features or themes exist in the
documents they mined, as well as the domain. This works to their advantage, as
preliminary work by Portugal et al. showed evidence of ambiguity in free
natural-language texts, e.g., real estate was identified as an analogy in HCI usability lingo ("the
amount of space available on a display for an application to provide output") [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
Another work mentioned in the survey [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ], Lim and Finkelstein [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] use a
collaborative-filtering system to identify stakeholders and ask them to indicate other stakeholders to
recommend relevant requirements. This collaborative-filtering system was also used in
[
        <xref ref-type="bibr" rid="ref10 ref11 ref12 ref13">10-13</xref>
        ][
        <xref ref-type="bibr" rid="ref15">15</xref>
        ].
      </p>
      <p>
        Our approach differs from previous work in that we propose a content-based
recommendation system, which relies on the semantic content of the
data. Moreover, as we cannot rely on the user preferences that exist in GitHub,
given that projects are mostly ranked for their development purposes, we propose the
use of NLP techniques [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to reveal latent words in readme texts that can be candidate
keywords for better project recommendation. Finally, work related to the recommendation
of GitHub projects can be found in the approach proposed by Guendouz et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. This
work is similar to ours in the sense that it predicts useful repositories according to
developer needs. Such predictions are done by exploring the fork perspective based on
users’ activity history.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Processing GitHub Readmes for Projects Recommendation</title>
      <p>
        In this section, we present the activities that support the recommendation of GitHub
projects. These activities, detailed below, are implemented in R [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] for our
recommendation system, and use a tool (http://corpus-retrieval.herokuapp.com/) for retrieving
raw readme data from GitHub given a query [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>
        Our approach takes the readme perspective of GitHub projects as a sample because
this perspective is the front end that communicates to humans the features a project
implements. However, not all readme texts follow this pattern: some lack
descriptions, and some lack feature explanations and contain installation instructions
instead. It is worth noting that some projects do not even have a readme text, because
its creation is not mandatory on GitHub. Despite these shortcomings, as we are dealing
with hundreds of projects, it is possible to filter a large set of readme texts that can be
used to find meaningful assets, as shown in previous works [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ][
        <xref ref-type="bibr" rid="ref9">9</xref>
        ][
        <xref ref-type="bibr" rid="ref18">18</xref>
        ][
        <xref ref-type="bibr" rid="ref19">19</xref>
        ].
      </p>
      <p>
        The retrieval activity is used to automatically extract readme texts by submitting a
query through the GitHub API [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The result is a corpus of 2,155 readme texts ordered
by default relevance, as given by the GitHub best-match function. Because of
GitHub’s constraints on phrase queries, we applied a string-matching function in R
[
        <xref ref-type="bibr" rid="ref17">17</xref>
        ][
        <xref ref-type="bibr" rid="ref20">20</xref>
        ] and obtained 1,772 readme texts. We called it corpus R.
      </p>
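      <p>The phrase-matching step described above can be illustrated with a minimal sketch. The Python code below is not the R implementation used in this work; the function name filter_by_phrase, the case-insensitive matching, and the sample readmes are hypothetical and serve only to show how readmes lacking the exact phrase can be discarded.</p>
      <preformat><![CDATA[
```python
import re


def filter_by_phrase(readmes, phrase):
    """Keep only the readme texts that contain the exact phrase.

    Assumption of this sketch: matching is case-insensitive and tolerates
    any run of whitespace (including line breaks) between the words.
    """
    words = [re.escape(w) for w in phrase.split()]
    pattern = re.compile(r"\b" + r"\s+".join(words) + r"\b", re.IGNORECASE)
    return {name: text for name, text in readmes.items() if pattern.search(text)}


# Hypothetical sample data, not taken from the actual corpus.
sample = {
    "a": "A Real Estate listing scraper.",
    "b": "Estimates real values for an estate.",
    "c": "Tools for real\nestate agents.",
}
corpus_r = filter_by_phrase(sample, "real estate")
```
]]></preformat>
      <p>In this sketch, readme "b" is dropped because both words occur but not as a phrase, while "c" is kept even though the phrase is split by a line break.</p>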
      <p>
        The filter activity takes corpus R and applies a filtering step that makes use of
POS-tagging techniques [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ][
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] to transform unstructured data (corpus R) into
structured data by distilling important assets, such as verbs, nouns, and proper nouns. We
called the result corpus T. This activity also covers the usual text preprocessing tasks, such as
removing numbers, whitespace, and non-alphanumeric terms. Finally, each readme
text in corpus T was exported as a comma-separated values (CSV) file. These files are
available at https://git.io/v9YJ3 for further data exploration.
      </p>
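      <p>The preprocessing tasks mentioned above can be sketched as follows. This Python code is illustrative only: the actual activity was implemented in R and also includes POS tagging, which is omitted here. The function name preprocess and the sample sentence are hypothetical, and restricting the text to ASCII letters is a simplifying assumption.</p>
      <preformat><![CDATA[
```python
import re


def preprocess(text):
    """Drop digits and non-alphanumeric symbols, then collapse whitespace.

    Simplifying assumption: only ASCII letters are kept, so accented
    characters would also be removed by this sketch.
    """
    text = re.sub(r"[^A-Za-z\s]", " ", text)   # numbers and punctuation -> space
    return re.sub(r"\s+", " ", text).strip()   # normalize runs of whitespace


# Hypothetical sentence, not taken from the actual corpus.
cleaned = preprocess("Zillow API:  fetch   3 listings (2017)!")
```
]]></preformat>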
      <p>
        The discovering activity uses the NLP approach [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to discover and
reveal frequent words that may attract the user’s attention, compensating for the lack of user preferences
for readme texts. To this end, a corpus NP was created using the proper nouns from
corpus T. The frequency of words was computed using tf-idf weighting [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], and a Wordcloud visualization technique was applied to display the data
[
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] (Fig. 1). In corpus NP, the word Zillow (Zillow is an online real estate database company) appears
to be the most relevant word in the real estate domain.
      </p>
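      <p>As an illustration of tf-idf weighting over a corpus of proper nouns, consider the sketch below. The Python code is not the R implementation used in this work. It assumes one common tf-idf variant (relative term frequency multiplied by log(N/df)) and assumes that the corpus-level score of a term is the sum of its per-document weights; both choices are assumptions of this sketch. The function names and the toy documents are hypothetical.</p>
      <preformat><![CDATA[
```python
import math
from collections import Counter


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    tf is the relative frequency of a term in a document and
    idf = log(N / df), where df counts the documents containing the term.
    """
    n_docs = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights


def rank_terms(docs):
    """Sum the tf-idf weights over the corpus (assumed aggregation) and sort."""
    totals = Counter()
    for doc_weights in tf_idf(docs):
        totals.update(doc_weights)  # adds the float weights term by term
    return totals.most_common()


# Hypothetical proper-noun lists, one per readme; not the actual corpus NP.
docs = [
    ["Zillow", "Zillow", "API"],
    ["Zillow", "MLS"],
    ["API", "Python"],
    ["Python", "API", "Django"],
]
ranking = rank_terms(docs)
```
]]></preformat>
      <p>With these toy documents, Zillow obtains the highest total weight, whereas API, which occurs in three of the four documents, is penalized by its low idf and ranks last.</p>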
      <p>The preference activity aims to compensate for the lack
of user preferences by simulating them. To this end, we used the
Wordcloud representation (Fig. 1) so that the user (a
requirements engineer) can choose what is of
particular interest. We assumed that the user
chooses the most relevant word of this domain,
Zillow, which lets us filter a subset of readme texts: 61
readme texts have at least one occurrence of the word Zillow. We called this group corpus S.</p>
      <p>Fig. 1. Frequent proper nouns in real estate.</p>
      <p>
        The clustering activity organizes corpus S for recommendation. We first
determined the optimal number of clusters using
the k-medoids algorithm [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], and then used the k-means algorithm [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], which, given
the number of clusters, groups the readme texts by similarity. For corpus S, we obtained two
clusters: cluster A, with one readme text, and cluster B, with the rest. Since cluster B contains
most of the readme texts, cluster B is our GH4RE recommendation for the word Zillow.
      </p>
      <p>
        It is important to note that the optimal number of clusters depends on the method used
to find similarities. The k-medoids implementation used in our approach relies on the silhouette method
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], which computes a clustering for different values of k (number of clusters); we set
k-min=2 and k-max=15. For each k, the average silhouette of the observations
(avg.sil) is calculated, and the curve of avg.sil is plotted against the number of clusters k. The location
of the maximum is taken as the appropriate number of clusters.
      </p>
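      <p>The silhouette procedure above can be sketched as follows. The Python code is not the R implementation used in this work. It assumes that each readme is represented as a numeric vector compared with the Euclidean distance, and it replaces a production k-medoids routine with an exhaustive search that is only viable for toy data. The function names and the sample points are hypothetical; only k-min=2 and k-max=15 come from the text.</p>
      <preformat><![CDATA[
```python
import math
from itertools import combinations


def k_medoids(points, k):
    """Exact k-medoids by exhaustive search; only viable for toy data."""
    def cost(medoids):
        # Total distance from every point to its nearest medoid.
        return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)
    medoids = min(combinations(range(len(points)), k), key=cost)
    # Label each point with the position (0..k-1) of its nearest medoid.
    return [min(range(k), key=lambda c: math.dist(p, points[medoids[c]]))
            for p in points]


def average_silhouette(points, labels):
    """Mean of s(i) = (b - a) / max(a, b); s(i) = 0 for singleton clusters."""
    clusters = {}
    for idx, label in enumerate(labels):
        clusters.setdefault(label, []).append(idx)
    scores = []
    for i, label in enumerate(labels):
        own = [j for j in clusters[label] if j != i]
        if not own:
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the same cluster.
        a = sum(math.dist(points[i], points[j]) for j in own) / len(own)
        # b: smallest mean distance to the members of any other cluster.
        b = min(
            sum(math.dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != label
        )
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)


def best_k(points, k_min=2, k_max=15):
    """Return the k with the highest avg.sil, plus avg.sil for every k tried."""
    k_max = min(k_max, len(points) - 1)
    avg_sil = {k: average_silhouette(points, k_medoids(points, k))
               for k in range(k_min, k_max + 1)}
    return max(avg_sil, key=avg_sil.get), avg_sil


# Hypothetical 2-D vectors forming two well-separated groups.
points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
k, avg_sil = best_k(points)
```
]]></preformat>
      <p>For the toy points, avg.sil peaks at k=2. Plotting the values in avg_sil against k would give the curve described above.</p>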
      <p>We envision that our approach allows iterative discovery of latent words in
readme texts, e.g., once the clusters are identified, each cluster can become the input of
the discovery activity to deepen the search for latent words.
</p>
    </sec>
    <sec id="sec-4">
      <title>Preliminary Assessment</title>
      <p>
        In this section, we present an assessment of 20 readme texts to verify the quality of our
recommendation approach. We queried “real estate zillow” on GitHub and selected the
first 10 readmes (Table 1). We called this group GitHub recommendation. On the other
hand, we selected 10 readmes from our GH4RE recommendation group. We combined
both groups of readme texts randomly, and then we asked six users to assess the
usefulness of the information in the texts. To measure usefulness, we used the Likert scale
technique [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. The following text was presented to users:
      </p>
      <p>Imagine a scenario where a client desires an application, for instance, an application for the
Real Estate domain. One of the tasks you may need to perform as a (requirements engineer,
developer, project-manager or designer) is the learning about the Real Estate domain. By
observing the Wordcloud (Fig. 1), you may perceive that Zillow is an important word in this
domain. The excel file (https://git.io/v9YJc) presents 20 links to Readme texts containing
information about Zillow. We want to measure the usefulness of information in texts for the
concept Zillow and for the scenario described.</p>
      <p>Six people with experience in Software Engineering were selected to perform the
assessment. Among these people, three are RE senior researchers, one is an HCI senior
researcher, one is a senior developer, and the last one is a project manager. It is
important to note that all corpora used in this work, as well as the assessment files, are
available on GitHub for further research and feedback (https://git.io/v9YJ8).</p>
      <p>Each readme text is named following the pattern below to keep the traceability to its
sources:</p>
      <p>Number of original relevance in GitHub.-.userName.-.projectName</p>
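      <p>To show how this naming pattern preserves traceability, the sketch below splits a name back into its parts and rebuilds the project URL. It is written in Python for illustration only; the class and function names are hypothetical, the rank 12 in the example is invented, and any file extension appended to the name is assumed to have been removed beforehand.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class ReadmeSource(NamedTuple):
    rank: int      # original relevance position on GitHub
    user: str      # GitHub user name
    project: str   # GitHub project name

    @property
    def url(self):
        return f"https://github.com/{self.user}/{self.project}"


def parse_readme_name(name):
    """Split 'rank.-.userName.-.projectName' into its three parts."""
    # maxsplit=2 keeps the project name whole even if it contains '.-.'.
    rank, user, project = name.split(".-.", 2)
    return ReadmeSource(int(rank), user, project)


# The rank 12 is a made-up value used only for this example.
src = parse_readme_name("12.-.hanneshapke.-.pyzillow")
```
]]></preformat>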
      <p>From the GH4RE recommendation, we ranked the readme texts according to the
frequency of the word Zillow in the texts, and then took the top 10 readme texts (Table 1). On
average, each participant took 30 minutes to complete the assessment. Half of the users
performed the assessment online. From our recommendation, the first two readme texts
were rated as extremely useful by five people (83% of users). By contrast, the first two
readme texts of GitHub’s recommendation received the lowest scores
(useless or not very useful). We note that the assessments of 80% of the readmes from our
recommendation range from somewhat useful to extremely useful, which we consider
a positive result, given the varied profiles of the participants.</p>
      <p>For the readme texts with the lowest scores in the GH4RE recommendation,
&lt;https://github.com/hanneshapke/pyzillow&gt; and &lt;https://github.com/imFORZA/re-pro&gt;, we
manually verified their contents and found that the first describes a client package
and its functions, and the second is a brief text listing the features of a tool.
As this is information that can be better appreciated by a person with a developer profile,
we looked at the assessment of the user with this profile. Contrary to what we
assumed, this user rated those readme texts as not very useful. This result may
lead to several interpretations, since this particular assessment (with the
developer) was performed online, and we could not receive any feedback from him. On the other
hand, for the in-person assessment performed with a requirements engineer, we found
that his feedback fits this situation: “In general I perceived those readmes
can be useful in different times, for instance the ones I rated with higher values is
because I could easily obtain knowledge about Zillow. However, if my objective after
learning is the reuse of source code, for sure I would use the ones I rated lowest,
because I know they contained development words”.</p>
      <p>As for the GitHub recommendation, 90% of the readme texts received
the lowest scores. This situation supports our belief that GitHub is envisioned for
development purposes. We checked the first recommendation, &lt;https://github.com/jdemaris/real&gt;,
and found that this readme text has only a few lines and describes the installation
of a development package related to the Zillow API. As for outliers,
we found that the only readme rated as somewhat useful,
&lt;https://github.com/eternalmothra/real_estate_values&gt;, is a brief text explaining one feature
related to Zillow.
</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>
        GitHub is becoming an ideal information source for research related to Software
Engineering. However, most works have explored GitHub projects from the viewpoint
of code developers. Despite taking the code developers’ viewpoint, the approach
proposed by Guendouz et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] is similar to ours in the sense that it predicts useful
repositories by exploring the fork perspective based on the users’ activity history.
      </p>
      <p>Our use of clustering is geared towards recommending potentially useful
GitHub projects, so as to empower requirements engineers with domain knowledge that
will be useful in performing requirements elicitation. The results so far indicate that our
approach for recommending projects based on the readme perspective performs better
than directly querying GitHub. This result is important as it advances our
overall goal of using GitHub mining as a key strategy for requirements elicitation.</p>
      <p>Future work will continue to improve the clustering strategy and will explore other
GitHub perspectives from the viewpoint of requirements elicitors. Moreover, we plan
to further validate our approach in the context of other domains, involving more
participants from both academia and industry. In particular, we would like to encapsulate our
approach as APIs for public use, and try to directly get feedback from end users.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgement</title>
      <p>R. Portugal acknowledges the support of Capes. J.C. Leite and M.A. Casanova
acknowledge the support of CNPq. J.C. Leite also thanks Faperj (Cientista do Nosso
Estado) for its support. Tong Li acknowledges the support of Startup Funding
No.007000514116022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          .:
          <article-title>Utilizing recommender systems to support software requirements elicitation</article-title>
          .
          <source>Proc. 2nd International Workshop on Recommendation Systems for Software Engineering</source>
          . pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          . ACM. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Pazzani</surname>
            <given-names>MJ</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Billsus</surname>
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Content-based recommendation systems</article-title>
          .
          <source>In The adaptive web</source>
          . pp.
          <fpage>325</fpage>
          -
          <lpage>341</lpage>
          . Springer Berlin Heidelberg. (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Fleischman</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovy</surname>
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Recommendations without user preferences: a natural language processing approach</article-title>
          .
          <source>Proc. 8th Int'l. Conf. on Intelligent user interfaces</source>
          . pp.
          <fpage>242</fpage>
          -
          <lpage>244</lpage>
          (
          <year>2003</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Portugal</surname>
            <given-names>R.L.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roque</surname>
            <given-names>H.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leite</surname>
            <given-names>J.C.S.P.</given-names>
          </string-name>
          :
          <article-title>A Corpus Builder: Retrieving Raw Data from GitHub for Knowledge Reuse in Requirements Elicitation</article-title>
          .
          <source>3rd Annual Int'l. Symposium on Information Management and Big Data</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Leite</surname>
            <given-names>J.C.S.P.</given-names>
          </string-name>
          :
          <article-title>Viewpoints on viewpoints</article-title>
          .
          <source>Joint Proc. of the 2nd Int'l. Software architecture workshop (ISAW-2) and international workshop on multiple perspectives in software development (Viewpoints' 96) on SIGSOFT'96 workshops</source>
          . pp.
          <fpage>285</fpage>
          -
          <lpage>288</lpage>
          . ACM. (
          <year>1996</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Leite</surname>
            ,
            <given-names>J. C. S. P</given-names>
          </string-name>
          and
          <string-name>
            <surname>Freeman</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <article-title>Requirements validation through viewpoint resolution</article-title>
          .
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>17</volume>
          , no.
          <issue>12</issue>
          , pp.
          <fpage>1253</fpage>
          -
          <lpage>1269</lpage>
          , (
          <year>1991</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Mohebzada</surname>
            <given-names>JG</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ruhe</surname>
            <given-names>G</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Eberlein</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>Systematic mapping of recommendation systems for requirements engineering</article-title>
          .
          <source>Proc. Int'l Conf. on Software and System Process</source>
          . pp.
          <fpage>200</fpage>
          -
          <lpage>209</lpage>
          . IEEE Press. (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Maalej</surname>
            <given-names>W</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Thurimella</surname>
            <given-names>AK</given-names>
          </string-name>
          .
          <article-title>Towards a research agenda for recommendation systems in requirements engineering</article-title>
          .
          <source>Proc. 2nd Int'l. Workshop on Managing Requirements Knowledge</source>
          . pp.
          <fpage>32</fpage>
          -
          <lpage>39</lpage>
          . IEEE Computer Society. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Portugal</surname>
            <given-names>R.L.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>do Prado Leite</surname>
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Almentero</surname>
            <given-names>E</given-names>
          </string-name>
          .
          <article-title>Time-constrained requirements elicitation: reusing GitHub content</article-title>
          .
          <source>IEEE Workshop on Just-In-Time Requirements Engineering (JITRE)</source>
          . pp.
          <fpage>5</fpage>
          -
          <lpage>8</lpage>
          . IEEE. (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>Using data mining and recommender systems to facilitate large-scale, open, and inclusive requirements elicitation processes</article-title>
          .
          <source>Proc.16th IEEE Int'l. Requirements Engineering Conf</source>
          . pp.
          <fpage>165</fpage>
          -
          <lpage>168</lpage>
          . IEEE. (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duan</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>A recommender system for requirements elicitation in large-scale software projects</article-title>
          .
          <source>Proc. Symposium on Applied Computing</source>
          . pp.
          <fpage>1419</fpage>
          -
          <lpage>1426</lpage>
          . ACM. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>Enhancing stakeholder profiles to improve recommendations in online requirements elicitation</article-title>
          .
          <source>In 17th IEEE International Requirements Engineering Conference</source>
          . pp.
          <fpage>37</fpage>
          -
          <lpage>46</lpage>
          . IEEE. (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          .
          <article-title>Utilizing recommender systems to support software requirements elicitation</article-title>
          .
          <source>In Proceedings of the 2nd International Workshop on Recommendation Systems for Software Engineering</source>
          . pp.
          <fpage>6</fpage>
          -
          <lpage>10</lpage>
          . ACM. (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Lim</surname>
            <given-names>SL</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finkelstein</surname>
            <given-names>A</given-names>
          </string-name>
          .
          <article-title>StakeRare: using social networks and collaborative filtering for large-scale requirements elicitation</article-title>
          .
          <source>IEEE Trans. on Software Eng</source>
          . pp.
          <fpage>707</fpage>
          -
          <lpage>735</lpage>
          . (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Hariri</surname>
            <given-names>N</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Castro-Herrera</surname>
            <given-names>C</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cleland-Huang</surname>
            <given-names>J</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mobasher</surname>
            <given-names>B</given-names>
          </string-name>
          .
          <article-title>Recommendation systems in requirements discovery</article-title>
          .
          <source>Recommendation Systems in Software Eng</source>
          . pp.
          <fpage>455</fpage>
          -
          <lpage>476</lpage>
          . (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Guendouz</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amine</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hamou</surname>
            <given-names>R.M.</given-names>
          </string-name>
          <article-title>Recommending relevant GitHub repositories: a collaborative-filtering approach</article-title>
          .
          <source>on Networking and Advanced Systems</source>
          ,
          <volume>34</volume>
          . (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>R Core Team</surname>
          </string-name>
          .
          <article-title>R: A language and environment for statistical computing</article-title>
          . (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Portugal</surname>
            <given-names>R.L.Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leite</surname>
            <given-names>J.C.S.P.</given-names>
          </string-name>
          :
          <article-title>Extracting Requirements Patterns from Software Repositories</article-title>
          .
          <source>In Requirements Patterns (RePa), IEEE 6th International Workshop</source>
          . (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Portugal</surname>
            <given-names>R.L.Q.</given-names>
          </string-name>
          :
          <article-title>Mineração de Informação em Linguagem Natural para Apoiar a Elicitação de Requisitos</article-title>
          .
          <source>MSc. Dissertation</source>
          . PUC-Rio University, Rio de Janeiro, Brasil. (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Rinker</surname>
            <given-names>T.W.</given-names>
          </string-name>
          .
          <source>qdap: Quantitative Discourse Analysis Package</source>
          . Version 2.2.5. University at Buffalo, Buffalo, New York. http://github.com/trinker/qdap. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Schmid</surname>
            <given-names>H</given-names>
          </string-name>
          .
          <article-title>Probabilistic part-of-speech tagging using decision trees</article-title>
          .
          <source>In New methods in language processing</source>
          . p.
          <fpage>154</fpage>
          . Routledge. (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Michalke</surname>
            <given-names>M.</given-names>
          </string-name>
          <article-title>koRpus: An R Package for Text Analysis (Version 0.06-5)</article-title>
          . Available from http://reaktanz.de/?c=hacking&amp;s=koRpus (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Hiemstra</surname>
            <given-names>D.</given-names>
          </string-name>
          <article-title>A probabilistic justification for using tf×idf term weighting in information retrieval</article-title>
          .
          <source>International Journal on Digital Libraries</source>
          .
          <volume>3</volume>
          (
          <issue>2</issue>
          ):
          <fpage>131</fpage>
          -
          <lpage>139</lpage>
          . (Aug 1,
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <given-names>Ian</given-names>
            <surname>Fellows</surname>
          </string-name>
          .
          <article-title>wordcloud: Pretty word clouds</article-title>
          . R package version 2.5. https://CRAN.R-project.org/package=wordcloud. (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Kaufman</surname>
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rousseeuw</surname>
            <given-names>P.J.</given-names>
          </string-name>
          <article-title>Partitioning around medoids (program PAM)</article-title>
          .
          <source>Finding groups in data: an introduction to cluster analysis</source>
          . pp.
          <fpage>68</fpage>
          -
          <lpage>125</lpage>
          . (
          <year>1990</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Jain</surname>
            <given-names>AK</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dubes</surname>
            <given-names>RC</given-names>
          </string-name>
          .
          <article-title>Algorithms for clustering data</article-title>
          . Prentice-Hall, Inc.; (
          <year>1988</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <given-names>R.A.</given-names>
            <surname>Likert</surname>
          </string-name>
          .
          <article-title>A technique for the measurement of attitudes</article-title>
          . Archives.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>