=Paper= {{Paper |id=Vol-1164/PaperPoster06 |storemode=property |title=SOVA - A Tool for Semantic and Ontological Variability Analysis |pdfUrl=https://ceur-ws.org/Vol-1164/PaperDemo06.pdf |volume=Vol-1164 |dblpUrl=https://dblp.org/rec/conf/caise/ItzikR14 }} ==SOVA - A Tool for Semantic and Ontological Variability Analysis== https://ceur-ws.org/Vol-1164/PaperDemo06.pdf
SOVA – A Tool for Semantic and Ontological Variability
                     Analysis

                            Nili Itzik and Iris Reinhartz-Berger

               Department of Information Systems, University of Haifa, Israel
                      nitzik@campus.haifa.ac.il, iris@is.haifa.ac.il



       Abstract. Variability analysis in Software Product Line Engineering (SPLE)
       utilizes various software-related artifacts, including requirements specifications.
       Currently, measuring the similarity of requirements specifications for analyzing
       the variability of software products takes mainly semantic considerations into
       account. This may fail to capture important aspects of the software behavior as
       perceived by users. In this paper we present a tool, called SOVA –
       Semantic and Ontological Variability Analysis, which introduces ontological
       considerations to variability analysis, in addition to the semantic ones. The input
       of the tool is a set of textual requirements statements organized in documents.
       Each document represents the expectations from, or the characteristics of, a
       different software product in a line, while each requirement statement represents
       an expected behavior of that software product. The output is a feature diagram
       representing the variability in the input set of requirements documents and
       setting the ground for behavioral domain analysis.


       Keywords: Software Product Line Engineering, Variability Analysis, Domain
       Analysis, Requirements Specifications, Ontology, Semantic Similarity


1      Introduction

As the complexity and variety of software products have increased, the need to reuse
software-related artifacts has become increasingly important. Software Product Line Engineering
(SPLE) suggests an approach to systematically reuse artifacts, such as requirements
specifications, design documents and source code, among different, yet similar, soft-
ware products [‎3], [‎14]. Such reuse of artifacts often raises a significant challenge of
variability management. Variability in this context can be defined as “the ability of an
asset to be efficiently extended, changed, customized, or configured for use in a par-
ticular context” [‎7].
   Viewing software requirements as the drivers of different development activities
and methods, several studies have suggested using requirements specifications for
variability analysis of software products. In these studies, requirements are operation-
alized or realized by features, and variability is mainly represented as feature dia-
grams – tree or graph structures that describe the characteristics of a software product
line and the relationships and dependencies among them [‎8]. The current studies
178   Pre-proceedings of CAISE'14 Forum


commonly apply only semantic similarity metrics, which focus on similarities of ter-
minology, in order to identify and analyze variability. As we will elaborate later, us-
ing only semantic considerations might lead to failure to capture important aspects of
the software behavior, such as its triggers, pre-conditions, and post-conditions.
    In [‎16], we suggest combining semantic and ontological considerations for calcu-
lating similarity. In particular, a behavior is described in terms of the initial state of a
system before the behavior occurs, the external events that trigger the behavior, and
the final state of the system after the behavior occurs. We use semantic metrics to
evaluate the similarity of related behavioral elements and utilize this similarity to
analyze variability. To support this approach, we have developed a tool, called SOVA
– Semantic and Ontological Variability Analysis. This tool gets requirements docu-
ments written in plain text. Each document represents a different software product in
the line and is divided into requirements statements. Each requirement statement,
which may be composed of several sentences, reflects a use case, a user story, or any
unit that represents a single expected or existing behavior of a software product. The
variability of requirements is then analyzed, yielding a feature diagram. The resultant
feature diagrams are behavior-driven and set the ground for behavioral domain analy-
sis.
    The rest of this paper is structured as follows. Section 2 reviews related work, ex-
emplifying limitations of current approaches. Section 3 presents the main processes of
the approach and their support in the SOVA tool. Finally, Section 4 summarizes and
refers to future development plans.


2      Related Work
In the context of analyzing software product variability, different studies have suggested
ways to use textual requirements to generate variability models, such as feature
diagrams or Orthogonal Variability Models (OVM) [‎14].
    In [‎19], a tool, named ArborCraft, is presented. This tool creates feature diagrams
by grouping similar requirements using a hierarchical agglomerative clustering algo-
rithm and semantic similarity measures – Latent Semantic Analysis (LSA) [‎10]. Fea-
ture variants are then identified using a Requirements Description Language and se-
mantic considerations. In [‎4-‎5], publicly available repositories of product descriptions
are utilized. Based on these repositories and the conditional probabilities between
features occurrences, a probabilistic feature diagram is created using an incremental
diffusive clustering algorithm. In [‎13], a semi-automatic method for constructing
OVM diagrams is introduced. This method extracts functional requirements profiles
(FRPs), represented as "verb-direct object" pairs, using expert knowledge and linguis-
tic clues. The variability model is created using heuristic rules, such as: “If diverse
values are identified for a case, then alternative choice(s) should be made.”
    All the above methods employ only semantic considerations. In particular, they
may result in high similarity values for requirements that use similar terminology,
even if the pre-conditions, the triggers, and the post-conditions of the corresponding
behaviors are different. For example, the requirements “The system should be able to


report on any user update activities” and “Any user should be able to report system
activities” may result in a very high value of semantic similarity, since both refer to
“system”, “user”, and “report”. In fact, LSA [‎10] results in a similarity value of 1 for
these requirements, implying that their semantic meanings are identical. However,
these requirements are quite different: the first requirement represents behavior that is
internal and likely aims at detecting suspicious user update activities. The second
requirement, on the other hand, represents a behavior triggered by an external user
who intends to report his/her system activities.
   Another limitation of current studies is that they take into consideration the full
text of a requirement statement. Such statements might include aspects (e.g., intermediate
outcomes) that are less relevant, or irrelevant, for analyzing variability from an external
perspective of a user or a customer. Such a view of the expected behaviors of soft-
ware systems is important for reaching different reuse decisions, e.g., when conduct-
ing feasibility studies, estimating software development efforts, or adopting SPLE.
   To overcome the above limitations, we proposed in [‎16] to combine semantic and
ontological considerations when calculating similarity and analyzing variability. We
further demonstrated that our approach outperforms LSA when examining the simi-
larity of functional requirements. Here we present the tool we have developed to sup-
port that approach. The tool is named SOVA – Semantic and Ontological Variability
Analysis.


3      The SOVA Tool

Fig. 1 presents the main processes supported by the SOVA tool, namely requirements
parsing, behavioral similarity calculation, and feature diagram creation. Next we
elaborate on each process and its support in the tool. Additional material can be found
at http://mis.hevra.haifa.ac.il/~iris/research/SOVA/.
[Fig. 1 shows a flow diagram: Requirements are the input to the Requirements parsing
process, whose instruments are Bunge’s ontology (initial state, event, final state) and
NLP techniques (semantic role labeling – agent, action, instrument, etc.; temporal
ordering using temporal graphs; pronoun replacement – he, she, it, his, her, its, etc.).
Its output, Parsed requirements, feeds the Behavioral similarity calculation process,
whose instruments are semantic similarity measures (Wu & Palmer; Mihalcea, Corley &
Strapparava – MCS). The resulting Similarity matrix feeds the Feature diagram creation
process, whose instruments are clustering & mining techniques (hierarchical
agglomerative clustering; relationships and constraints mining), yielding the output
Feature diagram. The figure's legend distinguishes objects, processes, inputs/outputs,
and instruments.]
         Fig. 1. An overview of the processes and flows supported by the SOVA tool


3.1       Requirements Parsing
During the first step, the input requirements are parsed. This is done by two main
instruments: natural language processing (NLP) techniques and an ontological model.
     First, a semantic role labeling (SRL) approach [‎6] is used to associate the parts of
a requirement statement with their specific semantic roles. Five semantic roles are
currently supported due to their special importance to requirements in general and
functional requirements in particular: (1) Agent – Who performs? (2) Object (a.k.a.
Patient) – On what object is it performed? (3) Instrument – How is it performed? (4)
Temporal modifier (AM-TMP) – When is it performed? and (5) Adverbial modifier
(AM-ADV) – Under what conditions is it performed? A sixth label – Action – is handled
to answer the question: What is performed? This label holds the sentence’s predicate
or verb.
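Conceptually, each labeled sentence part can be stored in a simple record. The sketch below is illustrative only; the field names and the example sentence are ours, not the tool's internal data model:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class BehavioralVector:
    """One parsed action or pre-condition, labeled with semantic roles."""
    agent: Optional[str] = None       # Who performs?
    action: Optional[str] = None      # What is performed? (predicate/verb)
    obj: Optional[str] = None         # On what object is it performed?
    instrument: Optional[str] = None  # How is it performed?
    am_tmp: Optional[str] = None      # When is it performed?
    am_adv: Optional[str] = None      # Under what conditions is it performed?

# "The librarian logs in with a password"
vec = BehavioralVector(agent="the librarian", action="log in",
                       instrument="a password")
```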
     Considering those labels and applying temporal order [‎11] and coreference resolu-
tion1 [‎15] techniques, the tool identifies behavioral vectors, each representing an ac-
tion or a pre-condition. Using concepts taken from Bunge's ontological model [‎1-‎2],
the behavioral vectors are then classified into initial states that represent pre-
conditions of the behavior, external events that trigger the behavior, and final states
that represent post-conditions or outputs of the behavior. These three “types” of be-
havioral elements (namely, initial states, external events, and final states) were sug-
gested in [‎17-‎18] for defining an external view of behavior. The classification of the
vectors to these behavioral elements is mainly done by analyzing the agent and the
action parts of the vectors and using the temporal order of the vectors [‎16].
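A much-simplified sketch of this classification idea follows; the agent lexicon, the rules, and the dict layout are our own illustration and not the actual rules of [16], which are considerably more elaborate:

```python
# Hypothetical lexicon of agents considered external to the system.
EXTERNAL_AGENTS_HINT = {"user", "librarian", "visitor", "borrower"}

def classify(vectors):
    """Classify temporally ordered behavioral vectors into initial states,
    external events, and final states (Bunge-style, heavily simplified)."""
    labels = []
    triggered = False
    for v in vectors:
        if v.get("am_tmp") and not triggered:
            # A condition preceding any trigger -> initial state
            labels.append("initial state")
        elif v.get("agent") in EXTERNAL_AGENTS_HINT and not triggered:
            # Performed by an external agent -> external event (the trigger)
            labels.append("external event")
            triggered = True
        else:
            # System activity after the trigger -> final state
            labels.append("final state")
    return labels

vectors = [
    {"am_tmp": "when the library opens", "action": "open"},
    {"agent": "librarian", "action": "log in"},
    {"agent": "system", "action": "display the main menu"},
]
print(classify(vectors))
# → ['initial state', 'external event', 'final state']
```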
    The screenshot presented in Fig. 2 exemplifies the outcome of the requirements
parsing activity. The field at the top of this screen enables choosing a particular
requirements file and browsing its requirements statements (in the middle part of this
screen). Each requirement statement includes one or more sentences. Each sentence
appears in a separate row, where the number to its left indicates the requirement to
which it belongs. Requirement 2, for example, is composed of two sentences. Choos-
ing a particular sentence displays the parsing of the entire requirement to which the
sentence belongs in the bottom part of this screen. The second requirement in Fig. 2,
for example, is parsed into three behavioral vectors. The first vector is classified as an
initial state, since it represents a pre-condition (labeled as a temporal modifier). The
second vector, representing a login operation, is classified as an external event, since
it is performed by an external agent – the librarian. Finally, the third vector is
classified as a final state, as it describes an internal operation performed by the
system after the librarian logs in.
    During the parsing process, the tool further supports interactions with the user,
namely, a requirements engineer or a domain analyst. In particular, the user can edit
the ontological class, change the order of the parsed behavioral vectors, update the
original requirements, and view the semantic role labeling output (the SRL button).



1
    Coreference resolution replaces pronouns (e.g., he, she, and it) with their antecedents
    (i.e., the nouns to which they refer).




                  Fig. 2. A screenshot of the requirements parsing outcome


3.2    Behavioral Similarity Calculation
In the second process, the behavioral similarity of each pair of requirements (either
from the same document or from different documents) is calculated. The behavioral
similarity is the weighted average of the semantic similarities of their behavioral vec-
tors. In other words, the behavioral similarity is the weighted average of the semantic
similarities of their initial states, external events, and final states. For calculating the
semantic similarities of the behavioral elements, different semantic measures can be
used. Here we use MCS [12] to measure phrases’ similarity and Wu and Palmer [20]
to measure words’ similarity. The user can further set the weights for agent, action,
object, and instrument similarities. Perceiving agents and actions as the dominant
components of behavioral vector similarity, Fig. 3 exemplifies the outcome of the
behavioral similarity calculation process in SOVA, using weights of 0.3, 0.4, 0.2, and
0.1 for agents, actions, objects, and instruments, respectively. The screen displays
(on the right side) the initial state, external event, final state, and overall similarities
for each pair of requirements in the source files. The overall similarity is calculated
using initial state, external event, and final state weights of 0.2, 0.3, and 0.5, respectively,
perceiving the final state as the most influential factor in the overall similarity.
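As a concrete illustration, the two weighted averages can be computed as follows. The role-level similarity scores in the example are invented for the sake of the arithmetic; in the tool they come from the MCS and Wu & Palmer measures:

```python
# Weights from the example above (user-configurable in the tool).
ROLE_WEIGHTS = {"agent": 0.3, "action": 0.4, "object": 0.2, "instrument": 0.1}
ELEMENT_WEIGHTS = {"initial_state": 0.2, "external_event": 0.3,
                   "final_state": 0.5}

def vector_similarity(role_sims):
    """Weighted average of the per-role semantic similarities."""
    return sum(ROLE_WEIGHTS[r] * s for r, s in role_sims.items())

def behavioral_similarity(element_sims):
    """Weighted average of the initial-state, external-event,
    and final-state similarities."""
    return sum(ELEMENT_WEIGHTS[e] * s for e, s in element_sims.items())

# Hypothetical role-level scores for a pair of external events:
event_sim = vector_similarity(
    {"agent": 1.0, "action": 0.9, "object": 0.8, "instrument": 0.5})
print(round(event_sim, 2))  # 0.3*1.0 + 0.4*0.9 + 0.2*0.8 + 0.1*0.5 = 0.87

overall = behavioral_similarity(
    {"initial_state": 0.2, "external_event": event_sim, "final_state": 0.4})
print(round(overall, 3))  # 0.2*0.2 + 0.3*0.87 + 0.5*0.4 = 0.501
```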
    In Fig. 3, for example, the first pair of requirements (the ninth requirement in the
first input file and the fourth requirement in the third input file) represents different
cases (initial states) and responses (final states), but similar interactions (external
events) in which someone (visitor or borrower) reaches the new flash page of the
library. The requirements in the second row represent very similar behaviors, which
differ only in their agents (users vs. librarians). Finally, the requirements in the third
row represent completely different behaviors.




             Fig. 3. A screenshot of the behavioral similarity calculation outcome


3.3    Feature Diagram Creation
In the third process, we use the calculated similarity values in order to create a feature
diagram that represents the variability found in the input requirements documents. To
this end, we utilize a hierarchical agglomerative clustering algorithm. This algorithm
starts by putting each requirement in a separate cluster. In each iteration, the
algorithm merges the closest clusters, namely, the clusters whose average requirements
similarity is the highest. The output of this algorithm is a binary tree of clusters. To
better represent the analyzed variability, another pass is performed to flatten sub-trees
whose similarities are alike. To demonstrate this pass, consider the schematic tree on
the left side of Fig. 4. The leaves of this tree represent requirements (or, actually,
clusters with single requirements), numbered 1 to 5, while the inner nodes represent
clusters with several requirements. Each inner node exhibits its identity (e.g., C1:2_4) and
the overall similarity of the constituting requirements. Note that the sub-tree whose
root is C1:2_4 includes very similar requirements, namely R1, R2, and R4. Therefore,
in the flattened tree (on the right side of the figure), the three requirements have the
same parent. In contrast, the node C3_1:2:4 holds a requirement, R3, which is quite
different from the other related requirements, R1, R2, and R4. Thus, grouping the four
requirements together is unjustified. Instead, R3 and C1:2_4 become siblings in the
flattened tree.
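A minimal sketch of both passes follows, assuming a precomputed similarity matrix. The data layout and the flattening threshold are illustrative choices of ours, not the tool's implementation:

```python
import itertools

def avg_sim(a, b, sim):
    """Average pairwise similarity between the requirements of two clusters."""
    return sum(sim[i][j] for i in a["reqs"] for j in b["reqs"]) / (
        len(a["reqs"]) * len(b["reqs"]))

def cluster(sim):
    """Agglomerative clustering over a similarity matrix; returns a binary
    tree whose inner nodes record the similarity at which they were merged."""
    nodes = [{"reqs": (i,), "sim": 1.0, "children": []} for i in range(len(sim))]
    while len(nodes) > 1:
        a, b = max(itertools.combinations(nodes, 2),
                   key=lambda p: avg_sim(p[0], p[1], sim))
        nodes.remove(a)
        nodes.remove(b)
        nodes.append({"reqs": a["reqs"] + b["reqs"],
                      "sim": avg_sim(a, b, sim), "children": [a, b]})
    return nodes[0]

def flatten(node, eps=0.1):
    """Second pass: lift a child's children up when the child's merge
    similarity is close to its parent's (threshold eps is illustrative)."""
    flat = []
    for child in node["children"]:
        child = flatten(child, eps)
        if child["children"] and abs(child["sim"] - node["sim"]) < eps:
            flat.extend(child["children"])  # similarities alike: merge levels
        else:
            flat.append(child)
    node["children"] = flat
    return node

# R1, R2, R4 (indices 0, 1, 3) are very similar; R3 (index 2) is an outlier.
sim = [[1.0, 0.9, 0.2, 0.9],
       [0.9, 1.0, 0.2, 0.9],
       [0.2, 0.2, 1.0, 0.2],
       [0.9, 0.9, 0.2, 1.0]]
root = flatten(cluster(sim))
print([c["reqs"] for c in root["children"]])  # → [(2,), (3, 0, 1)]
```

As in Fig. 4, R3 ends up a sibling of the cluster holding R1, R2, and R4, whose binary sub-tree is flattened into three children of one parent.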




  Fig. 4. Illustration of flattening the clustering outcome in the feature diagram creation stage


    Optionality as well as OR- and XOR-grouped features are deduced by examining the
appearance of the different requirements in the input requirements documents. The
final output is presented in FeatureIDE format. FeatureIDE is an Eclipse plug-in that
supports different phases of feature-oriented software development [9]. It is
user-friendly: the feature diagrams can be presented horizontally or vertically, the
requirements can be presented as descriptions of leaf nodes, and the diagrams can be
exported to a variety of feature diagram formats.
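For illustration, a diagram like the one described could be serialized as shown below. The element names reflect our recollection of FeatureIDE's model.xml layout and the feature names are hypothetical, so treat this as a sketch rather than the tool's actual output:

```python
import xml.etree.ElementTree as ET

def build_model():
    """Emit a tiny feature model in (what we recall as) FeatureIDE's
    model.xml layout; verify element names against the plug-in's docs."""
    root = ET.Element("featureModel")
    struct = ET.SubElement(root, "struct")
    # An "and" node groups mandatory/optional children; "or"/"alt" nodes
    # express OR- and XOR-grouped features, respectively.
    top = ET.SubElement(struct, "and", abstract="true",
                        mandatory="true", name="Library")
    ET.SubElement(top, "feature", mandatory="true", name="Login")
    group = ET.SubElement(top, "or", abstract="true", name="Reports")
    ET.SubElement(group, "feature", name="UserActivityReport")
    ET.SubElement(group, "feature", name="UsageStatistics")
    return ET.tostring(root, encoding="unicode")

print(build_model())
```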
    The SOVA tool enables generating feature diagrams according to different
behavioral views, namely, considering only the similarity of the initial states, the
external events, the final states, or the overall behaviors. Thus, as opposed to existing
approaches and tools, the variability of the requirements can be analyzed from different
perspectives. For example, considering only the similarity of final states may provide
an output-driven variability perspective, while considering the external events pro-
vides a functional variability perspective.
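In terms of the similarity computation, choosing a view amounts to choosing the element weights. The sketch below reflects our reading of the views; the weight vectors are illustrative:

```python
# Each view re-weights the three behavioral elements
# (initial state, external event, final state).
VIEWS = {
    "output-driven": {"initial_state": 0.0, "external_event": 0.0,
                      "final_state": 1.0},
    "functional":    {"initial_state": 0.0, "external_event": 1.0,
                      "final_state": 0.0},
    "overall":       {"initial_state": 0.2, "external_event": 0.3,
                      "final_state": 0.5},
}

def view_similarity(element_sims, view):
    """Behavioral similarity of a pair of requirements under a given view."""
    weights = VIEWS[view]
    return sum(weights[e] * element_sims[e] for e in weights)

# Hypothetical element similarities for one pair of requirements:
sims = {"initial_state": 0.2, "external_event": 0.9, "final_state": 0.4}
print(view_similarity(sims, "output-driven"))  # → 0.4
print(view_similarity(sims, "functional"))     # → 0.9
```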


4      Summary and Future Work

We presented a tool, named SOVA – Semantic and Ontological Variability Analysis –
that supports identifying and analyzing behavioral variability of software products
based on requirements specifications. The tool combines semantic and ontological
considerations through a three-stage process that includes parsing the requirements
using NLP techniques and Bunge’s ontological model, calculating the behavioral
similarity of software requirements using semantic measures, and generating feature
diagrams using a hierarchical agglomerative clustering algorithm. All these processes
are done automatically and the user is only required to set weights for the different
semantic similarities.
   In the future, we intend to extend the tool support in several ways. First, we intend
to involve the user throughout the process and to allow him/her to provide intermedi-
ate feedback which will be taken into consideration in the following stages. Second,
we intend to derive state variables from intermediate states and not just from initial
and final states. These state variables may further help identify the commonality and
variability of software products by refining the external view. Finally, we intend to
handle requirements statements that represent “swarms” of behaviors (including
branches and loops) and not just single ones. This will enable us to analyze relation-
ships between requirements and not just individual requirements.


References
1. Bunge, M. (1977). Treatise on Basic Philosophy, vol. 3, Ontology I: The Furniture of the
   World. Reidel, Boston, Massachusetts.
2. Bunge, M. (1979). Treatise on Basic Philosophy, vol. 4, Ontology II: A World of Systems.
   Reidel, Boston, Massachusetts.
3. Clements, P. and Northrop, L. (2001). Software Product Lines: Practices and Patterns. Ad-
   dison-Wesley.


4. Davril, J. M., Delfosse, E., Hariri, N., Acher, M., Cleland-Huang, J., and Heymans, P.
    (2013). Feature model extraction from large collections of informal product descriptions.
    The 9th Joint Meeting on Foundations of Software Engineering, pp. 290-300.
5. Dumitru, H., Gibiec, M., Hariri, N., Cleland-Huang, J., Mobasher, B., Castro-Herrera, C.,
    and Mirakhorli, M. (2011). On-demand feature recommendations derived from mining pub-
    lic product descriptions. 33rd IEEE International Conference on Software Engineering
    (ICSE’11), pp. 181-190.
6. Gildea, D. and Jurafsky, D. (2002). Automatic Labeling of Semantic Roles. Computational
    Linguistics 28 (3), pp. 245-288.
7. Jaring, M. (2005). Variability Engineering as an Integral Part of the Software Product Fami-
    ly Development Process, Ph.D. thesis, The Netherlands.
8. Kang, K. C., Cohen, S. G., Hess, J. A., Novak, W. E., and Peterson, A. S. (1990). Feature-
    oriented domain analysis (FODA) – feasibility study. Technical report CMU/SEI-90-TR-21,
    Carnegie Mellon University, Pittsburgh.
9. Kästner, C., Thüm, T., Saake, G., Feigenspan, J., Leich, T., Wielgorz, F., and Apel, S.
    (2009). FeatureIDE: A tool framework for feature-oriented software development. 31st
    IEEE International Conference on Software Engineering (ICSE’09), pp. 611-614.
10. Landauer, T. K., Foltz, P. W., and Laham, D. (1998). Introduction to Latent Semantic Anal-
    ysis. Discourse Processes, 25, pp. 259-284.
11. Mani, I., Verhagen, M., Wellner, B., Lee, C. M., and Pustejovsky, J. (2006). Machine learn-
    ing of temporal relations. In Proceedings of the 21st International Conference on Computa-
    tional Linguistics and the 44th annual meeting of the Association for Computational Lin-
    guistics, pp. 753-760.
12. Mihalcea, R., Corley, C., and Strapparava, C. (2006). Corpus-based and knowledge-based
    measures of text semantic similarity. The 21st national conference on Artificial intelligence
    (AAAI’2006), Vol. 1, pp. 775-780.
13. Niu, N. and Easterbrook, S. (2008). Extracting and modeling product line functional re-
    quirements. In the 16th IEEE International Requirements Engineering conference (RE’08),
    pp. 155-164.
14. Pohl, K., Böckle, G., and van der Linden, F. (2005) Software Product-line Engineering:
    Foundations, Principles, and Techniques, Springer.
15. Raghunathan, K., Lee, H., Rangarajan, S., Chambers, N., Surdeanu, M., Jurafsky, D., and
    Manning, C. (2010). A Multi-Pass Sieve for Coreference Resolution. The conference on
    Empirical Methods in Natural Language Processing (EMNLP’10), pp. 492-501.
16. Reinhartz-Berger, I., Itzik, N., and Wand, Y. (2014). Analyzing Variability of Software
    Product Lines Using Semantic and Ontological Considerations. Proceedings of the 26th
    International Conference on Advanced Information Systems Engineering (CAiSE’14), LNCS
    8484, pp. 150-164.
17. Reinhartz-Berger, I., Sturm, A., and Wand, Y. (2013). Comparing Functionality of Software
    Systems: An Ontological Approach. Data & Knowledge Engineering 87, pp. 320-338.
18. Reinhartz-Berger, I., Sturm, A., and Wand, Y. (2011). External Variability of Software:
    Classification and Ontological Foundations. The 30th International Conference on Concep-
    tual Modeling (ER'2011), LNCS 6998, pp. 275-289.
19. Weston, N., Chitchyan, R., and Rashid, A. (2009). A framework for constructing semanti-
    cally composable feature models from natural language requirements. In Proceedings of the
    13th International Software Product Line Conference, pp. 211-220.
20. Wu, Z. and Palmer, M. (1994). Verbs semantics and lexical selection. The 32nd annual
    meeting on Association for Computational Linguistics, pp. 133-138.