Towards a time-efficient algorithm to calculate the total number of products of a Software Product Line Completed Research Ruben Heradio Gil1 and David Fernandez Amoros2 1 Dpto. de Ingenieria de Software y Sistemas Informaticos Universidad Nacional de Educacion a Distancia Madrid, Spain rheradio@issi.uned.es 2 Dpto. de Lenguajes y Sistemas Informaticos Universidad Nacional de Educacion a Distancia david@lsi.uned.es Abstract. Feature Diagrams (FDs) are widely used to scope the domain of Software Product Lines (SPLs). In addition, valuable information can be inferred from FDs; for instance, the total number of possible products of a SPL. A common approach to calculate the number of products is translating FDs into propositional logic formulas, which are processed by off-the-shelf tools, such as SAT solvers. However, this approach only works for small FDs. We think more scalable solutions can be reached avoiding the diagram-to-logic transformation and taking advantage of the tree organization of FDs. Due to the profusion of feature modeling notations, this paper formally defines a pivot language, named Neutral Feature Tree (NFT), where FDs are restricted to be trees. Most popular FD notations can be easily and efficiently translated to NFT. The pa- per also proposes a general and time-efficient algorithm to calculate the number of products without considering crosstree constraints. 1 Introduction Software Product Line (SPL) practice has become an important and widely used approach for the efficient development of complete portfolios of software products [17]. The domain of a SPL must be carefully scoped, identifying the common and variable requirements of its products and the interdependencies between requirements. In a bad scoped domain, relevant requirements may not be implemented, and some implemented requirements may never be used, causing unnecessary complexity and both development and maintenance costs [6]. To avoid these serious problems, SPL domains are usually modeled with variability languages. Since FODA’s feature modeling language was proposed in 1990 [14], a number of extensions and alternative languages have been devised to model variability in families of related systems: Proceedings of DE@CAiSE'2009 1. As part of the following methods: FORM [15], FeatureRSEB [11], Generative Programming [6], Software Product Line Engineering [17], PLUSS [7]. 2. In the work of the following authors: M. Riebisch et al. [19], J. van Gurp et al. [24], A. van Deursen et al. [23], H. Gomaa [9]. 3. As part of the following tools: Gears [3] and pure::variants [18]. Unfortunately, this profusion of languages hinders the efficient communica- tion among specialists and the portability of variability models between tools. In order to face this problem, P. Schobbens et al. [20, 13, 16] propose the Varied Feature Diagram+ (VFD+ ) as a pivot notation for variability languages. VFD+ is expressively complete and many variability languages can be easily and effi- ciently translated into it. D. Benavides [2] has surveyed a number of analysis operations that infer valuable information from feature diagrams. One these operations is calculating the total number of products of a SPL. This value is used by economic models, such as the Structured Intuitive Model for Product Line Economics (SIMPLE)[5] and the Constructive Product Line Investment Model (COPLIMO)[4], to esti- mate the SPL costs and benefits. For instance, SIMPLE estimates the cost of building a SPL using equation 1, where: Corg expresses how much it costs for an organization to adopt the SPL approach, Ccab is the cost of developing the SPL core asset base 3 , n is the number of products of the SPL, Cunique (producti ) is the cost of developing the unique parts of a product, and Creuse (producti ) is the development cost of reusing core assets to build a product. n X CSPL = Corg + Ccab + (Cunique (producti ) + Creuse (producti )) (1) i=1 The importance of knowing the number of products of a feature diagram is recognized by many commercial tools for SPL development, such as Gears and pure::variants, which provide the calculation without considering textual constraints between features 4 . On the other hand, some researchers have pro- posed a full calculation, that includes textual constraints, by translating feature diagrams d into some kind of logic. For instance: – D. Batory [1] proposes a translation of d into propositional logic. Resulted formulas are processed by off-the-shelf Logic-Truth Maintenance Systems and Boolean Satisfiability (SAT) solvers. – D. Benavides [2] provides an abstract conversion of d into Constraint Sat- isfaction Problems (CSP). FaMa Tool Suite [22] adapts this abstract con- version to general CSP solvers, SAT solvers and Binary Decision Diagrams (BDD) solvers. 3 According to the SPL approach, products are built from a core asset base, a collection of artifacts that have been designed specifically for reuse across the SPL. 4 Textual constraints will be explained in section 2.2. Proceedings of DE@CAiSE'2009 – Gheyi et al. [8] provides a translation of d into Alloy’s logic. Internally, Alloy checks models by converting them into propositional logic and using off-the- shelf SAT solvers. Unfortunately, the diagram-to-logic approach is only scalable for small fea- ture diagrams. However, in SPLs with complex domains, companies have re- ported feature models with over 1000 features (e.g., M. Steger et al. [21] present a case study with 5200 features). This paper proposes a time-efficient algorithm to calculate the SPL number of products without considering textual constraints. Thus, we provide an upper bound for the number of products. In contrast with evaluated commercial tools, the algorithm is fully general and deals with a VFD+ subset, named Neutral Feature Tree (NFT), where diagrams are restricted to be trees5 . This paper formally defines NFT and demonstrates that NFT and VFD+ are completely equivalent. We think NFT is a good starting point for future implementations of analysis operations which take advantage of feature diagrams structured as trees. The remainder of this paper is structured as follows. Section 2 formally de- fines the abstract syntax and semantics of NFT. Section 3 presents the sketch of our algorithm6 . Finally, section 4 summarizes the paper and outlines directions for future work. 2 An abstract notation for modeling SPL variability Section 2.1 outlines the main parts of a formal language; sections 2.2 and 2.3 de- fine the abstract syntax and semantics of NFT, respectively; and finally, section 2.4 demonstrates the equivalence between NFT and VFD+ . We emphasize NFT is not meant as a user language, but only as a formal “back-end” language used to define semantics and allow for automated process- ing. 2.1 Anatomy of a formal language According to J. Greenfield et al. [10], the anatomy of a formal language includes an abstract syntax, a semantics and one or more concrete syntaxes. 1. The abstract syntax of a language characterizes, in an abstract form, the kinds of elements that make up the language, and the rules for how those elements may be combined. All valid element combinations supported by an abstract syntax conform the syntactic domain L of a language. 5 VFD+ diagrams are single-rooted Directed Acyclic Graphs. 6 There is available an implementation of the algorithm on http://www.issi.uned.es/miembros/pagpersonales/ruben heradio/rheradio english.html Proceedings of DE@CAiSE'2009 2. The semantics of a language define its meaning. According to D. Harel et al. [12], a semantic definition consists of two parts: a semantic domain S and a semantic mapping M from the syntactic domain to the semantic domain. That is, M : L → S. 3. A concrete syntax defines how the language elements appear in a concrete, human-usable form. Following sections define NFT abstract syntax and semantics. Most variabil- ity languages may be considered as concrete syntaxes or “views” of NFT. 2.2 Abstract syntax of NFT A NFT diagram d ∈ LNFT is a tuple (N, Σ, r, DE, λ, φ), where: 1. N is the set of nodes of d, among r is the root. Nodes are meant to represent features. The idea of feature is of widespread usage in domain engineering and it has been defined as a “distinguishable characteristic of a concept (e.g., system, component and so on) that is relevant to some stakeholder of the concept” [6]. 2. Σ ⊂ N is the set of terminal nodes (i.e., the leaves of d). 3. DE ⊆ N ×N is the set of decomposition edges; (n1 , n2 ) ∈ DE is alternatively denoted n1 → n2 . If n1 → n2 , n1 is the parent of n2 , and n2 is a child of n1 . 4. λ : (N −Σ) → card labels each non-leaf node n with a card boolean operator. If n has children n1 , ..., ns , cards [i..j](n1 , ..., ns ) evaluates to true if at least i and at most j of the s children of n evaluate to true. Regarding the card operator, the following points should be taken into account7 : (a) whereas many variability notations distinguish between mandatory, op- tional, or and xor dependencies, card operator generalizes these cate- gories. For instance, figure 1 depicts equivalences between the feature notation proposed by K. Czarnecki et al. [6] and NFT. (b) whereas, in many variability notations, children nodes may have different types of dependencies on their parent, in NFT all children must have the same type of dependency. This apparent limitation can be easily overcome by introducing auxiliary nodes. For instance, figure 2 depicts the equivalence between a feature model and a NFT diagram. Node A has three children and two types of dependencies: A → B is mandatory and (A → C, A → D) is a xor -group. In the NFT diagram, the different types of dependencies are modeled by introducing the auxiliary node aux. 5. φ8 are additional textual constraints written in propositional logic over any type of node (φ ∈ B(N )). Additionally, d must satisfy the following constraints: 1. Only r has no parent: ∀n ∈ N · (∃n0 ∈ N · n0 → n) ⇔ n 6= r. 7 The same considerations are valid for VFD+ . 8 also named cross-tree constraints [2]. Proceedings of DE@CAiSE'2009 2. d is a tree. Therefore, (a) a node may have at most one parent: ∀n ∈ N · (∃n0 , n00 ∈ N · ((n0 → n) ∧ (n00 → n) ⇒ n0 = n00 )) (b) DE is acyclic: @n1 , n2 . . . , nk ∈ N · n1 → n2 → . . . → nk → n1 . 3. card operators are of adequate arities: ∀n ∈ N ·(∃n0 ∈ N ·n → n0 ) ⇒ (λ(n) = cards )∧(s = k{(n, n0 )|(n, n0 ) ∈ DE}k) Fig. 1. card operator generalizes mandatory, optional, or and xor dependencies Fig. 2. Different types of dependencies between a node and its children can be expressed in NFT by introducing auxiliary nodes 2.3 Semantics of NFT Feature diagrams are meant to represent sets of products, and each product is seen as a combination of terminal features. Hence, the semantic domain of NFT is P(P(Σ)), i.e., a set of sets of terminal nodes. The semantic mapping of NFT (MNFT : LNFT → P(P(Σ))) assigns to every feature diagram d, a product line SPL, according to the next definitions: Proceedings of DE@CAiSE'2009 1. A configuration is a set of features, that is, any element of P(N ). A confi- guration c is valid for a d ∈ LNFT , iff: (a) The root is in c (r ∈ c). (b) The boolean value associated to the root is true. Given a configuration, any node of a diagram has associated a boolean value according to the following rules: i. A terminal node t ∈ Σ evaluates to true if it is included in the configuration (t ∈ c), else evaluates to false. ii. A non-terminal node n ∈ (N − Σ) is labeled with a card operator. If n has children n1 , ..., ns , cards [i..j](n1 , ..., ns ) evaluates to true if at least i and at most j of the s children of n evaluate to true. (c) The configuration must satisfy all textual constraints φ. (d) If a non-root node s(s 6= r) is in the configuration, one of its parents n, called its justification, must be too: ∀s ∈ N · s ∈ c ∧ s 6= r · ∃n ∈ N · n ∈ c ∧ (n, s) ∈ DE. 2. A product p, named by a valid configuration c, is the set of terminal features of c: p = c ∩ Σ. 3. The product line SPL represented by d ∈ LNFT consists of the products named by its valid configurations (SPL ∈ P(P(Σ))). 2.4 Equivalence between NFT and VFD+ NFT differentiates from VFD+ in the following points: 1. Terminal nodes vs. primitive nodes. As noted by some authors [1], there is currently no agreement on the following question: are all features equally relevant to define the set of possible products that a feature diagram stands for? In VFD+ , P. Schobbens et al. have adopted a neutral formalization: the modeler is responsible for specifying which nodes represent features that will influence the final product (the primitive nodes P ) and which nodes are just used for decomposition (N − P ). P. Schobbens points that primitive nodes are not necessarily equivalent to leaves, though it is the most common case. However, a primitive node p ∈ P , labeled with cards [i..j](n1 , ..., ns ), can always become a leaf (p ∈ Σ) according to the following transformation TP →Σ : (a) p is substituted by an auxiliary node aux1 . (b) the children of aux1 are p and a new auxiliary node aux2 . (c) aux1 is labeled with card2 [2..2](p, aux2 ). (d) p becomes a leaf. aux2 ’s children are the former children of p. (e) aux2 is labeled with the former cards [i..j](n1 , ..., ns ) of p. Figure 3 depicts the conversion of a primitive non-leaf node B into a leaf node. 2. DAGs vs. trees. Whereas diagrams are trees in NFT, in VFD+ are DAGs. Therefore, a node n with s parents (n1 , ..., ns ) can be translated into a node n with one parent n1 according to the following transformation TDAG→tree : (a) s − 1 auxiliary nodes aux2 , ..., auxs are added to the diagram. Proceedings of DE@CAiSE'2009 (b) edges n2 → n, ..., ns → n are replaced by new edges n2 → aux2 , ..., ns → auxs . (c) D. Batory [1] demonstrated how to translate any edge a → b into a propositional logic formula φa,b . Using Batory’s equivalences, implicit edges aux2 → n, ..., auxs → n are converted into textual constraints φaux2 ,n ...φauxs ,n and are added to φ (φ0 ≡ φ ∧ φaux2 ,n ∧ ... ∧ φauxs ,n ). Figure 4 depicts the conversion of a node D with two parents B and C into a node with a single parent. Fig. 3. Any primitive non-leaf node can be converted into a leaf node by using TP →Σ Fig. 4. Any DAG can be converted into a tree by using TDAG→tree In order to identify when a transformation on a diagram keeps (1) the di- agram semantics and (2) the diagram structure, P. Schobbens [20] defines the Proceedings of DE@CAiSE'2009 notion of graphical embedding. A graphical embedding is a translation T : L → L0 that preserves the semantics of L and is node-controlled, i.e., T is expressed as a set of rules of the form d → d0 , where d is a diagram containing a defined node or edge n, and all possible connections with this node or edge. Its translation d0 is a subgraph in L0 , plus how the existing relations should be connected to nodes of this new subgraph. According to this definition, TP →Σ and TDAG→tree are graph- ical embeddings that guarantee the equivalency between NFT and VFD+ . 3 Calculating the total number of products represented by an NFT diagram without considering textual constraints This section presents how to calculate the total number of products of a SPL modeled by a NFT diagram without considering textual constraints. The number of products of a node n is denoted as P (n). Thus, the total number of products represented by a NFT diagram is P (r), where r is the root. For a leaf node l, P (l) = 1. Table 3 includes equations to calculate P (n) for a non-leaf node n that has s children ni of type mandatory (i.e., n is labeled with cards [s..s]), optional (cards [0..s]), xor (cards [1..1]) and or (cards [0..s]). Hence, time-complexity for calculating P (n) in these cases is O(s). Therefore, time- complexity for computing P (r) is quadratic on the diagram number of nodes, i.e., O(N 2 ). type of relationship equation Qs mandatory (cards [s..s]) P (n) = i=1 P (ni ) Qs optional (cards [0..s]) P (n) = i=1 (P (ni ) + 1) Qs or (cards [1..s]) P (n) = ( i=1 (P (ni ) + 1)) − 1 Ps xor (cards [1..1]) P (n) = i=1 P (ni ) Table 1. Number of products for mandatory, optional, or and xor relationships In general, when a node n has s children and is labeled with cards [low..high], P (n) is calculated by equation 2, where Sk is the number of products choos- ing any combination of k children from s. For the sake of clarity, let us de- note P (n1 ), P (n2 ), . . . P (ns ) as p1 , p2 , . . . , ps . In a straightforward approach, Sk can be calculated by summing the number of products of all possible k- combinations (see equation 3). Unfortunately, this calculation has the following time-complexity C for P(r): O(N 2N ) ⊆ C ⊆ O(N 2 2N ). high X P (n) = Sk (2) k=low Proceedings of DE@CAiSE'2009 X Sk = pi1 pi2 . . . pik (3) 1≤i1