=Paper= {{Paper |id=None |storemode=property |title=Debian Packages Repositories as Software Product Line Models. Towards Automated Analysis |pdfUrl=https://ceur-ws.org/Vol-688/acota2010_paper5_galindo.pdf |volume=Vol-688 |dblpUrl=https://dblp.org/rec/conf/acota/GalindoBS10 }} ==Debian Packages Repositories as Software Product Line Models. Towards Automated Analysis== https://ceur-ws.org/Vol-688/acota2010_paper5_galindo.pdf
      ACoTA 2010            First International Workshop on Automated Tailoring and Configuration of Applications




 Debian Packages Repositories as Software Product
    Line Models. Towards Automated Analysis
                                    José A. Galindo, David Benavides and Sergio Segura
                                      Department of Computer Languages and Systems
                                        Av Reina Mercedes S/N, 41012 Seville, Spain
                                         {jagalindo, benavides, sergiosegura} at us.es

   Abstract—The automated analysis of variability models in             realistic models are highly desirable in order to assess the
general and feature models in particular is a thriving research         scalability of analysis tools.
topic. There have been numerous contributions along the last
twenty years in this area including both, research papers and              In order to look for realistic and large–scale variability mod-
tools. However, the lack of realistic variability models to evaluate    els, we decided to look into the open source community. We
those techniques and tools is recognized as a major problem
by the community. To address this issue, we looked for large–           found that Debian based linux distributions have a language
scale variability models in the open source community. We found         to describe dependencies in software packages repositories.
that the Debian package dependency language can be interpreted          They are used in Debian individual installations in order to
as software product line variability model. Moreover, we found          add, remove or update packages and manage their possible
that those models can be automatically analysed in a software           conflicts and dependencies [16]. We found that this language
product line variability model-like style. In this paper, we take
a first step towards the automated analysis of Debian package           can be interpreted as a variability language similar to those of
dependency language. We provide a mapping from these models             feature models, OVMs or decision models. Thus, a Debian–
to propositional formulas. We also show how this could allow            based linux distribution could be interpreted as a product line
us to perform analysis operations on the repositories like the          and the individual installations could be therefore considered
detection of anomalies (e.g. packages that cannot be installed).        the products of the product line. In this context, a product is a
                        I. I NTRODUCTION                                set of installed packages, i.e. a Debian installation. Similarly,
   Software Product Line (SPL) engineering is about building            we assume that a set of repositories define an SPL of Debian-
set of related software products. From existing assets, products        based distributions. We may remark that this could also be
are assembled reducing production costs and time–to–market.             applied to other dependency languages such as RPM [14], [10]
Variability Models are used to represent the set of products            (previusly called Red-Hat Package Manager) but for simplicity
of an SPL. Typical variability models are feature models [3],           they are out of scope of this paper.
orthogonal variability models [21] or decision models [9].                 The main contribution of this paper is twofold. First, we
The automated analysis of variability models deals with the             suggest that Debian packages dependency language can be
automated extraction of information from variability models             considered as a variability language. Second, we provide a
[3]. In the last years, a large number of techniques haven been         mapping from this language to propositional formulas enabling
proposed in the literature in order to automatically analyse            their analysis by means of SAT solvers. As a part of our
variability models. In [3], Benavides et al. provide a detailed         contribution, we present a motivating example showing how
literature review on twenty years of automated analysis of              our approach could be used to detect uninstallable packages in
feature models identifying 30 different analysis operations             the repositories, a task that is currently done by hand. Finally,
(e.g. Valid Product and Dead Features). These analyses are              we explain how we plan to integrate our approach in the FaMa
mainly performed using logic paradigms such propositional               Framework, a tool we developed, enabling an efficient and
logic, constraint programming or description logic. There are           user-friendly extraction of Debian variability models and their
also several commercial and open source tools supporting the            analysis .
analysis of variability models such as AHEAD tool suite [2],
DecisionKing [8], FaMa [4], pure::variants [20] S.P.L.O.T                  The rest of the paper is structured as follows. Section II
[17] and VarMod (OVM) [25].                                             describes the main elements of the Debian package depen-
   The lack of realistic and large–scale variability models to          dency language. Section III proposes a graphical view of
evaluate and compare the performance of analysis tools is               Debian variability models. A motivation example is presented
acknowledged to be a challenge in the SPL community [1],                in Section IV clarifying how the Debian packaging system
[23], [11], [24], [13]. Although there are references to real           describes a SPL and how can the automated analysis help
feature models with hundreds or even thousands of features              in the detection of inconsistencies and redundancies. We
[1], these are not available for the research community. This           show a mapping from Debian Variability Modelling Language
has led authors to use trivially small feature models as case           (DebianVML) to propositional formula in Section V to enable
studies or to generate the models randomly [23]. Therefore,             the automated reasoning of it.


      29
      ACoTA 2010           First International Workshop on Automated Tailoring and Configuration of Applications


                                                                       1  Package: openoffice.org-gnome
      II. D EBIAN PACKAGE DEPENDENCY LANGUAGE                          2  Essential: No
   This section describes the main elements of the Debian              3  Version: 1:2.4.0-3ubuntu6
package dependency language. A Debian–based installation               4  Replaces: openoffice.org-common (<< 2.0.4˜
                                                                           rc1-0),
is associated with a set of package repositories. A package            5 Provides: openoffice.org-gtk-gnome,
is a unit of software that can be installed and configured.                openoffice.org2-gnome
Each package repository is associated with a configuration file        6 Depends: gconf2, libc6 (>= 2.1.3)
written in the Debian package dependency language including            7 Pre-Depends: dpkg (>= 1.14.12ubuntu3)
information about the software packages and their relation-            8 Suggests: openoffice.org-evolution
                                                                       9 Conflicts: openoffice.org2-gnome (<<
ships/dependencies.                                                        1:2.4.0-3ubuntu6)
   Figure 1 shows a fragment of a repository configuration             10 Task: ubuntu-desktop, edubuntu-desktop,
file. For each package in the file, a set of properties are                gobuntu-desktop
presented. A property is composed of a name and one or
more values associated with it. Properties may be divided
                                                                                        Figure 1.   Sample Package Definition
into those providing basic information about the package (e.g.
name, version) and those describing the relationships with
other packages. These relationships can be divided into:
                                                                       Variability Modelling Language (DebianVML). In this reduced
   • Depends. Package A depends on package B if B must be              notation we used the a concatenation of the name and version
      installed for A to work properly. Sometimes, A depends           of the packages as the name of the nodes. We omit in this
      not only on B, but on a specific version of it. For instance,    reduced visual notation the relationships of versions. In Figure
      in Figure 1, line 6, describes that package openoffice.org-      2, we present the graphical symbols used to represent each
      gnome depends on package libc6 with version greater              relationship among packages: depends, pre-depends, conflicts,
      or equal to 2.1.3. If the relationship requires package B        suggests, recommends, replaces and provides. Notice that
      to be properly configured for the installation of A, the         some of these relationships are represented with the same
      relationship is referred to as Pre-Depends. Note that a          notation since they have similar meaning as described in
      installed package could not be properly configured.              previous Section.
   • Suggests. Package A suggests package B if B contains                 Figure 3 shows an example using this notation. As il-
      files that are related to (and usually enhance) the func-        lustrated, the repository is represented as a graph in which
      tionality of A. In Figure 1, line 8, package openoffice.org-     nodes represent packages and edges the relationships among
      gnome suggests package openoffice.org-evolution. When            packages. Each package is represented as a rectangle labelled
      the suggestion of the package is strong and it is intended       with the name and version of the package. In this particular
      to be accepted by most of the users this relationship is         example, we also include a capital letter to simplify later
      usually referred to as Recommends. Note that is a human          references to the packages in the paper.
      who determines if a the relation should be suggests or
      recommends.
   • Conflicts. Package A conflicts with package B when
      A will not operate if B is installed on the system. In
      most cases, conflicts are cases where A contains files
      which are an improvement of those in B. For example, in
      Figure 1, line 9, package openoffice.org-gnome conflicts
      with package openoffice.org2-gnome in versions prior to
      1:2.4.0-3ubuntu6.
   • Replaces. Package A replaces package B when files
      installed by B are removed and in some cases over-written
      by files in A. In Figure 1, line 4, package openoffice.org-
      gnome replaces package openoffice.org-common in ver-
      sions prior to 2.0.4 rc1-0. When this relationship affects
      only a part of the functionality of the package it is named
      Provides.
                                                                                        Figure 2.   DebianVML graphical view
           III. A G RAPHICAL N OTATION FOR D EBIAN
                        D EPENDENCIES
   In order to improve the understandability and readability                            IV. A MOTIVATING EXAMPLE
of the paper we propose a simplified graphical notation [18]              Consider the example in Figure 3, it represents a valid De-
for the Debian package dependency language described in the            bianVML instance composed by four packages and with three
previous section. We will refer to this notation as Debian             different types of relationships: Depends, Conflicts, Replaces.


      30
     ACoTA 2010               First International Workshop on Automated Tailoring and Configuration of Applications


  The model in Figure 3 represents the following products :
  1) { }
  2) {D}.
  3) {C}.
  4) {B, C}.
  5) {B, D}
  6) {A, B, D}


                                                                                                     Figure 5.   Valid Product



                                                                          Figure 6 shows an example where A is a dead package.
                                                                          Detecting dead packages in repositories increase the quality of
                                                                          the user experience, preventing the user for an unsupported and
                                                                          frustrating operation. Actually this operation is done manually
                                                                          by the members of the different Debian based distributions.
                                                                          Using the mechanism provided by automated analysis in
                  Figure 3.    DebianVML example
                                                                          Debian variability models this task can be done automatically
                                                                          saving time into the development of those distributions.
   Notice that the empty product is allowed in this variability
model as it represents a Linux distribution without installed
packages. Although this could be controversial, a distribution
completely void, even without the Linux Kernel, could be con-
sidered valid as even the kernel can be installed or uninstalled.
   There are also other potential products, that are invalid, for
example the product {A} is not valid because the depends
relationship with B package is not satisfied.
   An example of how the automated analysis could help in
this circumstance is shown if we apply the valid product
operation [28]. This analysis operation takes a feature model
and a product as inputs and determines whether the product
is valid for the model. Applying the same idea to Debian
configurations, we could determine that {A, B, C} (Figure 4)
is not a valid instantiation, because the inconsistency marked
by the conflicts relationship between packages A and C. In
contrast, the product {A, B, D} (Figure 5) is a valid product
                                                                                         Figure 6.     DebianVML with dead package
since it does not violate any constraint.
                                                                             In our preliminary tests using Ubuntu 8.04 [27] we detected
                                                                          some of these kind of repository inconsistencies. Note that
                                                                          Ubuntu distribution uses the Debian packaging system Figure
                                                                          7 shows a real example of how a package, wysihtml-le, that
                                                                          is in the universe repository can never be installed. Dead
                                                                          Packages can be detected using similar techniques as we do
                                                                          with other kinds of variability models so the knowledge can
                                                                          be reused. The complexity of these variability models are
                                                                          usually bigger than others used in the software product line
                                                                          community having up to 24780 packages and 159812 packages
                   Figure 4.    Not Valid Product
                                                                          dependencies in Ubuntu 8.04 with main, multiverse, restricted
                                                                          and universe repositories enabled.
  Another analysis operation of feature models that can be
used in the Debian community is the so–called dead features.
                                                                          V. M APPING D EBIAN VML TO P ROPOSITIONAL F ORMULA
A feature is dead when the feature does not appear in any
valid configuration. In the context of Debian, we can define a              To enable the automated analysis of DebianVML we must
dead package as the one that is present in a repository but can           define the mapping from DebianVML to constraints into an
never be installed because of the violation of dependencies.              specific off–the–shelf solver such as Sat4j [5], Choco [6] or


     31
     ACoTA 2010           First International Workshop on Automated Tailoring and Configuration of Applications




                                          Figure 7.   Real demo of a dead package in Ubuntu 8.04



JaCoP [12]. Note that these solvers are widely used for the            in TableI.
automated analysis of feature models [3].
                                                                                   Debian Variability Model      Propositional formula
   In this Section we do not refer to all the relationships as
                                                                                        A Depends B                   A ⇒ (B)
some of them have a very similar meaning and are treated in                             A Depends B                   A ⇒ (B∨
the same way as the relationships we describe here. Also there                   (being B replaced by C,D,...)       C ∨ D ∨ ...)
are some relationships that do no affect to the variability in                          A Conflicts B                A ⇒ (¬B)
                                                                                        A Replaces B               Used to generate
the model and does not need to be added to the solver as con-                                                         the tuples
straints more exactly suggests and recommends relationships.
   The algorithm we followed to translate DebianVML to                                                  Table I
constraints has the following general form: Each package                            D EBIAN R ELATIONS TO PROPOSITIONAL FORMULA
name make up the set of variables then, each relationship is
processed to make up the set of constraints (or propositional
clauses) of the problem.
                                                                                              VI. R ELATED WORK
   The first relationship to be processed is Replaces. This
is a special case of relationship as it is needed to generate             There are other contributions interested in solving the ex-
a set of tuples containing the replaced packages and their             isting lack of realistic variability models in the SPL context.
replacements to be ready to process the depends relationships.         Czarnecki et al.[13] present how to extract feature models from
This is needed because, after performing a general mapping             the Linux kernel using the LKC (Linux Kernel Configurator)
from Debian dependencies to propositional clauses, we have to          language that describes the relationships in a very similar way
change the output clauses according to the replacements that           as feature models do obtaining feature models with about 6319
are stored in the tuples. For instance, in the example of Figure       features. Our approach focuses on other variability dependency
3, we have the tuple (C,D) meaning that D is a replacement of          language existing in open source community increasing even
C. Of course, in a real big example, we can have more than             more the availability of realistic variability models up to 20000
a tuple referring to the same package (e.g. (C,X1), (C,X2),...         packages to the SPL community. Although our first attempts
,(C,Xn) means that C is replaced by X1, X2, ..., Xn).                  tried to map DebianVML to feature models, we finally realised
   After processing the Replaces relationship we can process           that DebianVML can be considered a variability language by
the other relationships following the general rules presented          it self.


     32
     ACoTA 2010           First International Workshop on Automated Tailoring and Configuration of Applications


   Recently, Di Cosmo et al. [7] presented a way to describe               6 hours in a laptop machine with main, multiverse, re-
feature models using Debian packages dependencies and they                 stricted and universe repositories of Ubuntu 8.04 enabled
propose using Debian dependencies management tools to                      (24780 packages). We will work on the of the codding
analyse feature models. Our contribution here is the other                 to improve those results.
way around: we propose to use feature model analysis tools               We consider that this work could foster discussion about
and operations to analyse Debian packages repositories. We            the following topics:
think that Di Cosmo et al. contribution and ours are good
                                                                         • Can open source models be considered as SPL models?
complements to investigate in this direction.
                                                                         • Can the OS community benefit from the knowledge
   A similar contribution to the problem of the analysis of
                                                                           acquired by the SPL analysis community in the last 20
Debian repositories is presented by some of the results of
                                                                           years [3]?
the EDOS project [19], [15]. Our contribution can benefit
                                                                         • How advanced are Open Source configuration languages
their results and we have to study in depth at what extend
                                                                           to abstract end–users from technical details?
they have performed analysis operations. We think that we
can complement their results providing our know–how in the                                      ACKNOWLEDMENTS
analysis of feature models and software product lines [3].
                                                                         We would like to thank the friends and co-worker’s that
  VII. C ONCLUSIONS , FUTURE WORK AND DISCUSSIONS                     encouraged us in this research and the anonymous reviewers
   In this paper, we propose using Debian repositories as a           for their helpful comments.
good source of realistic and complex variability models. These           This work has been partially supported by the European
can be used as motivations inputs problems for those tools            Commission (FEDER) and Spanish Government under CICYT
dealing with the automation of variability models. At the same        projects SETI (TIN2009-07366) and by the Andalusian Gov-
time, we consider the Debian community could benefit of               ernment under ISABEL project (TIC-2533).
this automation to perform several analysis operations like
the detection of inconsistencies in repositories, a work that                                       R EFERENCES
is currently done by hand (e.g. When an error is detected, it          [1] D. Batory, D. Benavides, and A. Ruiz-Cortés. Automated analysis
is correctly manually).                                                    of feature models: Challenges ahead. Communications of the ACM,
                                                                           December:45–47, 2006.
   In our ongoing research, we are working to enable the
                                                                       [2] Don Batory. Feature-oriented programming and the ahead tool suite.
analysis of DebianVML in FaMa, making FaMa, our tool to                    In ICSE ’04: Proceedings of the 26th International Conference on
analyse feature model, capable to perform this analysis and                Software Engineering, pages 702–703, Washington, DC, USA, 2004.
helping the community of Debian based distributions, applying              IEEE Computer Society.
                                                                       [3] D. Benavides, S. Segura, and A. Ruiz-Cortés. Automated analysis of
the knowledge about feature models under Debian VM.                        feature models 20 years later: A literature review. Information Systems,
   FaMa [26], [4] is a framework to build variability model                35(6), 2010. (In press).
analysis tools, that was designed with extensibility in mind.          [4] D. Benavides, P. Trinidad, S. Segura, and A. Ruiz-Cortés. Fama
                                                                           framework. http://www.isa.us.es/fama/. accessed May 2010.
It offers a platform to implement other kind of variability            [5] D. Le Berre. Sat4j. http://www.sat4j.org/. accessed May 2010.
models. It has already been used to analyse feature models             [6] CHOCO solver, http://choco.emn.fr/. accessed January 2010.
[26] and also OVMs [22]. Because the analysis of variability           [7] Roberto Di Cosmo and Stefano Zacchiroli. Feature diagrams as package
                                                                           dependencies. In 14th International Software Product Line Conference
models can be very expensive in terms of memory and time,                  (SPLC’10), 2010.
FaMa also offers a set of tools for benchmarking [23] over             [8] Deepak Dhungana, Paul Grünbacher, and Rick Rabiser. Decisionking:
models. These features, coupled to our familiarity with the                A flexible and extensible tool for integrated variability modeling. In 1st
                                                                           International Workshop on Variability Modelling of Software-intensive
tool as members of the FaMa project, made FaMa as a good                   Systems, pages 119–128, 2007.
candidate to be extended in order to support the analysis of           [9] Deepak Dhungana, Rick Rabiser, and Paul Grunbacher. Decision-
Debian variability models. This is part of our future work.                oriented modeling of product line architectures. Software Architecture,
                                                                           Working IEEE/IFIP Conference on, 0:22, 2007.
   These are the main directions of our future work:                  [10] FOSDEM’08. http://en.opensuse.org/Package management/Sat solver.
   • Address new operations. Using our experience on feature          [11] Arnaud Hubaux, Andreas Classen, Marcı́lio Mendonça, and Patrick
                                                                           Heymans. A preliminary review on the application of feature diagrams
     models (e.g. Development of analysis tools, benchmark-                in practice. In VaMoS, pages 53–59, 2010.
     ing, detection of errors ...) we need to be able to detect       [12] K. Kuchcinski and R. Szymanek. Jacop. http://jacop.osolpro.com/.
     anomalies in Debian models such as conditionally dead                 accessed May 2010.
     packages or redundancies, probably the operations with           [13] Rafael Lotufo, Steven She, Thorsten Berger, Krzysztof Czarnecki, and
                                                                           Andrzej Wasowski. Evolution of the linux kernel variability model. In
     more than one variability model can be very useful to                 14th International Software Product Line Conference (SPLC’10), 2010.
     Debian community improving the usability of packaging            [14] RPM Package Manager. http://www.rpm.org.
     systems. Also the explanations of the errors to help De-         [15] Fabio Mancinelli, Jaap Boender, Roberto di Cosmo, Jerome Vouillon,
                                                                           Berke Durak, Xavier Leroy, and Ralf Treinen. Managing the complexity
     bian community on its daily work could help to improve                of large free and open source package-based software distributions. In
     the end user experience.                                              ASE ’06: Proceedings of the 21st IEEE/ACM International Conference
   • Benchmarking and optimization. In our preliminary tests               on Automated Software Engineering, pages 199–208, Washington, DC,
                                                                           USA, 2006. IEEE Computer Society.
     we have detected that the Debian based models automated          [16] Debian Policy Manual.          http://www.debian.org/doc/debian-policy/
     analysis are computationally very expensive, more than                ch-relationships.html.



     33
       ACoTA 2010               First International Workshop on Automated Tailoring and Configuration of Applications


[17] Marcilio Mendonca, Moises Branco, and Donald Cowan. S.p.l.o.t.:
     software product lines online tools. OOPSLA ’09: Proceeding of the 24th
     ACM SIGPLAN conference companion on Object oriented programming
     systems languages and applications, pages 761–762, 2009.
[18] D. Moody. The physics of notations: Toward a scientific basis for con-
     structing visual notations in software engineering. Software Engineering,
     IEEE Transactions on, 35(6):756 –779, nov.-dec. 2009.
[19] EDOS project. http://www.edos-project.org.
[20] Pure-systems. pure::variants. http://www.pure-systems.com/. accessed
     May 2010.
[21] Fabricia Roos-Frantz. A preliminary comparison of formal properties
     on orthogonal variability model and feature models. In VaMoS, pages
     121–126, 2009.
[22] Fabricia Roos-Frantz and Sergio Segura. Automated analysis of orthog-
     onal variability models. a first step. In Steffen Thiel and Klaus Pohl,
     editors, First Workshop on Analyses of Software Product Lines (ASPL
     2008). SPLC’08, pages 243–248, Limerick, Ireland, September 2008.
[23] S. Segura and A. Ruiz-Cortés. Benchmarking on the automated analyses
     of feature models: A preliminary roadmap. In Third International
     Workshop on Variability Modelling of Software-intensive Systems, pages
     137–143, Seville, Spain, 2009.
[24] Steven She, Rafael Lotufo, Thorsten Berger, Andrzej Wasowski, and
     Krzysztof Czarnecki. The variability model of the linux kernel. In
     Fourth International Workshop on Variability Modelling of Software-
     intensive Systems (VAMOS’10), Linz, Austria, January 2010.
[25] VarMod-PRIME Tool-Environment. http://www.sse.uni-essen.de/wms/
     en/index.php.
[26] P. Trinidad, D. Benavides, A. Ruiz-Cortés, S. Segura, and A.Jimenez.
     Fama framework. In 12th Software Product Lines Conference (SPLC),
     2008.
[27] Ubuntu. http://www.ubuntu.com.
[28] J. White, B. Doughtery, D. Schmidt, and D. Benavides. Automated
     reasoning for multi-step software product-line configuration problems.
     In Proceedings of the Sofware Product Line Conference, pages 11–20,
     2009.




       34