=Paper= {{Paper |id=Vol-1453/20_WalchWalterKuechlin_FormalAnalysisOfTheLinuxKernel_Confws-15_p131 |storemode=property |title=Formal analysis of the Linux kernel configuration with SAT solving |pdfUrl=https://ceur-ws.org/Vol-1453/20_WalchWalterKuechlin_FormalAnalysisOfTheLinuxKernel_Confws-15_p131.pdf |volume=Vol-1453 |dblpUrl=https://dblp.org/rec/conf/confws/WalchWK15 }} ==Formal analysis of the Linux kernel configuration with SAT solving== https://ceur-ws.org/Vol-1453/20_WalchWalterKuechlin_FormalAnalysisOfTheLinuxKernel_Confws-15_p131.pdf
    Formal Analysis of the Linux Kernel Configuration with
                         SAT Solving
                                   Martin Walch1 and Rouven Walter1 and Wolfgang Küchlin1


Abstract. The Linux kernel is a highly configurable software sys-                       the project, it is a first prototype with only a simplified view on the
tem. The aim of this paper is to develop a formal method for the                        Kconfig files. As a consequence, the results were of the proof-of-
analysis of the configuration space. We first develop a Linux prod-                     concept type and of very limited use for verification purposes.
uct overview formula (L-POF), which is a Boolean formula repre-                            The comprehensive VAMOS project3 at Friedrich-Alexander-
senting the high-level configuration constraints of the kernel. Using                   Universität Erlangen-Nuremberg has lead to numerous results and
SAT solving on this L-POF, we can then answer many questions,                           publications, including the uncovering of hundreds of real world
such as which options are possible, mandatory, or impossible for any                    bugs. It also analyses the Linux kernel configuration with the means
of the processor architectures for which the kernel may be config-                      of Propositional Logic, but goes much further by considering the ac-
ured. Other potential applications include building a configurator or                   tual effects on the kernel code and applying the tools to other projects
counting the number of kernel configurations. Our approach is anal-                     that use LKC as well. The PhD Thesis of Reinhard Tartler [9] from
ogous to the methods we use for automobile configuration. However,                      2013 gives a detailed overview over the model, the implemented
in the Linux case the configuration options (e.g. the individual de-                    tools, and most of the applications and results.
vice drivers) are represented by symbols in Tristate Logic, a special-                     The PhD Thesis of Sarah Nadi [5] from 2014 picks up the VAMOS
ized three-valued logic system with several different data types, and                   project and extends it to not only consider the Kconfig files and the
the configuration constraints are encoded in a somewhat arcane lan-                     kernel code, but to additionally take the build system into account.
guage. We take great care to compile the L-POF directly from the                        Even more, it extracts configuration constraints from the implemen-
files that hold the configuration constraints in order to achieve max-                  tation.
imum flexibility and to be able to trace results directly back to the                      In this work, we focus solely on the Kconfig files. Although the
source.                                                                                 VAMOS model proves to yield good results, it does not aim at being
                                                                                        exact and relies on parts of LKC. We present a fairly precise model
                                                                                        and translation process into a product overview formula in Proposi-
1     Introduction                                                                      tional Logic with the goal to account for all details that are relevant in
Linux is a kernel for a broad range of platforms with highly versa-                     real-world use cases. Our implementation works independently from
tile configurations of peripheral components. Static configuration at                   LKC and we show the results from running it against Linux 4.0.
compile time helps to adapt to the different requirements. The central                     This work is loosely based on the paper by Zengler and Küchlin
tool for configuring this is LinuxKernelConf, abbreviated LKC. The                      from 2010 [11] in that it picks up and uses central ideas, but elabo-
input to LKC uses a domain specific language to describe configu-                       rates on many more details.
ration constraints. A common alternative name for LKC is Kconfig,                          There is no precise specification of Kconfig. So we consider what
and they are often used interchangeably. In this document, we refer                     information is available in the documentation, the implementation,
to the configuration system as LKC and we denote the language as                        and the way the input language is used. Section 2 gives a rough
Kconfig. The input is large and stored in files which we call Kcon-                     overview over this input language.
fig files. They continuously change as kernel development goes on.                         Our translation uses several intermediate stages. First we create
Automatic checks for semantic consistency on Kconfig files are de-                      what we call the Zengler Model in Section 3. In this step we abstract
sirable, but LKC has no such checks implemented.                                        from technical details of reading the configuration input and isolate
   At the Workshop on Configuration 2010 in Lisbon, Zengler and                         the data that is relevant for our purposes.
Küchlin presented an approach [11] to encode the whole Linux ker-                          The Zengler Model retains some parts of the input structure that
nel configuration in Propositional Logic. Conceptually this work                        have an impact on the meaning. In Section 4, we transform it into the
parses the Kconfig files and stores the relevant information in a                       Attributes Model, resolving these parts and switching from a repre-
database. Subsequently, this database is translated into a product                      sentation that focuses on input structure to one that revolves around
overview formula2 , abbreviated POF, in Propositional Logic. While                      the constraints.
it demonstrates central ideas and shows the technical feasability of                       LKC uses a three-valued logic, called Tristate Logic, that is un-
                                                                                        common in academic discourse. We take a look at this logic system
1 Symbolic Computation Group, WSI Informatics, Universität Tübingen,
                                                                                        in Section 5 and introduce some extensions to it. We then proceed by
    Germany, www-sr.informatik.uni-tuebingen.de
2 A POF is a single Boolean formula which captures all configuration con-               generating a product overview formula in this extended logic from
    straints of a high-level configuration model (cf. [3]). Historically it has         the Attributes Model, encoding the set of all valid Linux kernel con-
    been introduced in [4] to capture the high-level configuration model of
    Mercedes-Benz, which is called “Product overview” (German: Produkt-                 3     VAMOS: Variability Management in              Operating    Systems,
    übersicht).                                                                             www4.cs.fau.de/Research/VAMOS/




                                                                                  131                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                                  Proceedings of the 17th International Configuration Workshop
                                                                                                                       September 10-11, 2015, Vienna, Austria
 figurations.                                                                     LKC reads these files with a scanner and a parser that are gener-
    Eventually, we translate the product overview formula from the             ated using flex4 and GNU Bison.5 Exploring the Bison grammar is
 three-valued logic into Propositional Logic in Section 6. With com-           clearly beyond the scope of this document as it consists of more than
 mon transformation into CNF and SAT solving, we can reason about              100 production rules. Suffice it to say that the grammar we use very
 the set of valid configurations, yielding first results in Section 7.         closely ressembles the grammar of LKC.
                                                                                  To give an overview over the relevant parts of the configuration
                                                                               language, we confine ourselves to explaining an artificial example. A
 2    The Linux Kernel Configuration System
                                                                               concise formal description of the relevant parts was done by She and
 Kconfig is a specialized input language for LKC that describes                Berger [7].
 available features in terms of symbols and offers several different              Listing 1 contains most of the relevant language features. The
 means to encode constraints and interdependencies. Kernel devel-              central element everything else is built around is the configuration
 opers maintain this information in hierarchically organized Kconfig           block. Its purpose is to declare and to describe symbols. The keyword
 files.                                                                        config starts a configuration block and is followed by the name of
                                                                               the symbol it refers to. A configuration block contains one or more
                    Listing 1. Example Kconfig file                            lines that further specify what we call properties of that symbol. Each
 config T0                                                                     symbol must have at least one configuration block. In practice, for the
         tristate                                                              majority of symbols there is exactly one configuration block for each
         prompt " f e a t u r e T0 "                                           symbol, but there may be more. In Linux 4.0, there are up to six per
         select T1
                                                                               symbol.
         depends on T2 && !C0
                                                                                  Each symbol has one of the five types tristate, bool,
 config T1                                                                     string, int, and hex. In our example, the symbols T0, T1, and
         tristate                                                              T2 have the type tristate, the symbol B0 has the type bool, and
         prompt " f e a t u r e T1 " if C0                                     the symbol S0 has the type string. We call a symbol declared if it
                                                                               has at least one configuration block and one of the five types.
 config T2                                                                        On actual configuration, each symbol is assigned an unambigu-
         tristate                                                              ous value from their respective domains. The domain of tristate
         default y                                                             symbols is {0, 1, 2} and the domain of bool symbols is {0, 2}. The
                                                                               purpose of the three-valued tristate type is to encode the three
 choice
             tristate                                                          possible activation states that apply to many features in the kernel:
             prompt "CHOICE 0 "                                                Turn it off (0), compile it as a runtime-loadable module (1) or com-
                                                                               pile it into the kernel (2). The bool type is the same except that it is
             config C0                                                         missing the value 1 for the module state.
                     prompt " f e a t u r e C0 "                                  To identify the three different states in syntax, there are the three
                                                                               constants n, m, and y, which evaluate to 0, 1, and 2, respectively.
             config C1                                                            The domain of string symbols is the set of all valid strings.
                     prompt " f e a t u r e C1 "                               Similar to the domain of bool being a subset of the domain of
                                                                               tristate, the domains of int and hex symbols both are subsets
 endchoice
                                                                               of the domain of string symbols, in that their elements are strings
 menu " submenu "                                                              that read as valid integers or hexadecimal numbers, respectively.
          visible if T1                                                           There are different types of dependencies that affect properties,
                                                                               thus most properties can be active or inactive. The symbol type is the
 config S0                                                                     only unconditional property. It is always active and cannot dynami-
         string                                                                cally change during configuration. It is also the only property that is
         default " d e f a u l t s t r i n g "                                 mandatory for every symbol. However, to allow the user to directly
                                                                               assign a value, the symbol needs a prompt property. In our exam-
 config B0
                                                                               ple, there are several such prompt properties, one of them for the
         bool
         prompt " c u s t o m S0 "                                             symbol T0. The prompt keyword is followed by a string parame-
                                                                               ter "feature T0". This parameter is mandatory and shows up as short
 if B0                                                                         description in the configuration interface.
                                                                                  The symbol T0 also has a select property with the argument
 config S0                                                                     T1. This property is only valid with bool and tristate symbols.
         prompt " S0 "                                                         The select property sets up a direct relation between the values of
                                                                               two symbols. The value of the selected symbol is always at least as
 endif                                                                         high as the value of the symbol that the select property belongs
                                                                               to. So, in our example the value of T1 is always at least as high as
 endmenu
                                                                               the value of T0. The documentation calls this a reverse dependency.
                                                                                  The last line of the configuration block of T0 starts with depends
                                                                               on and is followed by an expression in Tristate Logic T2 && !C0.

                                                                               4 flex: The Fast Lexical Analyzer, flex.sourceforge.net
                                                                               5 GNU Bison, www.gnu.org/software/bison/




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                 132
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
This is a dependency that applies to the whole configuration block,              possibly gaining forward compatibility. It starts with the option
i.e. in our case it is a dependency to the prompt and to the select              keyword, followed by one of a few option names, and depending on
property. The actual dependency is encoded in the expression. It has             that, the meaning varies and an argument may be appended. As of
no direct influence on the value of the symbol, but rather on the prop-          Linux 4.0, two different options are defined that may be relevant in
erties which in turn set up constraints to the value.We take a closer            our context: modules and env. The modules option makes only
look at Tristate Logic and the mechanisms of dependencies in sec-                sense with a bool symbol, and it may be only used once in the
tion 5.                                                                          whole configuration. It associates the corresponding symbol with the
   It is possible to add a dependency to a single property by append-            availability of support for loading modules, i.e. as a consequence,
ing an expression with if to the respective line, like for the configu-          deactivating that symbol basically prohibits assigning the value 1
ration prompt of T1.                                                             from the tristate domain. The env option is usually only used
   A default property is used to non-bindingly suggest a value,                  for string symbols that have no prompt property as it imports
but it is also used to automatically set values of symbols that have             a value from the runtime environment of the system the configura-
no active prompt. In our example, the symbol T2 uses the trivial                 tion system is running on. Only very few symbols use this option at
expression y as default value, but more complicated expressions                  all and even fewer have an effect on the configuration that is worth
are possible.                                                                    mentioning. Still, as we want to be precise, we account for them, too.
   Our example contains a choice block. This language con-                          From our example it is also not apparent how Kconfig files are hi-
struct encloses a series of configuration blocks with choice and                 erarchically organized. It works similarly to #include directives
endchoice, constituting a set of symbols that logically exclude                  of the C preprocessor: A file inclusion is done with the keyword
each other. This is useful for features that serve the same purpose              string and a string argument. The string argument is treated as
and which therefore naturally cannot be simultaneously active, like              path and the content of the file at that path is pasted at the position of
e.g. the compression algorithm of the kernel image or the preemption             the string line.
model. Like a symbol, a choice block has a type, but only the two                   Kconfig still offers more constructs that we have not mentioned,
data types bool and tristate are valid. A choice block of type                   but they are irrelevant at this point.
bool enforces that exactly one of the enclosed symbols is set to 2.
The tristate type relaxes this strict constraint and allows to not
                                                                                 3   Zengler Model
assign 2 to any of the symbols, while setting an arbitrary number of
symbols to 1. There is a property optional that even allows setting              Linux 4.0 supports 30 different main architectures. Each architecture
all symbols to 0. In contrast to a symbol, a choice block must have a            has its own tree of Kconfig files, but there are big overlaps across all
prompt property. There may also be default properties, suggest-                  architectures by using common files. Accordingly we want to be able
ing one of the enclosed symbols, and dependencies using depends                  to reason about the configurations across all architectures. However,
on, which analogously to configuration blocks apply to the whole                 capturing several architectures in one data structure without losing
choice block.                                                                    precision is very complicated as this requires keeping track of which
   A hierarchical menu can be constructed using the menu blocks.                 files are included for each architecture and in which order they are
Its primary purpose is organizing the features to ease navigation. Just          read. Therefore, we create a separate model for each architecture and
like the prompt property, the menu keyword is followed by a string               consider them together if needed.
that serves as user visible label for the menu. For our point of view it            By considering one fixed architecture, we can deduce the full in-
is important that there may be dependencies added using depends                  clusion hierarchy of Kconfig files for that specific architecture. Ac-
on that apply to the whole menu, and a second type of dependency                 cordingly, our parser moves through the file hierarchy in the same
that is set up with visible if. That second kind of dependency                   way as LKC does, without the need to store which files we in-
toggles only the parts that concern direct configurability by the user,          clude. However, it produces different data structures. We create two
but leaves all other properties unaffected.                                      databases, a symbol database DS , and a choice database DC . They
   Finally it is also possible to add dependencies to an arbitrary se-           are heavily extended versions of the databases created by Zengler and
quence of configuration blocks, choice blocks, and menus as long as              Küchlin [11], hence we call these two databases the Zengler Model.
it does not violate the logical hierarchy. This is done with the enclos-            The symbol database contains symbol descriptors. Each configu-
ing keywords if and endif with the dependency expression right                   ration block corresponds to a symbol descriptor. They are grouped by
behind if.                                                                       the symbol they belong to. So, our symbol database is a set of pairs
                                                                                             −−−→                              −−−→
   Our example covers those language constructs that we regard as                DS = (s, desc) where s is a symbol and desc is a list of symbol
most important for variability, but there is more: For symbols with              descriptors desc with
the int or hex type, there is a range property that fixes a lower
                                                                                                   − → −→ −−→ −−−→
                                                                                                   −−−
and an upper bound for the value. Both bounds may depend on freely                      desc = (t, pmpt, sel, def , −
                                                                                                                    range, dep, opt, kdesc )
changeable values of other symbols. In practice nearly all bounds                      pmpt = (ep , kp )
have constant values and those few bounds that are variable can be
                                                                                           sel = (ss , es , ks )
manually checked for inconsistencies. From our perspective the im-
pact of range properties on the variability of the overall configura-                     def = (ev , ed , kd )
tion is mostly negligible in current versions of Linux. Therefore, we                 range = (esl , esh , er , kr )
incorporate range properties in our model, but ignore them when                         dep = (−
                                                                                               e→, −  e−→, −
                                                                                                     if   dep  e−→)vis
creating a product overview formula as we do not expect any observ-
                                                                                          opt = (benv , bmod ).
able consequences.
   Another language feature that does not occur in our example is                    The type t is one of the five types or it is a special type “unknown”
the generic option line in configuration blocks, which allows ex-                if the configuration block has no type property. It suffices if only one
tending the language with only minimal changes to the grammar and                descriptor holds a regular type.




                                                                           133                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                           Proceedings of the 17th International Configuration Workshop
                                                                                                                September 10-11, 2015, Vienna, Austria
    The properties prompt, select, default, and range appear                          We write E to denote sets of expressions. The other attributes are
 in the descriptor as the tuples pmpt, sel, def , and range, respec-                  only affected by the other dependencies, hence we define them as
 tively. As the input order plays a role, each of them and the descriptor             follows:
 contain unique indices kdesc , kp , ks , kd , and kr that are ascending in
 input order.                                                                             aS = (Es , s, ks )              =   (−
                                                                                                                               e→    −−→
                                                                                                                                if ∪ edep ∪ es , s, ks )
    The string argument of a prompt property is unimportant for the                       aD = (Ed , ev , kd )            =   (−
                                                                                                                               e→ ∪ −
                                                                                                                                if   e−→ ∪ e , e , k )
                                                                                                                                       dep      d   v   d
 configuration logic, so we only store the expression of the optional
 if dependency ep . If there is no such dependency, we store a spe-                       aR
                                                                                               = (Er , esl , esh , kr )   =   (−
                                                                                                                               e→    −−→          s s
                                                                                                                                if ∪ edep ∪ er , el , eh , kr )

 cial symbol as placeholder for an empty expression. Analogously,
                                                                                      With these definitions we collect for each symbol s the prompt at-
 the optional if dependencies for select, default, and range
                                                                                      tributes aP in a set P (s), the default attributes aD in a set D(s),
 properties are stored as es , ed , and er .
                                                                                      and the range attributes aR in a set R(s). We also create a set S(s)
    The target of a select property is a single symbol es . A
                                                                                      for each symbol, but note that the symbol in the definition of aS is
 default value may be the result of a non-trivial expression ev . The
                                                                                      the symbol of the descriptor that contains the property and not the
 lower bound of a range property esl may be either another symbol
                                                                                      symbol ss that is the select target. Accordingly we add each aS
 or a constant value. The same goes for the upper bound esh .
                                                                                      to the set of the target symbol. If there is no descriptor of the target
    We do not store any menus or enclosing ifs explicitely. Instead we
                                                                                      symbol we discard the attribute.
 propagate their dependencies to the contained descriptors and store
                                                                                         To determine the symbol type that results from the types in the
 the dependency expressions in lists that distinguish between the dif-
                                                                                      corresponding descriptors, we take the type of the symbol descrip-
 ferent types. The dependencies of surrounding ifs are stored in −     e→
                                                                        if .
                                                       −−→                            tor with the lowest kdesc that is not “unknown”. If there is no such
 Those of depends on dependencies add up to edep , including any
                                                                                      descriptor we also set the type of the symbol to “unkown”.
 such dependencies contained in the configuration block under con-
                                                                                         Finally we concatenate the opt bitvectors component-wise with a
 sideration. The third list −
                            e−
                             →
                             vis contains the visible if dependen-
                                                                                      logical or and store it with the symbol.
 cies from menus.
                                                                                         We do not need dep and kdesc anymore and do not transfer them
    The pair of bits opt indicates if there is a module or env option.
                                                                                      into the Attributes Model.
 Even though env carries an argument in the configuration block, we
                                                                                         Next we transform the choice descriptors from the Zengler Model
 only care whether it is present.
                                                                                      into what we call choice groups. Like for symbols, we determine the
    Our choice database DC is a set of tuples which we call choice
                                                                                      type of the choice group. If t in the choice descriptor is bool or
 descriptors:
                                                                                      tristate, then this is also the type of the choice group. However,
                                −
                                −−−
                                  →        −−−
                                             →                                        it is a frequently used feature to skip the explicit type declaration
                     (t, bopt , pmpt, dep, kdesc , kc )
                                                                                      in choice blocks, resulting in choice descriptors with the special type
 Each choice block translates into a choice descriptor. The type t                    “unknown”. In that case the type is taken from the first symbol that is
 is one of bool, tristate or “unknown”. The bit bopt indicates                        enclosed in the choice descriptor and has a regular type. If any sym-
 whether the choice block carries the optional flag. Just like in a sym-              bol in the choice descriptor is lacking a regular type then it inherits
 bol descriptor, there is a list of pmpt tuples, dependencies lists dep,              the type of the choice group.
 and an index kc . Rather than storing the symbol names of the con-                      Now that we have completed the type resolution for choice groups,
                                                              −−−→                    we determine which symbols are part of the choice group. This is not
 tained symbol descriptors, we create a list of their indices kdesc .
                                                                                      trivial as not all symbols that correspond to symbol descriptors in the
    In contrast to the symbol descriptors, the default properties of
                                                                                      choice descriptor are necessarily transferred. Symbols can be moved
 choice blocks have no influence on variability and we ignore them.
                                                                                      into their own submenu by depending on a symbol that is immedi-
                                                                                      ately above, excluding them from the choice group. We actually see
 4    Attributes Model                                                                this constellation intentionally used in Linux 4.0 and have to process
                                                                                      it adequately. LKC involves an extensive logic to determine whether
 Although the Zengler Model is an abstraction from the underlying                     to move a symbol into a submenu, but a simple heuristic suffices to
 Kconfig files, storing the relevant information in a normalized way,                 correctly capture any real-world case.
 its focus lies on the input structure: The symbol database holds in-                    As LKC ignores the attributes S(s) and D(s) of all symbols of
 dividual symbol descriptors desc and lists of dependencies dep re-                   the choice group, we clear them if they have any content.
 flecting artifacts from the input structure. Furthermore, we have yet                   To complete the choice group, we generate a prompt attribute
 to determine the final types of the symbols and choices and which                    aP the same way we do from a symbol descriptor and we keep the
 symbols a choice includes as this is not trivial in all cases.                       optional bit bopt .
    In the next step, we resolve all this and create a model that fo-                    Our new databases DS0 and DC    0
                                                                                                                           are now
 cuses on the effective impact of the descriptor properties on the sym-
                                                                                                           −−−−−−−−−−−−−−−−−−−−−−−−→
 bols. For a better distinction from the properties of the descriptors in                            DS0 = (s, t, P (s), S(s), D(s), R(s), opt)
 the Zengler Model, we refer to their resulting equivalents in the new                                     −−−−−−−−−−→
 model as attributes. Therefore we call the new model the Attributes
                                                                                                      0
                                                                                                     DC  = (t, bopt , aP , −
                                                                                                                           →
                                                                                                                           s)
 Model.
    First we associate the attributes directly with the dependencies
 from dep that control them. As already mentioned in Section 2, the
                                                                                      5   Tristate* Logic and POF
 dependencies in − e−→
                     vis do not affect all types of attributes. In fact, they         Our next step is to translate the Attributes Model into a coherent
 control only our prompt attributes aP . We define them as                            product overview formula. While our goal is to arrive at a POF in
                                                                                      Propositional Logic, we prefer to split this translation into two steps.
            aP = (Ep , kp ) = (−
                               e→    −−→ −−→
                                if ∪ edep ∪ evis ∪ ep , kp )                          First we define Tristate* Logic, an extension to Tristate Logic that




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                        134
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
we specifically design to create an initial POF. Then we translate the          operators and symbols in expressions is actually a feature, leading to
POF from Tristate* Logic into Propositional Logic in Section 6.                 frequent conversions between the two domains.
   The following grammar describes the general syntax of expres-                   This may become quite complex. Consider the short expression
sions in Tristate Logic:                                                        A=B. If both, A and B, are declared string symbols, their values
                                                                                are compared, yielding 2 if they are exactly identical and 0 otherwise.
hexpressioni         → hexpression symboli                                      However, if A is a tristate symbol and B is undeclared, then the
                     | hexpression symboli‘=’hexpression symboli                values 0, 1, 2 of A are interpreted as the strings “n”, “m”, and “y” and
                     | hexpression symboli‘!=’hexpression symboli               B is interpreted as the string “B”, i.e. in this case a string com-
                     | ‘(’ hexpressioni‘)’                                      parison is done on these letters, always yielding 0. We take all these
                     | ‘!’hexpressioni                                          details into account when producing our POF in Tristate* Logic.
                     | hexpressioni‘&&’hexpressioni                                Encoding the constraints of bool and tristate symbols that
                     | hexpressioni‘||’hexpressioni                             originate from their respective attributes, works in an indirect way:
                                                                                For each of these symbols, we add two auxiliary variables which
hexpression symboli → symbol name
                                                                                represent a lower and an upper limit to the value of the symbol in
                    | constant value
                                                                                consideration, and, using the ⇒ operator, we append the constraint
   There are five operators and there are parentheses to override op-           that the value of the symbol must not exceed the values set up by the
erator precedences. We distinguish two groups of operators: The                 auxiliary variables.
tristate operators !, &&, and ||, and the string operators =                       This is in contrast to encoding the choice groups: We encode the
and !=. Both groups operate on their respective domains.                        exclusiveness of symbols with expressions that directly relate to the
   The tristate operators and their domain form a three-valued                  symbols instead of their associated auxiliary variables.
logic like those of Łukasiewicz and Kleene. A comprehensive                        Finally, we also take account of the mode without module support
overview of these logics has been done by Gabbay and Woods [2].                 by adding two different subformulae for tristate symbols and
   In fact, the three base operators !, &&, and || correspond to the            tristate choice groups which depend on the symbol that has the
operators ¬, ∧ and ∨ in K3 and Ł3 from Kleene and Łukasiewicz.                  bmod bit set.
This observation has already been made in 2010 by Berger et al. [1].
However, Tristate Logic itself is not expressive enough for a full POF.         6   POF in Propositional Logic
We need to extend it for our purposes.
   The nature of our dependencies is Boolean and not three-valued,              Translating the POF from Tristate* Logic into Propositional Logic
because either they are met or they are not. There is no third value            is mostly straightforward. We encode each symbol with the type
like the module state in Tristate Logic. Hence, when extending Tri-             tristate or bool and all auxiliary variables using two variables
state Logic to allow encoding all constraints in a POF, we look for             in Propositional Logic. The three values of the tristate domain
operators that operate on tristate values, but only yield two dif-              correspond to three states that the two Propositional Variables can
ferent values. Such operators are not typically part of K3 or Ł3 and            encode. We explicitely prohibit the fourth possible state.
hence we exclude these logics from further consideration.                          Translating tristate constants, symbols, and operators works
   Instead we introduce our own new operators ⇒ and ⇔ and define                mostly the same way as in the paper by Zengler and Küchlin [11]
their semantics as shown in Table 1 and Table 2.                                with the two projections π0 and π1 as listed in table 3. The three
                                                                                constants y, m, and n map to corresponding combinations of > and
               Table 1.   Truth table of tristate* operator ⇔                   ⊥, and similarly each tristate symbol AT maps to two proposi-
                                                                                tional variables p0 (AT ) and p1 (AT ). Mapping the unary operator !
                                                                                is simple, but the entries for the binary operators operators && and
                             ⇔     0    1     2                                 || stick out by being generalized to n-ary operators. This is an opti-
                             0     2    0     0                                 mization to keep the size of the formulae smaller for long chains of
                             1     0    2     0                                 the same operands.
                             2     0    0     2                                    Of course, the Tristate* operators ⇔ and ⇒ also have to be
                                                                                translated into Propositional Logic. However, due to our definition
                                                                                of these operands this is trivial for π1 as they yield only 0 and 2, and
                                                                                π0 is intuitive. We use these two translations to also cover the usage
               Table 2.   Truth table of tristate* operator ⇒                   of the string operator = with tristate operands.
                                                                                   Finding a Propositional encoding for string symbols requires
                                                                                more work, because Propositional Logic is not well suited for encod-
                             ⇒     0    1     2
                                                                                ing arbitrary strings. For each string symbol, we iterate over our
                             0     2    2     2                                 POF in Tristate* Logic and collect all occurrences in expressions.
                             1     0    2     2
                                                                                In our method we consider only cases of equal strings and resulting
                             2     0    0     2
                                                                                other equalities and inequalities and define new propositional vari-
                                                                                ables PX S ←s to encode that the string variable X S has the value
   They express equality and “less than or equal to” inequality on              s and accordingly (X S = Y S ) to encode if X S and Y S have the
tristate values. We call this extended version of Tristate Logic                same value. This suffices, because for the configuration space the ac-
the Tristate* Logic.                                                            tual value of a string is not important. The same goes for int and
   Note that Tristate* Logic also contains the string operators =               hex symbols. Finally we use the mappings for = and ! to define the
and !=. They compare values from the string domain and also                     mappings for != .
yield one of the two values 0 and 2. Mixing tristate and string                    With these mappings we create our POF in Propositional Logic.




                                                                          135                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                          Proceedings of the 17th International Configuration Workshop
                                                                                                               September 10-11, 2015, Vienna, Austria
 As it encodes the configuration space of the Linux kernel, we call it            The fact that there are more regular variables in the POF in Tristate*
 L-POF.                                                                           Logic than there are declared symbols on the respective architecture
   To use modern SAT solvers we do one final step: Our L-POF is                   comes from the fact that it still contains expressions with undeclared
 not in CNF. So we use the Warthog Logic Framework6 to generate an                symbols. They are cleaned in the translation process into Proposi-
 equisatisfiable formula using the Plaisted-Greenbaum algorithm [6].              tional Logic. Each of our formulae in CNF contains roughly one
                                                                                  million Propositional variables. For some very basics analyses with
 7    Results
                                                                                           Table 6. Redundant or necessary symbols in Linux 4.0
 Our implementation consists of more than 7,000 lines of Java Code,
 using the features of Java SE 6. We measured execution times on a
 computer with an Intel Core2Quad Q6600 using the Java implemen-                           arch           redundant (⇔ n)    necessary (not ⇔ n)
 tation of the IcedTea project in version 6.1.13.7 on Gentoo Linux.                        alpha                    3223                     55
 Creating the L-POFs from the Kconfig files for all 30 architectures,                      arc                      4263                     63
 takes on average less than 3 seconds per architecture. Transforma-                        arm                      1691                     75
                                                                                           arm64                    3159                    135
 tion into CNF varies between 20 and 42 seconds, depending on the
 architecture.                                                                             avr32                    4522                      54
    To give a rough impression of size of the configuration space, we                      blackfin                 4430                      47
                                                                                           c6x                      4644                      42
 show the distribution of symbol types for each architecture in Table 4.                   cris                     3785                      34
 Across all architectures, the vast majority of symbols has one of the
                                                                                           frv                      3700                      37
 two types tristate and bool which has a major impact on the                               hexagon                  4625                      45
 form of the formula.                                                                      ia64                     3454                      74
                                                                                           m32r                     4937                      32
              Table 4. Distribution of symbol types in Linux 4.0                           m68k                     3741                      32
                                                                                           metag                    4301                      67
                                                                                           microblaze               3251                      71
       arch           tristate   bool    string    int   hex        total                  mips                     2773                      64

       alpha            6317     3402       34    192     27        9972                   mn10300                  3702                      39
       arc              6293     3352       36    194     29        9904                   nios2                    4428                      45
       arm              6371     4724       38    209     41       11383                   openrisc                 4424                      56
       arm64            6366     3511       36    195     27       10135                   parisc                   3499                      51

       avr32            6360     3462       36    193     30       10081                   powerpc                  2652                     94
       blackfin         6380     3676       36    702     43       10837                   s390                     4149                    107
       c6x              6292     3293       35    189     28        9837                   score                    7068                     36
       cris             6345     3436       45    254     78       10158                   sh                       3297                     67

       frv              6316     3378       35    192     28        9949                   sparc                    3201                      51
       hexagon          6292     3292       35    190     27        9836                   tile                     3633                      57
       ia64             6400     3519       36    195     27       10177                   um                       7368                      28
       m32r             6324     3361       34    194     31        9944                   unicore32                3541                      58

       m68k             6322     3453       35    192     37       10039                   x86                      2301                    138
       metag            6293     3363       37    193     28        9914                   xtensa                   3248                     43
       microblaze       6294     3335       37    195     32        9893                   globally                   135                      1
       mips             6391     3964       36    198     28       10617
       mn10300          6316     3426       35    199     33       10009
       nios2            6292     3303       36    191     36        9858
                                                                                  SAT solving, we use PicoSAT.7 Processing the CNF formula without
       openrisc         6292     3292       36    188     27        9835          any additional clauses takes PicoSAT between 6 and 10 seconds. We
       parisc           6325     3384       34    191     27        9961          search for redundant bool or tristate symbols, i.e. symbols that
       powerpc          6404     3933       37    205     40       10619          can never be active, and for symbols that are necessary, i.e. it is not
       s390             6313     3453       35    195     27       10023          possible to fully deactivate them. We find out if a symbol is redun-
       score            6292     3292       35    188     28        9835          dant by assuming that one of the two corresponding Propositional
       sh               6374     3640       37    198     34       10283          variables is true. If this is not satisfiable, then the symbol is always
       sparc            6370     3444       37    193     30       10074          inactive. Vice versa by assuming that both Propositional variables are
       tile             6306     3393       36    191     29        9955          false we find out whether a symbol must always be active. More than
       um               6297     3362       38    193     27        9917
       unicore32        6363     3425       36    191     27       10042
                                                                                  99.8% of the individual tests run in less than three seconds.
                                                                                     Table 6 shows the results of these tests. The reason for the high
       x86              6420     3781       40    206     32       10479
                                                                                  numbers of features that cannot be activated on each architecture is
       xtensa           6326     3379       41    194     29        9969
                                                                                  that there are many features that run only on few architectures, but the
                                                                                  corresponding Kconfig files are common for all architectures. Sym-
    Table 5 gives a rough overview of how big the formulae grow.                  bols that cannot be deactivated on the other side are less, but still a
 It comes to no surprise that the translation from Tristate* Logic                lot. They are symbols that are intentionally not deactivatable and in
 to Propositional Logic and the transformation into CNF using the                 general they do not represent selectable features, but basic aspects of
 Plaisted-Greenbaum algorithm both significantly increase the size.               an architecture.
 6 Warthog Logic Framework: github.com/warthog-logic/warthog                      7 PicoSAT: fmv.jku.at/picosat/




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                    136
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria
                                                  Table 3.   Translation rules for tristate* operators



    e0                   π0 (e0 )                                                    π1 (e0 )
    y                    >                                                           ⊥
    m                    ⊥                                                           >
    n                    ⊥                                                           ⊥
    AT                   p0 (AT )                                                    p1 (AT )
    !e                   ¬π0 (e) ∧ ¬π1 (e)                                           π
                                                                                     V1 (e)                                 W
    e0 && · · · && en    π0 (e0 ) ∧ · · · ∧ π0 (en )                                                 (π0 (ei ) ∨ π1 (ei )) ∧ i∈{0,...,n} π1 (ei )
                                                                                     Vi∈{0,...,n}                  W
    e0 || · · · || en    π0 (e0 ) ∨ · · · ∨ π0 (en )                                     i∈{0,...,n} (¬π0 (ei )) ∧   i∈{0,...,n} π1 (ei )
    e1 ⇔ e2              (π0 (e1 ) ↔ π0 (e2 )) ∧ (π1 (e1 ) ↔ π1 (e2 ))               ⊥
    e1 ⇒ e2              π0 (e2 ) ∨ ¬π0 (e1 ) ∧ (¬π1 (e1 ) ∨ π1 (e2 ))              ⊥
    AT = t               π0 AT ⇔ t                                                  ⊥
    AT = B T             π0 AT ⇔ B T                                                 ⊥
    XS = s               PX S ←s                                                     ⊥
    XS = Y S             PX S = Y S                                                  ⊥
    es1 != es2           ¬π0 (es1 = es2 )                                            ⊥

   To get meaningful results, we merge the results. We collect all               valid. Thus, we want to find the minimal set of constraints to remove
symbols that do not have any architecture that allows activating, and            or change. Such MaxSAT re-configuration use cases have been de-
we collect all symbols that are declared across all architectures, but           scribed in the context of automotive configuration in [10] and could
may not be deactivated on any of them. These numbers are in the                  be adopted for the Linux kernel configuration.
line globally. For most of the 135 symbols that cannot be activated,
this is actually the intention of the maintainer. The one tristate
symbol that can never be deactivated may intentionally only alternate
between the two other possible states.
   These results do not surprise as similar tests were also done in
the context of the VAMOS project and hence many problems have
already been uncovered and solved.


8    Conclusion
Our approach successfully leads to a precise product overview for-
mula. Although Linux has reached more than 10.000 features, our
implementation quickly creates the L-POF, a product overview for-
mula of the Linux kernel in Propositional Logic. Despite its consid-
erable size, fast SAT solving on the formula is in general possible. If
needed there is still much room for optimization.
   A future research direction could be to investigate the possibility
of further verification tests beyond redundant and necessary symbols,
e.g. specialized verification tests for a choice block analogously to
verification tests for positions of a Bill of Materials as it is done in
automotive configuration [8].
   As we do not use parts of LKC in our implementation, we now
have the option to extend our program to do fine-grained tests con-
sidering individual lines in Kconfig files and easily locate the origin
of inconsistencies.
   Another interesting topic is re-configuration. Although LKC does
not permit invalid configurations by disabling options during the con-
figuration process, it might be useful for users to select all wanted
options first without caring about the validation. Afterwards, if the
configuration is invalid, we can re-configure the selections of the user
in an optimal way, i.e. by solving a MaxSAT optimization problem.
   The reverse is also imaginable: If the selections of a user lead to
an invalid configuration, the user might want to know which config-
uration constraints have to change in order to make the configuration




                                                                           137                Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors
                                                                                           Proceedings of the 17th International Configuration Workshop
                                                                                                                September 10-11, 2015, Vienna, Austria
                                                             Table 5. Sizes of POFs for Linux 4.0



   arch              regular variables      auxiliary variables          total number of          variables in L-POF      variables in CNF      clauses in CNF
                             in POF in               in POF in          variables in POF
                       Tristate* Logic         Tristate* Logic         inTristate* Logic
   alpha                         10673                    48845                     59518                     117417                996839             1812856
   arc                           10602                    48481                     59083                     116538                954428             1718683
   arm                           11976                    55760                     67736                     134270               1299812             2849653
   arm64                         10824                    49640                     60464                     119333               1007563             1828031
   avr32                         10793                    49366                     60159                     118671               1001036             1816464
   blackfin                      11539                    51058                     62597                     123096               1026513             1861942
   c6x                           10548                    48174                     58722                     115799                949031             1708805
   cris                          10867                    49279                     60146                     118559                999006             1812066
   frv                           10666                    48722                     59388                     117126                990489             1797565
   hexagon                       10540                    48169                     58709                     115795                981542             1781667
   ia64                          10866                    49850                     60716                     119837               1010856             1834072
   m32r                          10643                    48681                     59324                     117039                993691             1804508
   m68k                          10717                    49136                     59853                     118115               1008800             1836987
   metag                         10605                    48535                     59140                     116671                955382             1720572
   microblaze                    10591                    48406                     58997                     116370                985108             1788040
   mips                          11249                    52034                     63283                     125090               1048937             1909971
   mn10300                       10721                    48974                     59695                     117738                994158             1804015
   nios2                         10566                    48235                     58801                     115951                950035             1710610
   openrisc                      10544                    48168                     58712                     115783                949121             1709044
   parisc                        10658                    48794                     59452                     117297                996712             1810111
   powerpc                       11247                    51964                     63211                     124935               1055822             1917736
   s390                          10699                    49084                     59783                     117997                998901             1813210
   score                         10539                    48168                     58707                     115783                949788             1710461
   sh                            10955                    50336                     61291                     121037               1020515             1854779
   sparc                         10774                    49327                     60101                     118582               1004762             1823946
   tile                          10655                    48748                     59403                     117195                990749             1798113
   um                            10606                    48550                     59156                     116723                998908             1821791
   unicore32                     10753                    49191                     59944                     118246                998333             1811566
   x86                           11135                    51280                     62415                     123314               1051478             1913811
   xtensa                        10674                    48786                     59460                     117278                993296             1802906

 References                                                                                  cial Intelligence for Engineering Design, Analysis and Manufacturing,
                                                                                             17(1), 75–97, (January 2003). Special issue on configuration.
                                                                                         [9] Reinhard Tartler, Mastering Variability Challenges in Linux and
  [1] Thorsten Berger, Steven She, Rafael Lotufo, Andrzej Wasowski,
                                                                ˛         and                Related Highly-Configurable System Software, Ph.D. dissertation,
      Krzysztof Czarnecki, ‘Variability Modeling in the Real: A Perspective                  Friedrich-Alexander-Universität Erlangen-Nürnberg, 2013.
      from the Operating Systems Domain’, in Proceedings of the IEEE/ACM                [10] Rouven Walter, Christoph Zengler, and Wolfgang Küchlin, ‘Applica-
      International Conference on Automated Software Engineering, ASE                        tions of MaxSAT in automotive configuration’, in Proceedings of the
      ’10, pp. 73–82, New York, NY, USA, (2010). ACM.                                        15th International Configuration Workshop, eds., Michel Aldanondo
  [2] The Many Valued and Nonmonotonic Turn in Logic, Volume 8 (Hand-                        and Andreas Falkner, pp. 21–28, Vienna, Austria, (August 2013).
      book of the History of Logic), eds., Dov M. Gabbay and John Woods,                [11] Christoph Zengler and Wolfgang Küchlin, ‘Encoding the Linux Ker-
      North Holland, 1 edn., 8 2007.                                                         nel Configuration in Propositional Logic’, in Proceedings of the 19th
  [3] Albert Haag, ‘Sales configuration in business processes’, IEEE Intelli-                European Conference on Artificial Intelligence (ECAI 2010) Workshop
      gent Systems, 13(4), 78–85, (July 1998).                                               on Configuration, eds., Lothar Hotz and Alois Haselböck, pp. 51–56,
  [4] Wolfgang Küchlin and Carsten Sinz, ‘Proving consistency assertions                     (2010).
      for automotive product data management’, J. Automated Reasoning,
      24(1–2), 145–163, (February 2000).
  [5] Sarah Nadi, Variability Anomalies in Software Product Lines, Ph.D. dis-
      sertation, University of Waterloo, 2014.
  [6] David A. Plaisted and Steven Greenbaum, ‘A structure-preserving
      clause form translation’, Journal of Symbolic Computation, 2(3), 293–
      304, (September 1986).
  [7] Steven She and Thorsten Berger, ‘Formal semantics of the kconfig lan-
      guage’, Technical note, University of Waterloo, 24, (2010).
  [8] Carsten Sinz, Andreas Kaiser, and Wolfgang Küchlin, ‘Formal meth-
      ods for the validation of automotive product configuration data’, Artifi-




Juha Tiihonen, Andreas Falkner and Tomas Axling, Editors                          138
Proceedings of the 17th International Configuration Workshop
September 10-11, 2015, Vienna, Austria