<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>International Configuration Workshop
September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Formal Analysis of the Linux Kernel Configuration with SAT Solving</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Martin Walch</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rouven Walter</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Wolfgang Küchlin</string-name>
        </contrib>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>1</volume>
      <fpage>0</fpage>
      <lpage>11</lpage>
      <abstract>
        <p>The Linux kernel is a highly configurable software system. The aim of this paper is to develop a formal method for the analysis of the configuration space. We first develop a Linux product overview formula (L-POF), which is a Boolean formula representing the high-level configuration constraints of the kernel. Using SAT solving on this L-POF, we can then answer many questions, such as which options are possible, mandatory, or impossible for any of the processor architectures for which the kernel may be configured. Other potential applications include building a configurator or counting the number of kernel configurations. Our approach is analogous to the methods we use for automobile configuration. However, in the Linux case the configuration options (e.g. the individual device drivers) are represented by symbols in Tristate Logic, a specialized three-valued logic system with several different data types, and the configuration constraints are encoded in a somewhat arcane language. We take great care to compile the L-POF directly from the files that hold the configuration constraints in order to achieve maximum flexibility and to be able to trace results directly back to the source. Linux is a kernel for a broad range of platforms with highly versatile configurations of peripheral components. Static configuration at compile time helps to adapt to the different requirements. The central tool for configuring this is LinuxKernelConf, abbreviated LKC. The input to LKC uses a domain specific language to describe configuration constraints. A common alternative name for LKC is Kconfig, and they are often used interchangeably. In this document, we refer to the configuration system as LKC and we denote the language as Kconfig. The input is large and stored in files which we call Kconfig files. They continuously change as kernel development goes on. Automatic checks for semantic consistency on Kconfig files are desirable, but LKC has no such checks implemented. At the Workshop on Configuration 2010 in Lisbon, Zengler and Küchlin presented an approach [11] to encode the whole Linux kernel configuration in Propositional Logic. Conceptually this work parses the Kconfig files and stores the relevant information in a database. Subsequently, this database is translated into a product overview formula2, abbreviated POF, in Propositional Logic. While it demonstrates central ideas and shows the technical feasability of 1 Symbolic Computation Group, WSI Informatics, Universität Tübingen, Germany, www-sr.informatik.uni-tuebingen.de 2 A POF is a single Boolean formula which captures all configuration constraints of a high-level configuration model (cf. [3]). Historically it has been introduced in [4] to capture the high-level configuration model of Mercedes-Benz, which is called “Product overview” (German: Produktübersicht).</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>the project, it is a first prototype with only a simplified view on the
Kconfig files. As a consequence, the results were of the
proof-ofconcept type and of very limited use for verification purposes.</p>
      <p>
        The comprehensive VAMOS project3 at
Friedrich-AlexanderUniversität Erlangen-Nuremberg has lead to numerous results and
publications, including the uncovering of hundreds of real world
bugs. It also analyses the Linux kernel configuration with the means
of Propositional Logic, but goes much further by considering the
actual effects on the kernel code and applying the tools to other projects
that use LKC as well. The PhD Thesis of Reinhard Tartler [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] from
2013 gives a detailed overview over the model, the implemented
tools, and most of the applications and results.
      </p>
      <p>
        The PhD Thesis of Sarah Nadi [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] from 2014 picks up the VAMOS
project and extends it to not only consider the Kconfig files and the
kernel code, but to additionally take the build system into account.
Even more, it extracts configuration constraints from the
implementation.
      </p>
      <p>In this work, we focus solely on the Kconfig files. Although the
VAMOS model proves to yield good results, it does not aim at being
exact and relies on parts of LKC. We present a fairly precise model
and translation process into a product overview formula in
Propositional Logic with the goal to account for all details that are relevant in
real-world use cases. Our implementation works independently from
LKC and we show the results from running it against Linux 4.0.</p>
      <p>
        This work is loosely based on the paper by Zengler and Küchlin
from 2010 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] in that it picks up and uses central ideas, but
elaborates on many more details.
      </p>
      <p>There is no precise specification of Kconfig. So we consider what
information is available in the documentation, the implementation,
and the way the input language is used. Section 2 gives a rough
overview over this input language.</p>
      <p>Our translation uses several intermediate stages. First we create
what we call the Zengler Model in Section 3. In this step we abstract
from technical details of reading the configuration input and isolate
the data that is relevant for our purposes.</p>
      <p>The Zengler Model retains some parts of the input structure that
have an impact on the meaning. In Section 4, we transform it into the
Attributes Model, resolving these parts and switching from a
representation that focuses on input structure to one that revolves around
the constraints.</p>
      <p>LKC uses a three-valued logic, called Tristate Logic, that is
uncommon in academic discourse. We take a look at this logic system
in Section 5 and introduce some extensions to it. We then proceed by
generating a product overview formula in this extended logic from
the Attributes Model, encoding the set of all valid Linux kernel
con3</p>
      <p>VAMOS: Variability Management
www4.cs.fau.de/Research/VAMOS/
in</p>
      <p>Operating</p>
      <p>Systems,
figurations.</p>
      <p>Eventually, we translate the product overview formula from the
three-valued logic into Propositional Logic in Section 6. With
common transformation into CNF and SAT solving, we can reason about
the set of valid configurations, yielding first results in Section 7.
2</p>
    </sec>
    <sec id="sec-2">
      <title>The Linux Kernel Configuration System</title>
      <p>Kconfig is a specialized input language for LKC that describes
available features in terms of symbols and offers several different
means to encode constraints and interdependencies. Kernel
developers maintain this information in hierarchically organized Kconfig
files.</p>
      <p>Listing 1. Example Kconfig file
config T0
tristate
prompt " f e a t u r e T0 "
select T1
depends on T2 &amp;&amp; !C0
config T1
tristate
prompt " f e a t u r e T1 " if C0
config T2
tristate
default y
choice
tristate
prompt "CHOICE 0 "
config C0</p>
      <p>prompt " f e a t u r e C0 "
config C1</p>
      <p>prompt " f e a t u r e C1 "
endchoice
menu " submenu "</p>
      <p>visible if T1
config S0
string
default " d e f a u l t s t r i n g "
config B0
bool
prompt " c u s t o m S0 "
if B0
config S0</p>
      <p>prompt " S0 "
endif
endmenu</p>
      <p>LKC reads these files with a scanner and a parser that are
generated using flex4 and GNU Bison.5 Exploring the Bison grammar is
clearly beyond the scope of this document as it consists of more than
100 production rules. Suffice it to say that the grammar we use very
closely ressembles the grammar of LKC.</p>
      <p>
        To give an overview over the relevant parts of the configuration
language, we confine ourselves to explaining an artificial example. A
concise formal description of the relevant parts was done by She and
Berger [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>Listing 1 contains most of the relevant language features. The
central element everything else is built around is the configuration
block. Its purpose is to declare and to describe symbols. The keyword
config starts a configuration block and is followed by the name of
the symbol it refers to. A configuration block contains one or more
lines that further specify what we call properties of that symbol. Each
symbol must have at least one configuration block. In practice, for the
majority of symbols there is exactly one configuration block for each
symbol, but there may be more. In Linux 4.0, there are up to six per
symbol.</p>
      <p>Each symbol has one of the five types tristate, bool,
string, int, and hex. In our example, the symbols T0, T1, and
T2 have the type tristate, the symbol B0 has the type bool, and
the symbol S0 has the type string. We call a symbol declared if it
has at least one configuration block and one of the five types.</p>
      <p>On actual configuration, each symbol is assigned an
unambiguous value from their respective domains. The domain of tristate
symbols is f0; 1; 2g and the domain of bool symbols is f0; 2g. The
purpose of the three-valued tristate type is to encode the three
possible activation states that apply to many features in the kernel:
Turn it off (0), compile it as a runtime-loadable module (1) or
compile it into the kernel (2). The bool type is the same except that it is
missing the value 1 for the module state.</p>
      <p>To identify the three different states in syntax, there are the three
constants n, m, and y, which evaluate to 0, 1, and 2, respectively.</p>
      <p>The domain of string symbols is the set of all valid strings.
Similar to the domain of bool being a subset of the domain of
tristate, the domains of int and hex symbols both are subsets
of the domain of string symbols, in that their elements are strings
that read as valid integers or hexadecimal numbers, respectively.</p>
      <p>There are different types of dependencies that affect properties,
thus most properties can be active or inactive. The symbol type is the
only unconditional property. It is always active and cannot
dynamically change during configuration. It is also the only property that is
mandatory for every symbol. However, to allow the user to directly
assign a value, the symbol needs a prompt property. In our
example, there are several such prompt properties, one of them for the
symbol T0. The prompt keyword is followed by a string
parameter "feature T0". This parameter is mandatory and shows up as short
description in the configuration interface.</p>
      <p>The symbol T0 also has a select property with the argument
T1. This property is only valid with bool and tristate symbols.
The select property sets up a direct relation between the values of
two symbols. The value of the selected symbol is always at least as
high as the value of the symbol that the select property belongs
to. So, in our example the value of T1 is always at least as high as
the value of T0. The documentation calls this a reverse dependency.</p>
      <p>The last line of the configuration block of T0 starts with depends
on and is followed by an expression in Tristate Logic T2 &amp;&amp; !C0.
4 flex: The Fast Lexical Analyzer, flex.sourceforge.net
5 GNU Bison, www.gnu.org/software/bison/
This is a dependency that applies to the whole configuration block,
i.e. in our case it is a dependency to the prompt and to the select
property. The actual dependency is encoded in the expression. It has
no direct influence on the value of the symbol, but rather on the
properties which in turn set up constraints to the value.We take a closer
look at Tristate Logic and the mechanisms of dependencies in
section 5.</p>
      <p>It is possible to add a dependency to a single property by
appending an expression with if to the respective line, like for the
configuration prompt of T1.</p>
      <p>A default property is used to non-bindingly suggest a value,
but it is also used to automatically set values of symbols that have
no active prompt. In our example, the symbol T2 uses the trivial
expression y as default value, but more complicated expressions
are possible.</p>
      <p>Our example contains a choice block. This language
construct encloses a series of configuration blocks with choice and
endchoice, constituting a set of symbols that logically exclude
each other. This is useful for features that serve the same purpose
and which therefore naturally cannot be simultaneously active, like
e.g. the compression algorithm of the kernel image or the preemption
model. Like a symbol, a choice block has a type, but only the two
data types bool and tristate are valid. A choice block of type
bool enforces that exactly one of the enclosed symbols is set to 2.
The tristate type relaxes this strict constraint and allows to not
assign 2 to any of the symbols, while setting an arbitrary number of
symbols to 1. There is a property optional that even allows setting
all symbols to 0. In contrast to a symbol, a choice block must have a
prompt property. There may also be default properties,
suggesting one of the enclosed symbols, and dependencies using depends
on, which analogously to configuration blocks apply to the whole
choice block.</p>
      <p>A hierarchical menu can be constructed using the menu blocks.
Its primary purpose is organizing the features to ease navigation. Just
like the prompt property, the menu keyword is followed by a string
that serves as user visible label for the menu. For our point of view it
is important that there may be dependencies added using depends
on that apply to the whole menu, and a second type of dependency
that is set up with visible if. That second kind of dependency
toggles only the parts that concern direct configurability by the user,
but leaves all other properties unaffected.</p>
      <p>Finally it is also possible to add dependencies to an arbitrary
sequence of configuration blocks, choice blocks, and menus as long as
it does not violate the logical hierarchy. This is done with the
enclosing keywords if and endif with the dependency expression right
behind if.</p>
      <p>Our example covers those language constructs that we regard as
most important for variability, but there is more: For symbols with
the int or hex type, there is a range property that fixes a lower
and an upper bound for the value. Both bounds may depend on freely
changeable values of other symbols. In practice nearly all bounds
have constant values and those few bounds that are variable can be
manually checked for inconsistencies. From our perspective the
impact of range properties on the variability of the overall
configuration is mostly negligible in current versions of Linux. Therefore, we
incorporate range properties in our model, but ignore them when
creating a product overview formula as we do not expect any
observable consequences.</p>
      <p>Another language feature that does not occur in our example is
the generic option line in configuration blocks, which allows
extending the language with only minimal changes to the grammar and
possibly gaining forward compatibility. It starts with the option
keyword, followed by one of a few option names, and depending on
that, the meaning varies and an argument may be appended. As of
Linux 4.0, two different options are defined that may be relevant in
our context: modules and env. The modules option makes only
sense with a bool symbol, and it may be only used once in the
whole configuration. It associates the corresponding symbol with the
availability of support for loading modules, i.e. as a consequence,
deactivating that symbol basically prohibits assigning the value 1
from the tristate domain. The env option is usually only used
for string symbols that have no prompt property as it imports
a value from the runtime environment of the system the
configuration system is running on. Only very few symbols use this option at
all and even fewer have an effect on the configuration that is worth
mentioning. Still, as we want to be precise, we account for them, too.</p>
      <p>From our example it is also not apparent how Kconfig files are
hierarchically organized. It works similarly to #include directives
of the C preprocessor: A file inclusion is done with the keyword
string and a string argument. The string argument is treated as
path and the content of the file at that path is pasted at the position of
the string line.</p>
      <p>Kconfig still offers more constructs that we have not mentioned,
but they are irrelevant at this point.
3</p>
    </sec>
    <sec id="sec-3">
      <title>Zengler Model</title>
      <p>Linux 4.0 supports 30 different main architectures. Each architecture
has its own tree of Kconfig files, but there are big overlaps across all
architectures by using common files. Accordingly we want to be able
to reason about the configurations across all architectures. However,
capturing several architectures in one data structure without losing
precision is very complicated as this requires keeping track of which
files are included for each architecture and in which order they are
read. Therefore, we create a separate model for each architecture and
consider them together if needed.</p>
      <p>
        By considering one fixed architecture, we can deduce the full
inclusion hierarchy of Kconfig files for that specific architecture.
Accordingly, our parser moves through the file hierarchy in the same
way as LKC does, without the need to store which files we
include. However, it produces different data structures. We create two
databases, a symbol database DS , and a choice database DC . They
are heavily extended versions of the databases created by Zengler and
Küchlin [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], hence we call these two databases the Zengler Model.
      </p>
      <p>The symbol database contains symbol descriptors. Each
configuration block corresponds to a symbol descriptor. They are grouped by
the symbol they belong to. So, our symbol database is a set of pairs
DS = (s; de!sc) where s is a symbol and de!sc is a list of symbol
descriptors desc with</p>
      <p>desc = (t; pmp!t; s!el; d!ef ; rang!e; dep; opt; kdesc)
pmpt = (ep; kp)
sel = (ss; es; ks)
def = (ev; ed; kd)
range = (els; esh; er; kr)
dep = (e!if ; ed!ep; ev!is)
opt = (benv; bmod):</p>
      <p>The type t is one of the five types or it is a special type “unknown”
if the configuration block has no type property. It suffices if only one
descriptor holds a regular type.</p>
      <p>The properties prompt, select, default, and range appear
in the descriptor as the tuples pmpt, sel, def , and range,
respectively. As the input order plays a role, each of them and the descriptor
contain unique indices kdesc, kp, ks, kd, and kr that are ascending in
input order.</p>
      <p>The string argument of a prompt property is unimportant for the
configuration logic, so we only store the expression of the optional
if dependency ep. If there is no such dependency, we store a
special symbol as placeholder for an empty expression. Analogously,
the optional if dependencies for select, default, and range
properties are stored as es, ed, and er.</p>
      <p>The target of a select property is a single symbol es. A
default value may be the result of a non-trivial expression ev. The
lower bound of a range property els may be either another symbol
or a constant value. The same goes for the upper bound es .
h</p>
      <p>We do not store any menus or enclosing ifs explicitely. Instead we
propagate their dependencies to the contained descriptors and store
the dependency expressions in lists that distinguish between the
different types. The dependencies of surrounding ifs are stored in e!if .
Those of depends on dependencies add up to ed!ep, including any
such dependencies contained in the configuration block under
consideration. The third list ev!is contains the visible if
dependencies from menus.</p>
      <p>The pair of bits opt indicates if there is a module or env option.
Even though env carries an argument in the configuration block, we
only care whether it is present.</p>
      <p>Our choice database DC is a set of tuples which we call choice
descriptors:</p>
      <p>(t; bopt; pmp!t; dep; kde!sc; kc)
Each choice block translates into a choice descriptor. The type t
is one of bool, tristate or “unknown”. The bit bopt indicates
whether the choice block carries the optional flag. Just like in a
symbol descriptor, there is a list of pmpt tuples, dependencies lists dep,
and an index kc. Rather than storing the symbol names of the
con!
tained symbol descriptors, we create a list of their indices kdesc.</p>
      <p>In contrast to the symbol descriptors, the default properties of
choice blocks have no influence on variability and we ignore them.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Attributes Model</title>
      <p>Although the Zengler Model is an abstraction from the underlying
Kconfig files, storing the relevant information in a normalized way,
its focus lies on the input structure: The symbol database holds
individual symbol descriptors desc and lists of dependencies dep
reflecting artifacts from the input structure. Furthermore, we have yet
to determine the final types of the symbols and choices and which
symbols a choice includes as this is not trivial in all cases.</p>
      <p>In the next step, we resolve all this and create a model that
focuses on the effective impact of the descriptor properties on the
symbols. For a better distinction from the properties of the descriptors in
the Zengler Model, we refer to their resulting equivalents in the new
model as attributes. Therefore we call the new model the Attributes
Model.</p>
      <p>First we associate the attributes directly with the dependencies
from dep that control them. As already mentioned in Section 2, the
dependencies in ev!is do not affect all types of attributes. In fact, they
control only our prompt attributes aP . We define them as
aP = (Ep; kp) = (e!if [ ed!ep [ ev!is [ ep; kp)</p>
      <p>We write E to denote sets of expressions. The other attributes are
only affected by the other dependencies, hence we define them as
follows:
aS = (Es; s; ks)
aD = (Ed; ev; kd)
aR = (Er; els; esh; kr)
=
=
=
(e!if [ ed!ep [ es; s; ks)
(e!if [ ed!ep [ ed; ev; kd)
(e!if [ ed!ep [ er; els; esh; kr)
With these definitions we collect for each symbol s the prompt
attributes aP in a set P (s), the default attributes aD in a set D(s),
and the range attributes aR in a set R(s). We also create a set S(s)
for each symbol, but note that the symbol in the definition of aS is
the symbol of the descriptor that contains the property and not the
symbol ss that is the select target. Accordingly we add each aS
to the set of the target symbol. If there is no descriptor of the target
symbol we discard the attribute.</p>
      <p>To determine the symbol type that results from the types in the
corresponding descriptors, we take the type of the symbol
descriptor with the lowest kdesc that is not “unknown”. If there is no such
descriptor we also set the type of the symbol to “unkown”.</p>
      <p>Finally we concatenate the opt bitvectors component-wise with a
logical or and store it with the symbol.</p>
      <p>We do not need dep and kdesc anymore and do not transfer them
into the Attributes Model.</p>
      <p>Next we transform the choice descriptors from the Zengler Model
into what we call choice groups. Like for symbols, we determine the
type of the choice group. If t in the choice descriptor is bool or
tristate, then this is also the type of the choice group. However,
it is a frequently used feature to skip the explicit type declaration
in choice blocks, resulting in choice descriptors with the special type
“unknown”. In that case the type is taken from the first symbol that is
enclosed in the choice descriptor and has a regular type. If any
symbol in the choice descriptor is lacking a regular type then it inherits
the type of the choice group.</p>
      <p>Now that we have completed the type resolution for choice groups,
we determine which symbols are part of the choice group. This is not
trivial as not all symbols that correspond to symbol descriptors in the
choice descriptor are necessarily transferred. Symbols can be moved
into their own submenu by depending on a symbol that is
immediately above, excluding them from the choice group. We actually see
this constellation intentionally used in Linux 4.0 and have to process
it adequately. LKC involves an extensive logic to determine whether
to move a symbol into a submenu, but a simple heuristic suffices to
correctly capture any real-world case.</p>
      <p>As LKC ignores the attributes S(s) and D(s) of all symbols of
the choice group, we clear them if they have any content.</p>
      <p>To complete the choice group, we generate a prompt attribute
aP the same way we do from a symbol descriptor and we keep the
optional bit bopt.</p>
      <p>Our new databases DS0 and DC0 are now
!
DS0 = (s; t; P (s); S(s); D(s); R(s); opt)</p>
      <p>!</p>
      <p>DC0 = (t; bopt; aP ; !s)
5</p>
    </sec>
    <sec id="sec-5">
      <title>Tristate* Logic and POF</title>
      <p>Our next step is to translate the Attributes Model into a coherent
product overview formula. While our goal is to arrive at a POF in
Propositional Logic, we prefer to split this translation into two steps.
First we define Tristate* Logic, an extension to Tristate Logic that
we specifically design to create an initial POF. Then we translate the
POF from Tristate* Logic into Propositional Logic in Section 6.</p>
      <p>The following grammar describes the general syntax of
expressions in Tristate Logic:
hexpressioni
! hexpression symboli
| hexpression symboli‘=’hexpression symboli
| hexpression symboli‘!=’hexpression symboli
| ‘(’ hexpressioni‘)’
| ‘!’hexpressioni
| hexpressioni‘&amp;&amp;’hexpressioni
| hexpressioni‘||’hexpressioni
hexpression symboli ! symbol name</p>
      <p>| constant value</p>
      <p>There are five operators and there are parentheses to override
operator precedences. We distinguish two groups of operators: The
tristate operators !, &amp;&amp;, and ||, and the string operators =
and !=. Both groups operate on their respective domains.</p>
      <p>
        The tristate operators and their domain form a three-valued
logic like those of Łukasiewicz and Kleene. A comprehensive
overview of these logics has been done by Gabbay and Woods [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        In fact, the three base operators !, &amp;&amp;, and || correspond to the
operators :, ^ and _ in K3 and Ł3 from Kleene and Łukasiewicz.
This observation has already been made in 2010 by Berger et al. [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
However, Tristate Logic itself is not expressive enough for a full POF.
We need to extend it for our purposes.
      </p>
      <p>The nature of our dependencies is Boolean and not three-valued,
because either they are met or they are not. There is no third value
like the module state in Tristate Logic. Hence, when extending
Tristate Logic to allow encoding all constraints in a POF, we look for
operators that operate on tristate values, but only yield two
different values. Such operators are not typically part of K3 or Ł3 and
hence we exclude these logics from further consideration.</p>
      <p>Instead we introduce our own new operators ) and , and define
their semantics as shown in Table 1 and Table 2.</p>
      <p>They express equality and “less than or equal to” inequality on
tristate values. We call this extended version of Tristate Logic
the Tristate* Logic.</p>
      <p>Note that Tristate* Logic also contains the string operators =
and !=. They compare values from the string domain and also
yield one of the two values 0 and 2. Mixing tristate and string
operators and symbols in expressions is actually a feature, leading to
frequent conversions between the two domains.</p>
      <p>This may become quite complex. Consider the short expression
A=B. If both, A and B, are declared string symbols, their values
are compared, yielding 2 if they are exactly identical and 0 otherwise.
However, if A is a tristate symbol and B is undeclared, then the
values 0, 1, 2 of A are interpreted as the strings “n”, “m”, and “y” and
B is interpreted as the string “B”, i.e. in this case a string
comparison is done on these letters, always yielding 0. We take all these
details into account when producing our POF in Tristate* Logic.</p>
      <p>Encoding the constraints of bool and tristate symbols that
originate from their respective attributes, works in an indirect way:
For each of these symbols, we add two auxiliary variables which
represent a lower and an upper limit to the value of the symbol in
consideration, and, using the ) operator, we append the constraint
that the value of the symbol must not exceed the values set up by the
auxiliary variables.</p>
      <p>This is in contrast to encoding the choice groups: We encode the
exclusiveness of symbols with expressions that directly relate to the
symbols instead of their associated auxiliary variables.</p>
      <p>Finally, we also take account of the mode without module support
by adding two different subformulae for tristate symbols and
tristate choice groups which depend on the symbol that has the
bmod bit set.
6</p>
    </sec>
    <sec id="sec-6">
      <title>POF in Propositional Logic</title>
      <p>Translating the POF from Tristate* Logic into Propositional Logic
is mostly straightforward. We encode each symbol with the type
tristate or bool and all auxiliary variables using two variables
in Propositional Logic. The three values of the tristate domain
correspond to three states that the two Propositional Variables can
encode. We explicitely prohibit the fourth possible state.</p>
      <p>
        Translating tristate constants, symbols, and operators works
mostly the same way as in the paper by Zengler and Küchlin [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
with the two projections 0 and 1 as listed in table 3. The three
constants y, m, and n map to corresponding combinations of &gt; and
?, and similarly each tristate symbol AT maps to two
propositional variables p0(AT ) and p1(AT ). Mapping the unary operator !
is simple, but the entries for the binary operators operators &amp;&amp; and
|| stick out by being generalized to n-ary operators. This is an
optimization to keep the size of the formulae smaller for long chains of
the same operands.
      </p>
      <p>Of course, the Tristate* operators , and ) also have to be
translated into Propositional Logic. However, due to our definition
of these operands this is trivial for 1 as they yield only 0 and 2, and
0 is intuitive. We use these two translations to also cover the usage
of the string operator = with tristate operands.</p>
      <p>Finding a Propositional encoding for string symbols requires
more work, because Propositional Logic is not well suited for
encoding arbitrary strings. For each string symbol, we iterate over our
POF in Tristate* Logic and collect all occurrences in expressions.
In our method we consider only cases of equal strings and resulting
other equalities and inequalities and define new propositional
variables PXS s to encode that the string variable XS has the value
s and accordingly (XS = Y S ) to encode if XS and Y S have the
same value. This suffices, because for the configuration space the
actual value of a string is not important. The same goes for int and
hex symbols. Finally we use the mappings for = and ! to define the
mappings for != .</p>
      <p>With these mappings we create our POF in Propositional Logic.
135
As it encodes the configuration space of the Linux kernel, we call it
L-POF.</p>
      <p>
        To use modern SAT solvers we do one final step: Our L-POF is
not in CNF. So we use the Warthog Logic Framework6 to generate an
equisatisfiable formula using the Plaisted-Greenbaum algorithm [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
7
      </p>
    </sec>
    <sec id="sec-7">
      <title>Results</title>
      <p>Our implementation consists of more than 7,000 lines of Java Code,
using the features of Java SE 6. We measured execution times on a
computer with an Intel Core2Quad Q6600 using the Java
implementation of the IcedTea project in version 6.1.13.7 on Gentoo Linux.
Creating the L-POFs from the Kconfig files for all 30 architectures,
takes on average less than 3 seconds per architecture.
Transformation into CNF varies between 20 and 42 seconds, depending on the
architecture.</p>
      <p>To give a rough impression of size of the configuration space, we
show the distribution of symbol types for each architecture in Table 4.
Across all architectures, the vast majority of symbols has one of the
two types tristate and bool which has a major impact on the
form of the formula.
The fact that there are more regular variables in the POF in Tristate*
Logic than there are declared symbols on the respective architecture
comes from the fact that it still contains expressions with undeclared
symbols. They are cleaned in the translation process into
Propositional Logic. Each of our formulae in CNF contains roughly one
million Propositional variables. For some very basics analyses with
SAT solving, we use PicoSAT.7 Processing the CNF formula without
any additional clauses takes PicoSAT between 6 and 10 seconds. We
search for redundant bool or tristate symbols, i.e. symbols that
can never be active, and for symbols that are necessary, i.e. it is not
possible to fully deactivate them. We find out if a symbol is
redundant by assuming that one of the two corresponding Propositional
variables is true. If this is not satisfiable, then the symbol is always
inactive. Vice versa by assuming that both Propositional variables are
false we find out whether a symbol must always be active. More than
99:8% of the individual tests run in less than three seconds.</p>
      <p>Table 6 shows the results of these tests. The reason for the high
numbers of features that cannot be activated on each architecture is
that there are many features that run only on few architectures, but the
corresponding Kconfig files are common for all architectures.
Symbols that cannot be deactivated on the other side are less, but still a
lot. They are symbols that are intentionally not deactivatable and in
general they do not represent selectable features, but basic aspects of
an architecture.</p>
      <p>To get meaningful results, we merge the results. We collect all
symbols that do not have any architecture that allows activating, and
we collect all symbols that are declared across all architectures, but
may not be deactivated on any of them. These numbers are in the
line globally. For most of the 135 symbols that cannot be activated,
this is actually the intention of the maintainer. The one tristate
symbol that can never be deactivated may intentionally only alternate
between the two other possible states.</p>
      <p>These results do not surprise as similar tests were also done in
the context of the VAMOS project and hence many problems have
already been uncovered and solved.
8</p>
    </sec>
    <sec id="sec-8">
      <title>Conclusion</title>
      <p>Our approach successfully leads to a precise product overview
formula. Although Linux has reached more than 10.000 features, our
implementation quickly creates the L-POF, a product overview
formula of the Linux kernel in Propositional Logic. Despite its
considerable size, fast SAT solving on the formula is in general possible. If
needed there is still much room for optimization.</p>
      <p>
        A future research direction could be to investigate the possibility
of further verification tests beyond redundant and necessary symbols,
e.g. specialized verification tests for a choice block analogously to
verification tests for positions of a Bill of Materials as it is done in
automotive configuration [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>As we do not use parts of LKC in our implementation, we now
have the option to extend our program to do fine-grained tests
considering individual lines in Kconfig files and easily locate the origin
of inconsistencies.</p>
      <p>Another interesting topic is re-configuration. Although LKC does
not permit invalid configurations by disabling options during the
configuration process, it might be useful for users to select all wanted
options first without caring about the validation. Afterwards, if the
configuration is invalid, we can re-configure the selections of the user
in an optimal way, i.e. by solving a MaxSAT optimization problem.</p>
      <p>
        The reverse is also imaginable: If the selections of a user lead to
an invalid configuration, the user might want to know which
configuration constraints have to change in order to make the configuration
valid. Thus, we want to find the minimal set of constraints to remove
or change. Such MaxSAT re-configuration use cases have been
described in the context of automotive configuration in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] and could
be adopted for the Linux kernel configuration.
arch
138
regular variables
      </p>
      <p>in POF in
Tristate* Logic
auxiliary variables</p>
      <p>in POF in
Tristate* Logic
total number of variables in L-POF
variables in POF
inTristate* Logic
variables in CNF clauses in CNF</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Thorsten</given-names>
            <surname>Berger</surname>
          </string-name>
          , Steven She, Rafael Lotufo, Andrzej Wa˛sowski, and Krzysztof Czarnecki, '
          <article-title>Variability Modeling in the Real: A Perspective from the Operating Systems Domain'</article-title>
          ,
          <source>in Proceedings of the IEEE/ACM International Conference on Automated Software Engineering, ASE '10</source>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>82</lpage>
          , New York, NY, USA, (
          <year>2010</year>
          ). ACM.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <article-title>[2] The Many Valued and Nonmonotonic Turn in Logic</article-title>
          , Volume
          <volume>8</volume>
          (
          <article-title>Handbook of the History of Logic)</article-title>
          , eds.,
          <string-name>
            <surname>Dov</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Gabbay</surname>
          </string-name>
          and John Woods, North Holland,
          <volume>1</volume>
          <fpage>edn</fpage>
          ., 8
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Albert</surname>
            <given-names>Haag,</given-names>
          </string-name>
          '
          <article-title>Sales configuration in business processes'</article-title>
          ,
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>13</volume>
          (
          <issue>4</issue>
          ),
          <fpage>78</fpage>
          -
          <lpage>85</lpage>
          , (
          <year>July 1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Wolfgang</given-names>
            <surname>Küchlin</surname>
          </string-name>
          and Carsten Sinz, '
          <article-title>Proving consistency assertions for automotive product data management'</article-title>
          ,
          <source>J. Automated Reasoning</source>
          ,
          <volume>24</volume>
          (
          <issue>1-2</issue>
          ),
          <fpage>145</fpage>
          -
          <lpage>163</lpage>
          , (
          <year>February 2000</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Sarah</given-names>
            <surname>Nadi</surname>
          </string-name>
          ,
          <source>Variability Anomalies in Software Product Lines, Ph.D. dissertation</source>
          , University of Waterloo,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>David</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Plaisted</surname>
          </string-name>
          and Steven Greenbaum, '
          <article-title>A structure-preserving clause form translation'</article-title>
          ,
          <source>Journal of Symbolic Computation</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ),
          <fpage>293</fpage>
          -
          <lpage>304</lpage>
          , (
          <year>September 1986</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Steven</given-names>
            <surname>She</surname>
          </string-name>
          and Thorsten Berger, '
          <article-title>Formal semantics of the kconfig language'</article-title>
          , Technical note, University of Waterloo,
          <volume>24</volume>
          , (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Carsten</given-names>
            <surname>Sinz</surname>
          </string-name>
          , Andreas Kaiser, and Wolfgang Küchlin, '
          <article-title>Formal methods for the validation of automotive product configuration data'</article-title>
          ,
          <source>Artificial Intelligence for Engineering Design, Analysis and Manufacturing</source>
          ,
          <volume>17</volume>
          (
          <issue>1</issue>
          ),
          <fpage>75</fpage>
          -
          <lpage>97</lpage>
          , (
          <year>January 2003</year>
          ).
          <article-title>Special issue on configuration</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Reinhard</given-names>
            <surname>Tartler</surname>
          </string-name>
          ,
          <article-title>Mastering Variability Challenges in Linux and Related Highly-Configurable System Software</article-title>
          ,
          <source>Ph.D. dissertation</source>
          ,
          <string-name>
            <surname>Friedrich-Alexander-Universität</surname>
          </string-name>
          Erlangen-Nürnberg,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Rouven</surname>
            <given-names>Walter</given-names>
          </string-name>
          , Christoph Zengler, and Wolfgang Küchlin, '
          <article-title>Applications of MaxSAT in automotive configuration'</article-title>
          ,
          <source>in Proceedings of the 15th International Configuration Workshop</source>
          , eds.,
          <source>Michel Aldanondo and Andreas Falkner</source>
          , pp.
          <fpage>21</fpage>
          -
          <lpage>28</lpage>
          , Vienna, Austria, (
          <year>August 2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Christoph</given-names>
            <surname>Zengler</surname>
          </string-name>
          and Wolfgang Küchlin, '
          <article-title>Encoding the Linux Kernel Configuration in Propositional Logic'</article-title>
          ,
          <source>in Proceedings of the 19th European Conference on Artificial Intelligence (ECAI</source>
          <year>2010</year>
          ) Workshop on Configuration, eds.,
          <source>Lothar Hotz and Alois Haselböck</source>
          , pp.
          <fpage>51</fpage>
          -
          <lpage>56</lpage>
          , (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>