<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Complexity Indicators and Their Impact on Algorithm Performance in Automotive Part Selection</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Daniel Bischof</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Tobias Nerz</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kaan Ekiz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Aalen University - Engineering</institution>
          ,
          <addr-line>Business and Health, Beethovenstraße 1, 73430 Aalen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Mercedes-Benz AG</institution>
          ,
          <addr-line>Leibnizstraße 6/1, 71032 Böblingen</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2026</year>
      </pub-date>
      <abstract>
<p>This paper presents a comprehensive study on complexity indicators and the relative scaling behavior of an algorithm designed for visualizing and changing part selection data in the automotive industry. The algorithm transforms Boolean formulas into multi-way decision diagrams within the context of a tool called POSEIDON. The case study is based on real industrial configuration data from a leading German automotive manufacturer, highlighting the challenges posed by high variant diversity and logical interdependencies. To characterize the structural and distributional properties of the dataset, various complexity indicators are introduced and computed, including Shannon entropy and average path lengths in idealized decision diagrams. These metrics are then used to evaluate the behavior and effectiveness of the POSEIDON algorithm in generating compact, interpretable, and scalable decision diagrams. The results show that the average path length in the POSEIDON-generated diagrams scales linearly with entropy-based measures, confirming the adequacy of the algorithm in an environment whose complexity has been rising year over year.</p>
      </abstract>
      <kwd-group>
        <kwd>Complexity Indicators</kwd>
        <kwd>Algorithm Performance</kwd>
        <kwd>Automotive Part Selection</kwd>
        <kwd>Variant Configuration</kwd>
        <kwd>Configuration Rules</kwd>
        <kwd>Industrial Data</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Product configuration, particularly part selection, is a critical task in the automotive industry, based
on vast and intricate datasets derived from manufacturing and engineering processes. The inherent
complexity of such industrial data poses significant challenges for configuration systems. Understanding
and quantifying this complexity is essential not only for benchmarking existing tools but also for
improving their performance. Furthermore, evaluating how algorithms behave under realistic levels of
complexity is crucial for ensuring their practical applicability.</p>
      <p>
        Building on prior published work by the author(s) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], this paper evaluates an algorithm that
transforms Boolean formulas, used to encode feature-to-part relationships in automotive product lines, into
multi-way decision diagrams. Unlike traditional binary decision diagrams (BDDs) [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], these structures
consist of decision nodes representing feature choices with multiple discrete options (plus an explicit
’otherwise’ branch encompassing remaining choices) and utilize the actual selected parts as terminal
nodes, directly reflecting the configuration outcome. A key aspect distinguishing this approach is its
application context: the decision diagrams, generated within the POSEIDON tool, serve as the primary
interactive visualisation. Data engineers use them to create, analyse, and modify configuration logic,
with a strong focus on comprehensibility and correctness by construction. This paper presents an
in-depth case study utilizing authentic industrial data from this automotive context. Our primary
contributions are twofold: first, we propose and measure complexity indicators designed to capture
characteristics of industrial Bill of Materials, focusing on the combinatorial solution space described by
their variation points. Second, we evaluate the performance of the decision diagram generation
algorithm on our data, primarily focusing on the compactness of the resulting diagrammatic representation.
For brevity, we will henceforth refer to this simply as the algorithm.
      </p>
      <p>
        While various techniques exist for representing and visualising product configuration logic—including
tabular formats, feature models, Karnaugh maps, and tree-based structures [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]—this study discusses
multi-way decision diagrams.
      </p>
      <p>
        Previous research has addressed the measurement of complexity in product configuration systems
from different perspectives. Ghosh et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] analyse the cognitive complexity of configurators using a
UML-based metric, but their approach does not operate on the level of parts or Bills of Materials and
does not compare visualisation techniques with objective complexity indicators. Similarly, Schmidt et
al. [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] develop key performance indicators for managing variant diversity in complex product families,
focusing on economic portfolio management rather than analysing objective complexity metrics or
variant logic visualisation. Other approaches, such as Modrak and Bednar [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and Herrera-Vidal et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
apply entropy-based measures to quantify uncertainty and complexity but primarily address assembly
or process-level dynamics rather than configuration rules at the Bill-of-Materials level. Additionally,
Modrak and Soltysova [8] demonstrate that the mere number of possible variants is insufficient to
capture true system complexity and propose information-theoretic measures to better reflect underlying
dependencies.
      </p>
      <p>Despite these contributions, no study has systematically connected quantitative complexity metrics
with visual representations of configuration logic. This paper combines measures such as Shannon
entropy and Huffman tree path lengths with the size of decision diagrams to investigate how
configuration complexity affects the compactness of decision diagrams. To the best of our knowledge, no prior
study has empirically investigated this relationship between data complexity and the visualisation of
configuration logic.</p>
      <sec id="sec-1-1">
        <title>1.1. Case Study Context and Data Source</title>
        <p>The study is based on industrial data sourced from the product data management systems of a major
automotive manufacturer, specifically focusing on part selection rules. The nature of this data reflects
the challenges encountered in large-scale industrial configuration environments.</p>
        <p>The data set, its scope and source, as well as the data processing, are described in Sect. 3.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Foundations</title>
      <p>This section details the case study setup, i.e. the complexity indicators chosen to characterise the data,
and the performance metrics used for the assessment of POSEIDON.</p>
      <sec id="sec-2-1">
        <title>2.1. Variation Points in the Bill of Materials</title>
        <p>
          In product line engineering, a variation point is a bundle of alternative parts with corresponding
selection criteria, often modelled simply as a list of formulas. The variation points analysed by the
algorithm are documented in the so-called 150% bill of materials and are influenced by the product
overview—also commonly referred to as the high-level configuration or feature model [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. Each variation
point in the bill of materials consists of at least one variant (i.e. a physical part); while a variation
point may contain several variants, exactly one variant must be chosen from each variation point for
every configured car. To ensure this, every variant carries an item selection rule — based on the car
configuration codes — and dedicated tools check whether the documentation rules are satisfied. Detailed
information on these tools is published in [9]. Fig. 1 shows an example of a variation point containing
three variants, visualised in POSEIDON.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Configuration rules</title>
        <p>
          Configuration rules are documented in the product overview. There, properties (or features) of the cars
— called codes — and consistency rules are defined. Following those rules, only certain combinations of
codes are allowed to be configured. The resulting set of all possible car configurations is called variation
space. In [
          <xref ref-type="bibr" rid="ref1">1</xref>
] we call this the context and model it as a Boolean formula encoding the conjunction of all
configuration constraints. In the example in Fig. 1 we assume that the context consists of only one rule,
namely ¬(a ∧ b ∧ c), leading to the negated label generated on one edge.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Algorithm Description</title>
        <p>
          The core algorithm evaluated in this study transforms the input Boolean formulas into a multi-way
decision diagram representation, as previously detailed in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. The key characteristics relevant to this
evaluation are:
• Transformation: It systematically converts the propositional logic of the Boolean formulas into a
directed acyclic graph (DAG).
• Multi-way Decision Nodes: Internal nodes represent decision points (corresponding to product
feature alternatives). Each node can have multiple outgoing edges, representing the selection
of specific alternatives for a given feature, plus a dedicated "otherwise" edge capturing the case
where none of the explicitly listed options apply.
• Part Terminals: Terminal nodes (leaves) of the diagram directly represent the specific automotive
part(s) selected when the conditions along the path from the root node are met.
• Application: The resulting diagram is the central data structure used in a visualiser/editor tool
designed for configuration data engineers.
• Completeness: The diagram guarantees that every possible configuration path leads to a terminal
node, ensuring that the representation is exhaustive and no valid variant is omitted.
        </p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Mathematical preliminaries</title>
<p>To evaluate how efficient the graphical illustrations of POSEIDON are, we started with
entropy, a well-known concept in information theory for measuring the
uncertainty of information [10]. We then calculated further complexity metrics; the three metrics
used in this study are defined below.</p>
<p>For all calculations in this study consider a finite set
K = {k_1, ..., k_N} with |K| = N, (1)
which represents the set of all valid vehicle configurations under consideration. The
assignment of vehicle configurations to a specific variant is modelled by a disjoint partition P of K,
i.e. a set of sets (bundles). These bundles are technically component equivalence classes, as each bundle
B contains all configurations that lead to the same installed part.</p>
<p>P = {B_1, ..., B_n} with |P| = n, (2)
such that the following properties hold:
⋃_{i=1}^{n} B_i = K, (3)
B_i ∩ B_j = ∅ if i ≠ j. (4)
Visualisations of partitions can be found in Fig. 2 and Fig. 3.</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Complexity Indicators</title>
        <sec id="sec-2-5-1">
          <title>2.5.1. Model Counting</title>
          <p>An important factor for our calculations is the number of satisfying assignments (called models) for the
Boolean formula, or relevant sub-formulas corresponding to alternative selection bundles. It reflects the
total number of configurations which are valid as defined by the context. In our setting, this corresponds
to the number of valid vehicle configurations represented by the set  . The variables (codes) used
to describe the configurations are only those present in the item selection rules. Thus we project the
context to them. This greatly reduces the effective model count by discarding all irrelevant dimensions
for telling configurations apart.</p>
<p>In the context of our modelling, we denote:
|K| = N = number of valid configurations. (5)
Remark 2.1. Note that for the context of this work the variant space is projected to the variables occurring
in a single variation point. For instance, in the example in Fig. 1, only the variables a, b and c are taken
into consideration. Other variables might exist, but are ignored in the context of this variation point. Thus,
in this example we have |K| = 2^3 − 1 = 7, as one possible combination of variables is excluded by the context.
Remark 2.2. This metric provides a direct measure of the solution space size and indicates how many
valid feature combinations exist. It serves as the base set for calculating derived measures such as the
relative frequencies of variants in entropy-based evaluations.</p>
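<p>Projected model counting can be sketched by exhaustive enumeration, which is feasible precisely because the projection to a single variation point keeps the variable set small. The function name is our own; the context formula mirrors the running example in which one of the 2^3 combinations is excluded.</p>

```python
from itertools import product

def count_models(variables, context):
    """Count assignments over the projected variables that satisfy the context.

    Brute-force enumeration over 2**len(variables) assignments; only the
    variables occurring in the variation point's item selection rules matter.
    """
    return sum(
        1
        for values in product([False, True], repeat=len(variables))
        if context(dict(zip(variables, values)))
    )

# Context of the running example: the rule not(a and b and c) excludes exactly
# one combination, so |K| = 2**3 - 1 = 7 valid configurations remain.
context = lambda v: not (v["a"] and v["b"] and v["c"])
print(count_models(["a", "b", "c"], context))  # 7
```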
        </sec>
        <sec id="sec-2-5-2">
          <title>2.5.2. Shannon Entropy</title>
          <p>We use Shannon entropy [10], a well-established metric from information theory, to quantitatively
describe the complexity of a given variation point. It captures the uncertainty in the distribution of
part variants across all valid vehicle configurations. A high entropy value indicates a balanced usage of
many variants, while low entropy suggests the presence of dominant, frequently used options. This
allows for direct comparisons between positions or even across different model series.</p>
<p>For some partition P, the Shannon entropy H(P) is defined as:
H(P) = − ∑_{B ∈ P} p_B · log2(p_B), where p_B = |B| / |K|. (6)
Definition 2.3. Let P_1 and P_2 be two partitions of the same set K. We say that P_1 is more complex in
terms of variant distribution than P_2 if
H(P_1) > H(P_2), (7)
i.e., if the Shannon entropy of P_1 is strictly greater than that of P_2.</p>
<p>Remark 2.4. The Shannon entropy has the following property:
0 ≤ H(P) ≤ log2(n). (8)</p>
<p>Example 2.5. To get an intuition for this complexity measure, take a look at the examples defined in Fig. 2
and Fig. 3.</p>
        </sec>
        <sec id="sec-2-5-3">
          <title>2.5.3. Huffman Trees</title>
<p>In addition, we consider the average path length of a Huffman tree constructed using the relative
frequencies of the part variants [11]. This length reflects the average number of binary decisions
required to uniquely identify a variant. Thus, the Huffman tree simulates an optimal binary decision
tree and serves as a theoretical benchmark for evaluating actual representations, such as those used
in tools like POSEIDON. A Huffman tree for some partition P = {B_1, ..., B_n} can be constructed by the
following algorithm:
1. Create a working list containing one node for each B_i with value |B_i|.
2. Look for the two nodes B_i and B_j with smallest values, i.e. |B_k| ≥ |B_i| and |B_k| ≥ |B_j| for all k ≠ i, j.
3. Remove the nodes B_i and B_j from the list and add a node which is their (new) parent, with the sum
of the children's values as its value.
4. Repeat until one node is left; this is the root of the resulting Huffman tree.</p>
<p>Looking at this algorithm in terms of building a decision diagram, we combine the two least common
endpoints in a subtree and add up their relative frequencies. This is repeated until the whole decision
diagram is built. By evaluating this Huffman tree, we get a binary encoding — consisting of all yes/no
decisions that have to be taken to reach the corresponding endpoint — for every subset B_i of the partition P.
We denote the length of this binary code as l_i.</p>
<p>With this, we can analyse the average path length of the Huffman trees,
(1/n) · ∑_{i=1}^{n} l_i,
as well as the weighted average path length,
∑_{i=1}^{n} p_i · l_i,
where p_i is the relative frequency at which a path is chosen.</p>
<p>These indicators reflect how many binary decisions are needed on average to uniquely identify a
variant, treating the different parts equally or taking into account their relative frequencies, respectively.</p>
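<p>The four-step construction above can be implemented directly. This sketch uses a heap to find the two smallest nodes and returns the code length l_i of every bundle; the bundle sizes in the example are illustrative values of our own.</p>

```python
import heapq

def huffman_code_lengths(bundle_sizes):
    """Code length l_i for each bundle, via the Huffman construction above.

    Repeatedly merges the two nodes of smallest weight; every leaf below a
    merge gains one edge on its path to the root.
    """
    if len(bundle_sizes) == 1:
        return [0]  # a single variant requires no decision
    heap = [(size, [i]) for i, size in enumerate(bundle_sizes)]
    heapq.heapify(heap)
    lengths = [0] * len(bundle_sizes)
    while len(heap) > 1:
        w1, ids1 = heapq.heappop(heap)
        w2, ids2 = heapq.heappop(heap)
        for i in ids1 + ids2:  # each leaf below the new parent gains one edge
            lengths[i] += 1
        heapq.heappush(heap, (w1 + w2, ids1 + ids2))
    return lengths

sizes = [8, 4, 2, 2]                   # |B_i| for a partition of 16 configs
lengths = huffman_code_lengths(sizes)  # -> [1, 2, 3, 3]
total = sum(sizes)
avg = sum(lengths) / len(lengths)                        # unweighted average
weighted = sum((s / total) * l for s, l in zip(sizes, lengths))
print(avg, weighted)  # 2.25 1.75
```

<p>The weighted average is shorter than the unweighted one because the frequent bundle sits close to the root, which is exactly the benchmark behaviour described above.</p>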
        </sec>
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Performance of POSEIDON</title>
<p>To evaluate the effectiveness of the algorithm and the comprehensibility of the resulting graph,
several performance metrics are applied. These refer both to the structure of the generated decision
diagram—particularly with regard to its suitability as a compact and interpretable representation—and
to characteristics of the underlying binary structure used during its construction. In addition, the
Pearson correlation coefficient is used to analyse relationships between selected complexity metrics.</p>
        <sec id="sec-2-6-1">
          <title>2.6.1. Average Path Length in Multi-valued Decision Diagrams (MDD)</title>
          <p>The average path length in a multi-valued decision diagram (MDD) quantifies how many decision steps
are required on average to reach a terminal node from the root node. A terminal node corresponds to a
specific part variant.</p>
<p>Let m denote the number of valid paths in the MDD that lead to a concrete output. For each such path
i (with i = 1, . . . , m), let d_i be the number of decisions made along that path, where each decision
may involve selecting one of multiple alternatives.</p>
<p>Then, the average path length l_MDD is computed as:
l_MDD = (1/m) · ∑_{i=1}^{m} d_i. (10)</p>
          <p>This metric reflects the compactness and interpretability of the diagram structure: shorter paths
imply fewer decision steps to reach a result. Accordingly, a lower l_MDD indicates a more concise and
potentially more comprehensible configuration logic.</p>
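<p>This metric can be sketched by enumerating all root-to-terminal paths of a toy diagram. The nested-dict encoding below is our own illustration, not POSEIDON's data structure: keys are edge labels (including "otherwise"), and string leaves stand in for part terminals.</p>

```python
def average_path_length(mdd):
    """Average number of decisions over all root-to-terminal paths:
    collect the decision count of every path, then average."""
    def path_depths(node, depth):
        if isinstance(node, str):  # string leaf = part terminal
            return [depth]
        # every outgoing edge of a decision node (incl. "otherwise") is one step
        return [d for child in node.values() for d in path_depths(child, depth + 1)]
    depths = path_depths(mdd, 0)
    return sum(depths) / len(depths)

# Toy MDD: one decision for V6 or otherwise, two decisions for the V8 paths.
mdd = {
    "engine=V6": "part_A",
    "engine=V8": {"trim=sport": "part_B", "otherwise": "part_C"},
    "otherwise": "part_A",
}
print(average_path_length(mdd))  # (1 + 2 + 2 + 1) / 4 = 1.5
```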
        </sec>
        <sec id="sec-2-6-2">
          <title>2.6.2. Binary Average Path Length in Binary Decision Diagrams (BDD)</title>
          <p>The binary average path length quantifies the expected number of binary decisions required to reach
a terminal node from the root node in a binary decision diagram (BDD). In contrast to the average
path length in an MDD—which may count multi-valued decisions as single steps—this metric assumes
that all variation points are represented through binary decisions. That is, each possible alternative is
encoded as a separate yes/no condition that must be evaluated along the path.</p>
<p>Let m be the number of valid paths in the BDD that lead to product variants. For each path i (with
i = 1, . . . , m), let b_i denote the number of binary decisions made along that path.</p>
<p>Then, the binary average path length l_BDD is computed as:
l_BDD = (1/m) · ∑_{i=1}^{m} b_i. (11)
Remark 2.6. Comparing the binary average path length to the average path length of the multi-way
decision diagram generated by POSEIDON allows us to assess how efficiently the algorithm reduces the
number of decisions required.</p>
        </sec>
        <sec id="sec-2-6-3">
          <title>2.6.3. Pearson Correlation</title>
<p>To quantitatively assess the linear relationship between two complexity metrics, we apply the Pearson
correlation coefficient r [12]. This coefficient indicates whether and to what extent two metric variables
are linearly correlated. A positive value suggests a direct relationship, a negative value an inverse
relationship, and a value close to 0 indicates no linear correlation.</p>
<p>For two variables X and Y with n observations, r is defined as follows:
r = (∑_{i=1}^{n} (x_i − x̄)(y_i − ȳ)) / ((n − 1) · s_x · s_y), (12)
where:
• x_i, y_i: individual observations of the variables X and Y
• x̄, ȳ: arithmetic means of X and Y
• s_x, s_y: empirical standard deviations of X and Y
Remark 2.7. The Pearson coefficient is a dimensionless value ranging from −1 to 1. Values close to |1|
indicate a strong linear relationship, while values near 0 suggest a weak or no linear correlation.
Remark 2.8. In this study, the coefficient is used to examine the extent to which structural metrics, such
as average path length, align with information-based metrics. Each observation x_i and y_i represents the
value of a specific complexity metric (e.g., entropy, path length, model count) measured at an individual
BOM position. The results of these analyses are visualized in the correlation matrices shown in Fig. 5.
While alternative complexity measures such as Kolmogorov complexity or spectral entropy were also
considered, they were found to be impractical for this specific use case due to either their computational
intractability or their limited interpretability in the context of real-world configuration logic.</p>
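<p>The coefficient of Eq. (12) can be computed directly from its definition. The two metric series below are made-up illustrative values, not data from the study.</p>

```python
from math import sqrt

def pearson_r(x, y):
    """Pearson correlation coefficient per Eq. (12):
    covariance sum divided by (n - 1) times both empirical std deviations."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sx = sqrt(sum((xi - mx) ** 2 for xi in x) / (n - 1))
    sy = sqrt(sum((yi - my) ** 2 for yi in y) / (n - 1))
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    return cov / ((n - 1) * sx * sy)

# Hypothetical values of two metrics at five BOM positions, in a perfectly
# linear relationship, so r comes out as 1.0 (up to floating-point error).
entropy = [0.5, 1.0, 1.5, 2.0, 2.5]
path_len = [1.0, 2.0, 3.0, 4.0, 5.0]
print(pearson_r(entropy, path_len))
```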
        </sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Data Set — Engineering Bill of Materials</title>
      <sec id="sec-3-1">
        <title>3.1. Data Source and Scope</title>
        <p>This analysis is based on real-world data from the product configuration system of a major automotive
manufacturer. The central data source is the so-called 150% Bill of Materials (BOM), a maximal BOM
containing all potentially installable components and assemblies across all available product variants.
In this work, we focus on the engineering stage, where the 150% BOM is represented as the Engineering
Bill of Materials (E-BOM), which serves as the foundation for the systematic analysis of variant logic at
the position level. Following Eigner et al. [13, p. 272], the E-BOM denotes the bill of materials during
the engineering stage and is often referred to as a generic BOM [14].</p>
        <p>In addition, we incorporate configuration rules from the context. These define which combinations
of feature characteristics lead to valid — i.e., buildable — vehicle configurations. Based on this data, it
is possible to determine for each variant how often each part is used, denoted as valid configurations.
This distribution forms the basis for relative frequency calculations used in the complexity metrics.</p>
        <p>The data under investigation pertains to specific model series. A model series describes a technical
base model of a vehicle that is being produced in various variants—for example, with different engine
types, equipment levels, or drivetrains. The selected model series have been produced since 2023 and
2024 respectively. As such, the underlying dataset is derived from real-world production data and is
well-suited for in-depth analysis. Note, however, that real-world data is always evolving, e.g. our dataset
might include currently unused future part variants etc. We address this further in Sect. 3.4.</p>
        <p>For the subsequent analysis, we distinguish between two datasets, each representing a separate model
series. These will be referred to as Model Series A and Model Series B throughout the remainder of this
study.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Structure and Representation of Variant Logic</title>
        <p>Each BOM position is linked to a set of code rules, expressed in the form of Boolean formulas, which
define the conditions under which a particular component is installed. These rules form the central
logical structure for variant management.</p>
        <p>The internal tool POSEIDON is used for visualising these code rules. It transforms the Boolean
formulas into decision diagrams represented as directed graphs. This form of representation enables a
structured understanding of rule complexity and also supports the validation of rule sets with regard to
uniqueness, completeness, and satisfiability [15].</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Analytical Objective</title>
        <p>The objective of this analysis is to quantitatively assess and evaluate the complexity of variant logic at
selected BOM positions. For this purpose, established metrics such as Shannon entropy and the average
path length in the corresponding Huffman tree are employed. These indicators were discussed in detail
in Sect. 2.5 and enable a standardized assessment of variant diversity across different positions.</p>
        <p>Based on those general measures of complexity, we aim to evaluate the performance of the POSEIDON
algorithm by correlating the results obtained using POSEIDON with the underlying data complexity.</p>
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Data Cleaning</title>
        <p>Analysis shows that some decision diagrams contain paths that end in leaf nodes without an associated
component. These incomplete endpoints may result from various causes—for instance, a component
may not be required for certain feature combinations, a part may not yet be maintained in the system,
or inconsistencies may exist in the rule set.</p>
        <p>To accurately capture the true structure of the variant logic, these paths are explicitly included in the
calculation of the complexity metrics. In other words, whenever the data implies no part selection, we
treat this as intentional and make that option explicit in the data before doing our calculations.</p>
        <p>Furthermore, there are part selection rules that are logically contradictory or unsatisfiable within the
context. These parts were completely ignored in the analysis, as they have a model count of zero and
thus no influence on the output data. The item selection rule in such a case is semantically equivalent
to false with regard to the valid vehicle configurations.</p>
        <p>In addition, all outdated positions for which no feasible part remained were removed from the dataset.</p>
        <p>This approach allows for a comprehensive assessment that considers not only the defined variants
but also potential rule gaps or modelling issues.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Encoding and Visualisation of Variation Points</title>
      <p>This section discusses encoding options for variation points such as truth tables, KV diagrams, Venn
diagrams, and decision diagrams, while highlighting the advantages and disadvantages of these respective
options, especially with regard to the integrability of context information.</p>
      <p>While various representations for Boolean configuration logic exist—such as tabular formats,
Karnaugh maps, and Venn diagrams—this study focuses on multi-way decision diagrams for encoding
variation points.</p>
      <p>All these methods can be used to determine which of the variant(s) in a given variation point can be
chosen for a given Boolean configuration. However, in the use case of the variation point visualisation,
a considerable part of the theoretical variant space is not relevant a priori, because it is not buildable
under consideration of the product overview. A key consideration for the applicability of an encoding
is how effectively it conceals the non-feasible variant space.</p>
      <p>In the following paragraphs, the methods for presenting Boolean logic, the necessary modifications
for displaying mappings, and the relevant literature are discussed.</p>
      <sec id="sec-4-1">
        <title>4.1. The Role of Visualisation in Data Quality and Related Work</title>
        <p>Data visualisation can play a key role in data quality assurance by providing an intuitive and interactive
way to represent and analyse data. By visualising the data, it becomes easier for data workers to identify
patterns, trends, and anomalies in the data, which can help identify errors, inconsistencies, or other
issues that can affect the quality of the data. Visualisation can also help to highlight the relationships
and dependencies between different data elements, making it easier to understand the context in which
the data is used and how it may be affected by changes.</p>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Visualisation Option: Formulas</title>
        <p>Encoding variation points using Boolean formulas is a common industrial practice [16]. For instance,
the BOM uses Boolean formulas as item selection rules. Some companies require formulas to match
certain normalizing criteria, whereas others allow any syntactically correct formula.</p>
        <p>Formulas have the advantage of acceptable scaling, both in the case of many involved variables, as
well as in the case of complex inner relations. Though in the worst case a Boolean formula
will double in size with the introduction of a new variable, in the average case it will grow significantly
less. However, formulas have the disadvantage of being hard to evaluate mentally. In addition, the
known normal forms for Boolean formulas are either not canonical, hard to comprehend for humans,
or too large.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Visualisation Option: Truth- &amp; Decision Tables</title>
<p>A common approach to visualising variability data is the use of various forms of tables. The naive idea is to list all
possible feature combinations, i.e. a truth table.</p>
        <p>Each row of the table represents a possible combination of values for a set of Boolean inputs, and the
corresponding column indicates whether the combination satisfies a given Boolean expression (output).
Truth tables can be a useful tool for analysing and manipulating Boolean data, as they provide a clear
representation of the possible combinations of values and the relationships between them.</p>
        <p>However, truth tables can become unwieldy as the number of Boolean variables increases. As
the number of variables of the expressions increases, the number of rows in the truth table grows
exponentially, making it difficult to understand and manipulate the data.</p>
        <p>To limit the size of the table, combinations that do not satisfy a validity constraint (in our case, the
product overview) can be hidden.</p>
        <p>A similar table standard is used by Tidstam et al. [17] for assigning part variants to feature
combinations in a configuration context. In Table 2, taken from [17], the red cells represent invalid feature
selections.</p>
<p>One way to improve the scaling of the table is to bundle several vehicle properties/variables, after
successful assignment, into mutually exclusive families rather than dedicating a separate column to each,
as in a truth table; this enables the consolidation of many Boolean columns into multi-value columns.
For instance, Table 3 does not show dedicated Boolean columns for Gasoline and Diesel as
in the case of Turbo, Sport and City, but instead the multi-value column Fuel with the exclusive variants
gasoline and diesel.</p>
<p>A further reduction is possible if only slightly different vehicle variants are assigned to the same
component. For instance, the last two lines in Table 3, which differ only in their value in the City column,
show the same component. For these kinds of constellations, some table representations use don't-care
symbols. In this case, a line Volume=1.2, Turbo=no, Sport=no, City=*, Fuel=gasoline, Item(s)=E12 could
have subsumed the last two lines and thus replaced them.</p>
        <p>However, this merging process requires repeated passes, and finding a minimal set of subsuming
rows is an optimisation problem of its own, addressed by the Quine-McCluskey algorithm [18][19].</p>
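        <p>The merging step behind the don’t-care notation, iterated to a fixed point in the spirit of Quine-McCluskey, can be sketched as follows. This is an illustrative sketch over 0/1/* tuples; a complete minimisation would additionally select a minimal cover from the resulting prime implicants via an implicant chart:</p>

```python
def merge(a, b):
    """Merge two implicants (tuples over {0, 1, '*'}) if they agree everywhere
    except in exactly one fully specified position; otherwise return None."""
    diff = [i for i, (x, y) in enumerate(zip(a, b)) if x != y]
    if len(diff) == 1 and "*" not in (a[diff[0]], b[diff[0]]):
        i = diff[0]
        return a[:i] + ("*",) + a[i + 1:]
    return None

def prime_implicants(minterms):
    """Repeat the merging step until no implicant can be merged further;
    implicants that never merge are the prime implicants."""
    current = set(minterms)
    primes = set()
    while current:
        merged, used = set(), set()
        for a in current:
            for b in current:
                m = merge(a, b)
                if m is not None:
                    merged.add(m)
                    used.update((a, b))
        primes |= current - used   # unmerged implicants are prime
        current = merged
    return primes

# Two rows differing only in their last variable (cf. the City column in
# Table 3) collapse into one row with a don't-care symbol.
print(prime_implicants({(1, 0, 0, 0), (1, 0, 0, 1)}))  # {(1, 0, 0, '*')}
```

Note that the loop terminates because every merge replaces a specified position with `*`, so the number of rounds is bounded by the number of variables.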
        <p>A table that has been optimised in this way eventually contains only prime implicants for each item,
and in general multiple table rows are required per item. Since the prime implicants may overlap, using
all of them is only necessary in the worst case.</p>
        <p>In terms of human readability, tables have the considerable advantage that they are understood
intuitively and that good tools for editing them exist. In contrast, the scaling properties of tables are no
longer sufficient for many practical data constellations.</p>
        <p>For these reasons, truth tables may not be well-suited for visualising complex Boolean data for
data workers. Their scaling properties make them undesirable or outright impossible to apply in some
real-world scenarios. In such cases, other visualisation methods, such as decision diagrams or graphical
models, may be more effective for helping data workers to understand and analyse the data.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Visualisation Option: Decision Diagrams</title>
        <p>Decision diagrams have been selected as the visualisation method for POSEIDON. Our algorithm
utilizes Binary Decision Diagrams (BDDs) to transform the data into multi-valued decision diagrams (MDDs).</p>
        <p>By leveraging BDDs, we obtain a data structure that meets numerous requirements. The exclusion
of unbuildable spaces is straightforward, achieved by removing the corresponding subgraphs.
Configurable variable ordering, along with ordering heuristics to minimize graph size, provides customizable
perspectives on the data. Reduced decision diagrams enhance efficiency by reusing equivalent subtrees,
resulting in more concise representations.</p>
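        <p>The two reduction rules that make such diagrams concise (sharing structurally equal subgraphs via a unique table, and skipping nodes whose two branches coincide) can be illustrated with a minimal node store. This is an illustrative sketch, not the POSEIDON implementation:</p>

```python
class BDD:
    """Minimal reduced-BDD node store with hash-consing (unique table)."""

    def __init__(self):
        self.unique = {}                  # (var, low, high) -> node id
        self.nodes = {0: None, 1: None}   # terminal nodes FALSE=0, TRUE=1

    def mk(self, var, low, high):
        if low == high:                   # redundant test: both branches identical
            return low
        key = (var, low, high)
        if key not in self.unique:        # share structurally equal subtrees
            node_id = len(self.nodes)
            self.nodes[node_id] = key
            self.unique[key] = node_id
        return self.unique[key]

bdd = BDD()
x = bdd.mk("x", 0, 1)       # the function x
same = bdd.mk("x", 0, 1)    # structurally equal -> reuses the existing node
skip = bdd.mk("y", x, x)    # y is irrelevant here -> node collapses to x
print(x == same == skip)    # True
```

Because equivalent subgraphs are stored only once, equality of functions reduces to equality of node ids, which is what makes the reduced representation compact.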
        <p>In [20] the authors encode vehicle configuration at Renault into a Constraint Satisfaction Problem
(CSP). Amilhastre et al. [21] likewise encode context validity in a CSP, compile it into an automaton, and
represent the automaton graphically. The automaton accepts valid feature selections but, in contrast to
our approach, does not compute which part selection follows.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. Other Visualization Options</title>
        <p>Karnaugh Maps [22], also known as KV diagrams, are a graphical representation of Boolean expressions
used to simplify and minimise them. They are based on the idea of grouping adjacent terms in a truth
table that represent the same logical function. KV diagrams appear to be unsuitable for the visualisation
of variation points: a cell cannot simply be hidden, because the relative positions of the cells are of vital
importance and their meaning is defined by their geometric position.</p>
        <p>Similar issues can be observed for Venn diagrams. They also become notoriously difficult to work
with, even for a small number of variables.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental Results and Evaluation</title>
      <sec id="sec-5-1">
        <title>5.1. Quantifying BOM Complexity</title>
        <p>In addition to the number of valid configurations (Model Count), the evaluation incorporates indicators
of distributional characteristics of the solution space. These indicators include entropy-based measures
and average path lengths derived from Huffman Trees, Multi-valued Decision Diagrams (MDDs), and
Binary Decision Diagrams (BDDs).</p>
        <p>For each metric, key descriptive statistics were calculated, including minimum, maximum, arithmetic
mean, median, and standard deviation. The results are detailed in Table 4 and Table 5.</p>
        <p>Model Count. The mean is 18.34 (A) and 17.39 (B), while the median is considerably lower at 5.00
and 6.00, respectively. This indicates a right-skewed distribution—most items have few valid variants,
while a few show high combinatorial complexity, as also reflected by the high standard deviations (74.66
and 49.64).</p>
        <p>Shannon Entropy. The mean of 0.96 in both datasets suggests low combinatorial
uncertainty—i.e., in most cases, only a few equally probable variants exist.</p>
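        <p>Shannon entropy over a variant-usage distribution can be computed directly; the occurrence counts below are illustrative, not values from the paper's dataset:</p>

```python
from math import log2

def shannon_entropy(counts):
    """Entropy (in bits) of a variant-usage distribution given occurrence
    counts of each valid variant."""
    total = sum(counts)
    probs = [c / total for c in counts]
    return -sum(p * log2(p) for p in probs if p > 0)

# Two equally probable variants yield exactly 1 bit, close to the mean
# of 0.96 reported for both datasets.
print(shannon_entropy([50, 50]))              # 1.0
# A skewed distribution carries less uncertainty per selection.
print(round(shannon_entropy([90, 5, 5]), 2))  # 0.57
```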
        <sec id="sec-5-1-2">
          <title>Path-Based Metrics</title>
          <p>[Tables 4 and 5: Min, Max, Mean, Median, and Std. Dev. for Model Count, Shannon Entropy,
Huffman Weighted Avg. Path, Huffman Avg. Path, MDD Avg. Path, and BDD Avg. Path, for Model
Series A and B respectively.]</p>
          <p>Huffman Tree Weighted Avg. Path Length. Mean values of 2.03 (A) and 2.04 (B) indicate a more
differentiated distribution, where rare but valid configurations contribute significantly to the total
information content.</p>
          <p>Huffman Tree Avg. Path Length. Values of 2.19 (A) and 2.23 (B) suggest that, on average, about two
binary decisions are required to uniquely identify a specific variant. Note: this is under the assumption
that such binary dimensions already exist and that their interrelationship with other variables is modelled
elsewhere. This is why the Huffman Tree represents a lower bound for real-world BDDs.</p>
          <p>MDD Avg. Path Length. The mean is 2.93 (A) and 3.06 (B). This metric reflects the average number
of decision steps required to reach a terminal node in the multi-valued decision diagram generated by
POSEIDON and thus characterizes the structural depth of the rule logic.</p>
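          <p>The Huffman Tree reference values can be reproduced on toy data. The sketch below exploits the fact that each merge step pushes every leaf beneath the merged node one level deeper; the weights are illustrative:</p>

```python
import heapq
from itertools import count

def huffman_avg_paths(weights):
    """Weighted and unweighted average leaf depths of a Huffman tree built
    over `weights` (variant occurrence counts). Summing the merged weights
    (resp. leaf counts) over all merges yields the two depth totals."""
    tie = count()  # tie-breaker so the heap never compares tuples beyond it
    heap = [(w, next(tie), 1) for w in weights]  # (weight, tiebreak, #leaves)
    heapq.heapify(heap)
    total_w = sum(weights)
    weighted_sum = unweighted_sum = 0
    while len(heap) > 1:
        w1, _, n1 = heapq.heappop(heap)
        w2, _, n2 = heapq.heappop(heap)
        weighted_sum += w1 + w2      # every leaf below gains one edge
        unweighted_sum += n1 + n2
        heapq.heappush(heap, (w1 + w2, next(tie), n1 + n2))
    return weighted_sum / total_w, unweighted_sum / len(weights)

# Four equally likely variants need exactly two binary decisions each,
# matching the ~2 average path lengths reported above.
print(huffman_avg_paths([1, 1, 1, 1]))  # (2.0, 2.0)
```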
          <p>BDD Avg. Path Length. The mean is 5.39 (A) and 6.00 (B). A purely binary encoding requires
roughly twice as many steps as the MDD, highlighting the structural compactness of the multi-valued
representation.</p>
        </sec>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Evaluating MDD generation with regards to BOM Complexity</title>
        <p>This section aims to investigate to what extent the decision diagrams generated by POSEIDON reflect
the underlying variant structure. As reference measures, both information-theoretic and structural
metrics are considered, including Shannon entropy and (weighted) Hufman Tree average path lengths.</p>
        <sec id="sec-5-2-1">
          <title>5.2.1. Correlation Analysis of Complexity Metrics</title>
          <p>To evaluate the relationships between the selected complexity metrics, the Pearson correlation
coefficient ρ was calculated (cf. Sect. 2.5). The correlation matrices in Fig. 5 display the dependencies among
entropy-based measures, the (weighted) average path length in the Huffman Tree, and the path lengths in
decision diagrams based on MDDs and BDDs.</p>
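          <p>A minimal sketch of the Pearson coefficient computation behind such correlation matrices; the data points below are hypothetical, not the paper's metric series:</p>

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long series."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical per-position entropy and MDD average path length values:
entropy = [0.5, 1.0, 1.5, 2.0]
mdd_path = [2.1, 2.6, 3.4, 3.8]
print(round(pearson(entropy, mdd_path), 3))  # close to 1 for near-linear data
```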
          <p>MDD Structure. The average path lengths in the MDDs also show a positive correlation with
entropy-based measures, albeit to a lesser extent. For Model Series A, correlation values range between
ρ = 0.65 and 0.70, and for Model Series B, between ρ = 0.60 and 0.66. These correlations indicate that the
structure of the decision diagrams reflects key aspects of the information-theoretic complexity of the
configuration logic. This suggests a non-exponential scaling behaviour of the MDD algorithm with
respect to increasing variant diversity — even though the underlying algorithm does not explicitly
optimize for entropy.</p>
          <p>Comparison with BDDs. In contrast, the BDDs (Binary Decision Diagrams) exhibit significantly
lower correlation with entropy-based metrics. In particular, the value for Shannon entropy in Model
Series B drops to ρ = 0.47, indicating that BDDs are less sensitive to the actual distribution of
configuration options. This reflects the binary nature of BDDs, where path lengths are primarily determined
by the number of Boolean decisions rather than by actual occurrence probabilities.</p>
          <p>Model Comparison. The comparison of the two model series reveals overall consistent correlation
patterns. In Model Series A, the correlations are slightly stronger throughout, which may point to a
more homogeneous variant structure or a lower degree of extreme configuration cases.</p>
          <p>Additional Observations. Unsurprisingly, in both model series, a very high correlation is observed
between the Huffman Tree weighted Avg. Path Length and the Huffman Tree Avg. Path Length, as well
as between the BDD and MDD path lengths. This is because both Huffman Tree path lengths are obtained
from the same tree, while the MDDs are derived from the BDDs.</p>
          <p>Conclusion. Overall, the findings suggest that the structure of the MDD scales linearly with
increasing variant complexity. This supports the conclusion that the logical complexity embedded in the
configuration data is represented in a comprehensible and traceable manner.</p>
        </sec>
        <sec id="sec-5-2-2">
          <title>5.2.2. MDD compared to Complexity Metrics</title>
          <p>In Fig. 6, we investigate how structural and information-theoretic complexity metrics relate to the
average path length in the MDD representation, based on all configuration positions.</p>
          <p>Across all three metrics, a clear and approximately linear correlation is observed: positions with
greater entropy or more uneven variant distributions tend to produce deeper MDD structures. While
the correlation strength differs slightly between metrics, the overall pattern indicates that the MDD
depth increases in line with the underlying distributional complexity.</p>
          <p>These results suggest that the MDD structure captures distributional complexity linearly, indicating
that the MDD generation will continue to scale with increasing complexity.</p>
        </sec>
        <sec id="sec-5-2-3">
          <title>5.2.3. Comparison between MDD and BDD Average Path Lengths</title>
          <p>In Fig. 7, the quotient between the Avg. Path Lengths of the BDD and the MDD for each
variation point in Model Series A (top) and B (bottom) is visualised in relation to the Shannon entropy.
It appears that, independent of the underlying entropy, the MDD generation reduces the path length of
the BDD by a factor of about 1.7.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Implications for Practical Applicability</title>
        <p>The results presented in this chapter demonstrate that the algorithm used to transform configuration
logic into decision diagrams performs robustly even as variant complexity increases. Despite increasing
entropy and growing structural depth, the generated diagrams remain compact and interpretable.</p>
        <p>This is particularly important for integration into the POSEIDON tool: since the generated structures
are directly edited and reviewed by data engineers, a clear and concise representation is essential for
maintainability, error prevention, and efficiency in handling complex product logic. The algorithm’s
ability to suppress non-configurable regions of the solution space while clearly structuring relevant
branches offers a distinct advantage over alternative visualisation techniques.</p>
        <p>Overall, the analysis shows that the measured complexity indicators are not merely theoretical
constructs but have tangible implications for the usability and scalability of the tool in real-world
application scenarios.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>This paper investigated the relationship between complexity indicators and the scalability of the
POSEIDON algorithm for visualizing and manipulating automotive configuration data. This study
demonstrated that the generation of POSEIDON’s multi-valued decision diagrams (MDDs) exhibits a linear
scaling behavior with increasing variant complexity, as measured by Shannon entropy and Huffman
tree path lengths. Compared to binary decision diagrams (BDDs), POSEIDON offers a distinct advantage
by suppressing non-configurable regions of the solution space and providing a more compact and
interpretable representation of the configuration logic, as demonstrated by the lower average path
lengths observed in Tables 4 and 5.</p>
      <p>The findings, derived from real-world configuration data, have significant implications for the practical
applicability of POSEIDON. The linear scaling of MDD complexity ensures that the generated diagrams
remain compact and interpretable even as variant diversity and logical interdependencies increase. This
is crucial for data engineers who directly edit and review these structures, as it supports maintainability,
error prevention, and efficient handling of complex product logic. The study confirms that the measured
complexity indicators are not merely theoretical constructs but have tangible implications for the
usability and scalability of the tool in real-world application scenarios.</p>
      <p>This work contributes a comprehensive evaluation, combining theoretical indicators with empirical
results, to demonstrate POSEIDON’s effectiveness in managing the rising complexity observed in
automotive product configuration.</p>
      <p>While the evaluation focused on automotive data, future research should investigate POSEIDON’s
applicability to non-automotive domains and more heterogeneous data sources, for instance by analysing
codebases and rule structures from other industries.</p>
      <p>Additionally, further research could explore optimizing the MDD generation algorithm, especially
for product lines with extremely high variant diversity. Another promising direction is to investigate
alternative visualisation algorithms that maintain linear scalability while achieving a lower growth rate
of the visualisation, enabling even more compact and efficient diagram representations.</p>
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>During the preparation of this work, the authors used an internal GenAI tool (based on GPT-4o) and
Thesify for grammar, spelling checks, and literature proposals. After using this tool/service, the authors
reviewed and edited the content as needed and take full responsibility for the publication’s content.</p>
    </sec>
    <sec id="sec-8">
      <title>References (8–22)</title>
      <p>[8] V. Modrak, Z. Soltysova, Assessment of product variety complexity, Entropy 25 (2023) 119.
doi:10.3390/e25010119.</p>
      <p>[9] C. Sinz, A. Kaiser, W. Küchlin, Formal methods for the validation of automotive product
configuration data, AI EDAM 17 (2003) 75–97.</p>
      <p>[10] C. E. Shannon, A mathematical theory of communication, The Bell System Technical Journal 27
(1948) 379–423. doi:10.1002/j.1538-7305.1948.tb01338.x.</p>
      <p>[11] D. A. Huffman, A method for the construction of minimum-redundancy codes, Proceedings of the
IRE 40 (1952) 1098–1101.</p>
      <p>[12] K. Pearson, Mathematical contributions to the theory of evolution. III. Regression, heredity, and
panmixia, Philosophical Transactions of the Royal Society of London. Series A 187 (1896) 253–318.
doi:10.1098/rsta.1896.0007.</p>
      <p>[13] M. Eigner, D. Roubanov, R. Zafirov (Eds.), Modellbasierte virtuelle Produktentwicklung, 1st ed.,
Springer Vieweg, Berlin, Heidelberg, 2014. doi:10.1007/978-3-662-43816-9.</p>
      <p>[14] H. Hegge, J. Wortmann, Generic bill-of-material: a new product model, International Journal of
Production Economics 23 (1991) 117–128. doi:10.1016/0925-5273(91)90055-X.</p>
      <p>[15] D. Bischof, W. Küchlin, Adapting binary decision diagrams for visualizing product configuration
data, in: INFORMATIK 2017, Gesellschaft für Informatik, Bonn, 2017, pp. 1499–1509.
doi:10.18420/in2017_149.</p>
      <p>[16] A. Voronov, A. Tidstam, K. Åkesson, J. Malmqvist, Verification of item usage rules in product
configuration, in: IFIP International Conference on Product Lifecycle Management, Springer, 2012,
pp. 182–191.</p>
      <p>[17] A. Tidstam, L.-O. Bligård, F. Ekstedt, A. Voronov, K. Åkesson, J. Malmqvist, Development of
industrial visualization tools for validation of vehicle configuration rules, in: Proceedings of the
Tools and Methods of Competitive Engineering (TMCE), Organizing Committee of TMCE 2012,
Karlsruhe, Germany, 2012, pp. 305–318.</p>
      <p>[18] W. V. Quine, The problem of simplifying truth functions, The American Mathematical Monthly 59
(1952) 521–531.</p>
      <p>[19] E. J. McCluskey, Minimization of Boolean functions, The Bell System Technical Journal 35 (1956)
1417–1444.</p>
      <p>[20] J. Astesana, L. Cosserat, H. Fargier, Constraint-based modeling and exploitation of a vehicle range
at Renault’s: new requests for the CSP formalism, in: International Conference on Tools with
Artificial Intelligence (ICTAI), IEEE Computer Society, Arras, France, 2010, pp. 68–75.
doi:10.1109/ICTAI.2010.19.</p>
      <p>[21] J. Amilhastre, H. Fargier, P. Marquis, Consistency restoration and explanations in dynamic
CSPs—application to configuration, Artificial Intelligence 135 (2002) 199–234.</p>
      <p>[22] M. Karnaugh, The map method for synthesis of combinational logic circuits, Transactions of
the American Institute of Electrical Engineers, Part I: Communication and Electronics 72 (1953)
593–599.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Bischof</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Küchlin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kopp</surname>
          </string-name>
          ,
          <article-title>Poseidon: A graphical editor for item selection rules within feature combination rule contexts</article-title>
          , in: F.
          <string-name>
            <surname>Noël</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          <string-name>
            <surname>Nyffenegger</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          <string-name>
            <surname>Rivest</surname>
            ,
            <given-names>A</given-names>
          </string-name>
          . Bouras (Eds.),
          <source>Product Lifecycle Management. PLM in Transition Times: The Place of Humans and Transformative Technologies</source>
          , volume
          <volume>667</volume>
          <source>of IFIP Advances in Information and Communication Technology</source>
          , Springer Nature Switzerland, Cham,
          <year>2023</year>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>14</lpage>
          . doi:10.1007/978-3-031-25182-5_1.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Akers</surname>
          </string-name>
          ,
          <article-title>Binary decision diagrams</article-title>
          ,
          <source>IEEE Transactions on computers 100</source>
          (
          <year>1978</year>
          )
          <fpage>509</fpage>
          -
          <lpage>516</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>S.</given-names>
            <surname>Shafiee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kristjansdottir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hvam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Felfernig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Myrodia</surname>
          </string-name>
          ,
          <article-title>Analysis of visual representation techniques for product configuration systems in industrial companies</article-title>
          , in: 2016 IEEE International Conference on Industrial Engineering and Engineering Management (IEEM), IEEE,
          <year>2016</year>
          , pp.
          <fpage>793</fpage>
          -
          <lpage>797</lpage>
          . doi:10.1109/ieem.2016.7797985.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>A.</given-names>
            <surname>Ghosh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Kristjansdottir</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Hvam</surname>
          </string-name>
          , E. Vareilles,
          <article-title>Measuring the complexity of product configuration systems</article-title>
          ,
          <source>in: Proceedings of the 20th International Configuration Workshop</source>
          , volume
          <volume>2220</volume>
          , CEUR Workshop Proceedings,
          <year>2018</year>
          , pp.
          <fpage>61</fpage>
          -
          <lpage>68</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schmidt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Schwöbel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lienkamp</surname>
          </string-name>
          ,
          <article-title>Developing key performance indicators for variant management of complex product families</article-title>
          ,
          <source>in: Proceedings of NordDesign</source>
          <year>2018</year>
          , Technical University of Munich, Linköping, Sweden,
          <year>2018</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>V.</given-names>
            <surname>Modrak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Bednar</surname>
          </string-name>
          ,
          <article-title>Entropy based versus combinatorial product configuration complexity in mass customized manufacturing</article-title>
          ,
          <source>Procedia CIRP 41</source>
          (
          <year>2016</year>
          )
          <fpage>183</fpage>
          -
          <lpage>188</lpage>
          . doi:10.1016/j.procir.2015.12.100.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Herrera-Vidal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. R.</given-names>
            <surname>Coronado-Hernandez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Derpich</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Paredes</surname>
          </string-name>
          , G. Gatica,
          <article-title>Measuring complexity in manufacturing: Integrating entropic methods, programming and simulation</article-title>
          ,
          <source>Entropy 27</source>
          (
          <year>2025</year>
          )
          <fpage>50</fpage>
          . doi:10.3390/e27010050.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>