<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Portfolio Management: How to Find Your Standard Variants</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Frank Dylla</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Thorsten Krebs</institution>
        </aff>
      </contrib-group>
      <abstract>
        <p>Product portfolio management is one of the most important tasks for companies to secure their future competitiveness. A crucial input for portfolio management decisions is the volume of products sold and the development of sales numbers over time. One could ask: what are your current or upcoming best-selling products, which are often used as “standard products” in sales? For these products in particular it is worthwhile to take action to reduce costs and improve revenue. For discrete products the task is, simply put, to look for the products with the highest quantities sold or the highest profit, or with significant changes in these figures over a certain period of time (business intelligence). This approach does not work satisfactorily for complex multi-variant products. An aggregated view on products, i.e. one that ignores the sales numbers of the variants with their individual features, does not give sufficient insight and may even lead to wrong decisions in portfolio management. The recurring combination of features across multiple types of products might be more important than the type of the product itself. In this paper we investigate how identifying potential standard products differs from identifying potential standard variants of products. On this basis we derive a high-level framework for deducing standard variants from a given set of variants described by characteristics, provide an algorithmic sketch, and discuss the resulting challenges from a pragmatic perspective.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        Portfolio management is a dynamic decision process evaluating,
prioritizing, reorganizing, cancelling, etc. products throughout their
lifecycle [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. As managers have to deal with uncertain and changing
information, portfolio management is a complex task. One of the
major difficulties in product portfolio management is predicting what
customers are willing to pay for. This requires knowing the
market, i.e. knowing the current customer demand and how it
will most likely change in the future. Thus, product portfolio
management is already complex for simple products, and it becomes
even more complex for configurable and thus multi-variant
products.
      </p>
      <p>But what exactly is the challenging part of this task? Forecasts
are created in order to plan the supply chain and production
capacities. For simple products this is a rather straightforward task: one
can assign a sales forecast to the product identifiers, e.g. material
numbers, and use the bill of materials to obtain the list of
required components. For variant-rich products such as skateboards,
however, this is not that easy. In general, the necessary components of a
skateboard are the deck, i.e. a plank (usually wooden, but not
necessarily), two trucks, i.e. spring-mounted axles, and four wheels
with bearings. Optional components may be sliptape, paintings,
risers, shock pads, nose/tail guards, etc. Note that not every truck type
fits every deck and that not all wheel/truck combinations fit. The
customer looks for an individual composition of these components,
so very few skateboards are sold with exactly the same
composition of deck, trucks, wheels, and so on.</p>
      <p>In our experience, for new multi-variant products it is common
that product managers guess which variants will be the top-selling
ones in the future, i.e. the decision is based on gut feeling.
Evaluating the quality of this initial decision is barely feasible, as only
standard BI techniques are available. These techniques are not
sufficient for portfolio planning of multi-variant products because they ignore
the structural information of the variants themselves. Standard
techniques typically analyze the list of sales over a certain period of time
and use product identifiers as the key to identify which product is sold the
most and to predict how this will change in the future. But for variant-rich
products that are sold in lot size 1 the product identifier cannot be
used as a key criterion. Instead, it is important to compare
characteristics and their values. For example, comparing the product ID, which
identifies an individual composition, does not reveal that a lot of
skateboards use the same wheels. We therefore consider it important to
use the configuration model, which contains product data and rule sets,
as an input for a new kind of algorithm that compares not on the
level of product identifiers but on the level of sets of product
characteristics. This supports better predictions of top-selling variants,
i.e. of what the market really is willing to pay for.</p>
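      <p>The difference between the two views can be illustrated with a minimal sketch. The sales records, the identifiers and the helper names count_by_id and count_by_characteristic are hypothetical and only serve the illustration; they are not part of the approach itself.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical sales records: each sold skateboard has its own product ID
# (lot size 1) plus the characteristic values it was configured with.
sales = [
    {"id": "SB-001", "deck": "maple-8.0", "trucks": "T-139", "wheels": "W-52"},
    {"id": "SB-002", "deck": "maple-8.25", "trucks": "T-149", "wheels": "W-52"},
    {"id": "SB-003", "deck": "bamboo-8.0", "trucks": "T-139", "wheels": "W-52"},
    {"id": "SB-004", "deck": "maple-8.0", "trucks": "T-139", "wheels": "W-54"},
]


def count_by_id(records):
    """Classic BI view: count sales per product identifier."""
    return Counter(r["id"] for r in records)


def count_by_characteristic(records):
    """Characteristic view: count how often each value was sold."""
    counts = {}
    for r in records:
        for key, value in r.items():
            if key != "id":
                counts.setdefault(key, Counter())[value] += 1
    return counts


by_id = count_by_id(sales)
by_char = count_by_characteristic(sales)
print(max(by_id.values()))               # every ID occurs exactly once
print(by_char["wheels"].most_common(1))  # the shared wheel type stands out
```
]]></preformat>
      <p>In this toy data set no product identifier occurs more than once, whereas counting per characteristic shows that three of the four boards share the same wheels.</p>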
      <p>In order to support the step of evaluating past sales against
the original planning on the level of characteristics and their values,
we introduce the notion of a central representative and propose a
potential calculation thereof. We distinguish it from the term
“standard product” (or standard variant), as this term describes
products which were actually built many times. As shown later,
a central representative does not need to have been built even once. We are
convinced that central representatives will help to recognize changes
in client behavior, or in the market in general, over time and to decide whether
adaptations are reasonable in order to meet the goals of portfolio
management.</p>
      <p>We start with introducing our understanding of product
configuration, which is constraint-based, and introduce diverse variant spaces
for later use (Sec. 2.1). We consider definitions of discrete standard
and basic products (Sec. 2.2) and elaborate how this relates to
standards for multi-variant products (Sec. 2.3). We sketch our approach
in Section 3. To avoid misunderstandings with varying definitions of
‘standard’ we introduce the term central representative of a given
variant space described by characteristics (Sec. 3.1). In order to find
such a central representative a measure of dissimilarity needs to be
defined (Sec. 3.2). In Section 3.3 we exemplify how a representative
can be computed and how a deviation can be derived thereon. We
summarize our considerations in an algorithm sketch (Sec. 3.4). We
discuss our approach from various pragmatic perspectives (Section
4). First, we revisit the choice of the set of product vectors P for
which the central representative should be computed (Sec. 4.1).
Furthermore, real data is generally not available in a well-defined
form, i.e. not all characteristics and values are defined in a consistent
manner (Sec. 4.2). Additionally, multi-variant products are subject to
change, so that older products may distort results that should
reflect the current state (Sec. 4.3). Finally, we consider the derivation of
parameters and further prerequisites necessary to apply the
algorithm presented to real data (Sec. 4.4).</p>
    </sec>
    <sec id="sec-2">
      <title>Theoretical background</title>
    </sec>
    <sec id="sec-3">
      <title>What is product configuration?</title>
      <p>
        Felfernig et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] base their understanding of configuration on a
definition in [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]: configuration is a special case of design activity where
the artifact being configured is assembled from instances of a fixed
set of well defined component types which can be composed
conforming to a set of constraints. A configuration task is the selection of the
components and their properties to get a valid combination of the
product components, the outcome is also called product variant [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ].
      </p>
      <p>As a result the component types span a space of potential
configurations which are further restricted by constraints, which limit the
possibilities of how components can be combined. Practice shows
that the restrictions may arise from technical feasibility, legal
requirements, product-design, or marketing purposes. In general,
components or properties of a product are described by characteristics in
formal product representations. Other notions, such as attributes or
features, are also used to describe properties of components. For
simplicity we restrict ourselves to the term characteristics
throughout this paper. Based on this we can define a product characteristics
vector, or product vector for short.</p>
      <p>Definition 1. Given a set of characteristics ki ∈ K with i ∈
{0, …, N − 1}, each with values from a domain Di ∈ D, we define
[k0, k1, …, kN−1] as the product (characteristics) vector p⃗.</p>
      <p>We note that N denotes the maximum number of possible
characteristics. In particular, if a characteristic is optional, its domain
must contain a specific value stating that this characteristic is not chosen
(or not evaluated). As combinations of domain values are not
restricted, the product vector may describe a product which is technically
not feasible.</p>
      <p>
        Naturally, a configuration task can be considered as a constraint
satisfaction problem (CSP), see e.g. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ].
      </p>
      <p>Definition 2. Constraint Satisfaction Problem (CSP) ⟨K, D, C⟩: A
CSP is defined as a set of variables ki ∈ K with i ∈ {0, …, N − 1},
each with values from a domain Di ∈ D, together with a set of constraints
cj ∈ C with j ∈ {0, …, M − 1} defining which combinations of values
are allowed. A solution of a CSP is a consistent evaluation of
all variables (a value assignment to all ki), i.e. no constraint is
violated. Otherwise the assignment is called inconsistent. Furthermore,
within an assignment the value of ki does not need to be unique, i.e.
ki may contain multiple valid values which can be considered as
alternatives. A solution with a unique value per ki is called
an atomic solution or, in terms of variant management, a variant.</p>
      <p>In more detail, if an assignment contains multiple values for one
or more characteristics ki, it contains at least two different atomic
solutions. For example, an assignment in which the deck characteristic
contains the two values deck A and deck B means that the
customer at some configuration front-end can still decide for either
deck A or deck B, and either choice results in a valid assignment. In the
remainder of this paper we use ki both for
the characteristic itself and for its evaluation, i.e. its value
assignment, as the meaning is clear from the context in most cases.
In case of ambiguity we clarify the meaning.</p>
      <p>In order to discuss notions of standard variant, we need to define
several solution spaces based on the CSP definition.</p>
      <p>Definition 3. Theoretical Configuration Space (S∅): This space
contains all combinations of characteristics which are possible from
a mereological perspective, i.e. from all minimalist configurations
to all maximum configurations containing all optional components,
but ignoring further constraints. In terms of CSP this is reflected by
⟨K, D, ∅⟩.</p>
      <p>Consider the skateboard example. The minimalist configurations
consist of a deck, two trucks, and four wheels as these components
are necessary to obtain a functional skateboard from a
mereological perspective. A configuration with two decks is not part of S∅,
whereas configurations with different truck or wheel sizes are part
of S∅, although this may make the skateboard unusable. Maximum
configurations consist of the above components plus all optional
components which can be installed in parallel. As risers and shock
pads are installed in the same place<sup>3</sup>, there is no maximum
configuration containing both components. As we are interested in valid
configurations in the end, we need to define a second configuration
space.</p>
      <p>Definition 4. Valid Configuration Space or variant space (S): This
space contains only valid configurations, i.e. configurations that
satisfy all constraints of the underlying configuration model. Therefore
it is given S ⊆ S∅. As valid configurations are also called variants,
we will talk of ’variant space’ in the remainder of this paper. The
variant space directly relates to the space of atomic solutions of a
CSP.</p>
      <p>Taking the skateboard example again, configurations with
different wheel sizes are not part of S, whereas configurations with
different wheel colors may be, depending whether such configurations are
permissible with reference to the configuration model.</p>
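      <p>The relation between the theoretical configuration space and the variant space can be sketched as an enumeration followed by a constraint filter. The domains, the two constraints and the function names below are assumptions chosen for the skateboard example; a real configuration model would supply its own rule set, and full enumeration is only feasible for very small models.</p>
      <preformat><![CDATA[
```python
from itertools import product

# Hypothetical domains; None models "optional component not chosen".
# Risers and shock pads share one slot, so they are alternatives here.
domains = {
    "deck": ["A", "B"],
    "truck": ["narrow", "wide"],
    "wheel_front": [52, 54],
    "wheel_rear": [52, 54],
    "spacer": [None, "riser", "shock_pad"],
}

# Hypothetical constraints of the configuration model.
constraints = [
    lambda c: c["wheel_front"] == c["wheel_rear"],              # equal wheel sizes
    lambda c: not (c["deck"] == "A" and c["truck"] == "wide"),  # truck must fit deck
]


def theoretical_space(domains):
    """All value combinations, i.e. the CSP without any constraints."""
    keys = list(domains)
    return [dict(zip(keys, values)) for values in product(*domains.values())]


def variant_space(domains, constraints):
    """Only the combinations that violate no constraint."""
    return [c for c in theoretical_space(domains)
            if all(check(c) for check in constraints)]


s_empty = theoretical_space(domains)
s_valid = variant_space(domains, constraints)
print(len(s_empty), len(s_valid))
```
]]></preformat>
      <p>Every element of s_valid is also contained in s_empty, which mirrors the subset relation between the two spaces.</p>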
      <p>Other variant spaces may be defined on the ‘trading status’ of each
p⃗ contained, for example:</p>
      <p>Definition 5. Offered variant space (SO) and sold variant space
(S$): SO is defined as the space of all variants which have been
quoted to customers. S$ contains only those variants which have
been sold.</p>
      <p>As the variant space in mass customization is rather large, it
can generally be assumed that not all variants were sold or offered.
In general S$ ⊆ SO ⊆ S holds; only in the very extreme case, in which all
possible variants have been offered and sold, do the three spaces
coincide. Based on the presented
definition of variant space, an arbitrary number of variant spaces based
on relevant criteria can be defined for investigation and comparison.</p>
      <p><sup>3</sup> Between trucks and deck.</p>
    </sec>
    <sec id="sec-4">
      <title>Discrete standard and basic products</title>
      <p>In the context of discrete products a central term for entrepreneurial
considerations and decisions is standard product.</p>
      <p>According to the Lexico dictionary (Oxford)<sup>4</sup>, on a general level
a standard is (a) a certain quality or attainment level reached or (b)
something considered exemplary or used as a measure or model against
which others are assessed (cf. benchmark, scale, guideline).</p>
      <p>Following information given by Wikipedia a technical standard is
an established norm or requirement in regard to technical systems.
It is usually a formal document that establishes uniform engineering
or technical criteria, methods, processes, and practices. In contrast,
a custom, convention, company product, corporate standard, and so
forth that becomes generally accepted and dominant is often called
a de facto standard.<sup>5</sup></p>
      <p>For discrete products specifically, a wide variety of
definitions is available which take different aspects into account. For
example, in the Gabler Wirtschaftslexikon standard product is
defined with a focus on quality: Products that have a generally agreed
(standardized) minimum quality. Product changes focus on
quantities, prices and times. Standard products can be traded on the stock
exchange.<sup>6</sup> Other definitions are based on the criterion of whether
products are ready for batch production.<sup>7</sup></p>
      <p>From our experience the term standard product is mainly used in
two different ways in manufacturing industry:
1) Either as a label of a product which should be presented as a
standard (defined before product is sold at all)
2) or as a product which is established on the basis of different
criteria e.g. it is sold the most within a given context, e.g. a region
or a specific type of customer.</p>
      <p>In order to resolve this ambiguity we speak of a predefined standard
in case of 1) and of a derived standard in case of 2).</p>
      <p>Furthermore, a basic product, also called generic product, is
defined to realize the core benefit of the product. This implies that
a basic product cannot be further reduced without losing the
possibility of the intended product usage.<sup>8</sup> In case of a skateboard this is the
ability to ride on such a board by pushing oneself forward by foot.
A basic product may not be saleable, e.g. due to legal restrictions. An
extended product is one which offers additional benefit to customers.
In the context of manufacturing companies ... a basic product might
be a rather simple good that experiences relatively consistent
consumer demand ....<sup>9</sup> Sometimes a core product is differentiated from
the product: The core product of a book is information. It is not the
book itself.<sup>10</sup> The book itself is then the basic product.</p>
    </sec>
    <sec id="sec-5">
      <title>Multi-variant products and standard variants</title>
      <p><sup>4</sup> www.lexico.com/en/definition/standard (retrieved 2.8.2019)</p>
      <p><sup>5</sup> en.wikipedia.org/wiki/Technical_standard (retrieved 2.5.2019)</p>
      <p><sup>6</sup> wirtschaftslexikon.gabler.de/definition/standardprodukte-42877 (retrieved 6.5.2019, in German)</p>
      <p><sup>7</sup> e.g. www.lawinsider.com/dictionary/standard-products (retrieved 2.5.2019)</p>
      <p><sup>8</sup> wirtschaftslexikon.gabler.de/definition/produkt-42902 (retrieved 2.5.2019, in German)</p>
      <p><sup>9</sup> www.businessdictionary.com/definition/basic-product.html (retrieved 8.5.2019)</p>
      <p><sup>10</sup> www.marketing91.com/five-product-levels/ (retrieved 8.5.2019)</p>
      <p>
        The term mass customization defines the challenge of anticipating
individualized products to be manufactured simultaneously with the
efficiency of mass production or, as stated in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]: . . . is based on the
idea of the customer-individual production of highly variant
products under near mass production pricing conditions. In general, in
this context products are multi-variant, i.e. there is more than one
option available. One important question for variant management is
how the variants can be compared in a reasonable manner. Buchholz
states that all variants need to be considered with respect to their
product type and that relevant characteristics need to be selected for a
reasonable comparison [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Buchholz also discusses the relationship
between variants and standard. It is critically scrutinised whether a
standard variant is the one with maximum quantity, some sort of
average or a yardstick for other variants. Nevertheless, it is specifically
emphasized that a standard variant is something special compared to
other variants. For comparison a measure of discrimination between
variants is necessary, but not all characteristics are important, so
the relevant characteristics need to be selected. In our notation this
means that the product vector K = [k0, k1, …, kN−1] is abstracted
to a reduced product vector K′ ⊂ K with N′ &lt; N.
      </p>
      <p>
        Buchholz also presents different views from literature whether
such a standard variant needs to be part of the variant space itself
or not. For example, according to Boysen a basic or standard product
may be a theoretical construct that has never been physically
manufactured [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. Whether it needs to be manufacturable at all remains
unclear. For further details we refer to [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>On the one hand, defining a standard variant based on sales numbers
aggregated over all variants of a variant space is unreasonable
from our perspective, as it ignores exactly the possible differences between
the available variants. Such an approach could rather be considered
a ‘standard variant space’. On the other hand, considering only the
sales numbers of each variant individually bears problems as well and
may even lead to wrong interpretations. In general, the exact same
variant is not sold more than ‘a few times’. For example, consider
100 skateboards sold in 96 different variants. This may mean that most
variants were sold once and two were sold three times each.
It also means that the standard variants may change within a few
new sales. Therefore, from our perspective it would not be useful to
define these “top selling” variants as standard variants.</p>
      <p>From the perspective of the product management and with the aim
of an efficient portfolio handling, it is also useful for multi-variant
products on the one hand to offer and place a standard variant in the
market and on the other hand to analyze which product variant is sold
most or is never sold at all.</p>
      <p>From our point of view the notion of a basic product can be
directly transferred to a basic variant: to cover the basic functionality,
the necessary characteristics must be set to values
reflecting a “basic” quality. In case of a skateboard these are a deck, two trucks,
and four wheels, each of rather low quality. If even one of these
components (characteristics) is missing, the result is no longer a variant of a
skateboard, as it is non-functional. In addition, top-level variants can be
given: variants with a maximum number of characteristics evaluated
with values reflecting a high level of quality, i.e. based
on the configuration model no further feature can be selected
without deselecting at least one other feature. In some cases, depending
on the context, a professional board may not contain more options than
a basic one, but components
of better quality, e.g. regarding the material of the deck or the wheels. In
the end this must be reflected in the underlying metrics.</p>
      <p>In Figure 1 we depict relations between basic (bi), top-level (ti),
and ‘regular’ (pi) product variants. Furthermore, each variant may
also be computed or defined as a standard variant (marked with ☆).</p>
      <p>The level, i.e. the number of selected characteristics and the ‘rank’ of
the corresponding values, is reflected by height. An edge depicts that
two variants differ in a single characteristic.<sup>11</sup> Naturally, basic
variants are rather at the bottom and top-level variants at the top of
the figure. Nevertheless, some feature combinations may not be
separable, so basic as well as top-level variants can
exist on different levels. In any case, basic variants must not have
another connected variant ‘below’ them, and top-level variants none ‘above’
them. All variants in between have ‘smaller’
predecessors and ‘larger’ successors. Standard variants can be defined on any
of these levels. Consider our skateboard example: we may define a basic
variant as the standard skateboard for beginners, a mid-range skateboard
as the standard for trained half-pipe skaters, and a top-level variant as the
standard for skate competitions.</p>
    </sec>
    <sec id="sec-6">
      <title>Approach</title>
      <p>We believe that the availability of a standard variant in the sense of
an average product of the best-selling variants is very helpful in
portfolio management. In order to prevent misunderstandings with other
definitions (see Sec. 2.2 and 2.3) we will speak of a central
representative of a variant space instead. One possibility to exploit the central
representative in portfolio management is to compare it with
predefined standards and adapt them accordingly. In order to discuss the
challenges in defining such a central representative in the context of
multi-variant products, we need to give some formal definitions
regarding configuration spaces (Sec. 3.1). We define a measure M
(Sec. 3.2) for the computation of a central representative (Sec. 3.3). We
close this section with an algorithmic sketch integrating the definitions
from the preceding subsections (Sec. 3.4).</p>
    </sec>
    <sec id="sec-7">
      <title>Definition of a central representative of a variant space</title>
      <p>In Section 2.1 we introduced the notion of a product (configuration)
vector p⃗, which holds all characteristics that define a certain
product. Let P = {p⃗0, p⃗1, …, p⃗P−1} be a set of P product vectors. With
S∅, S, SO and S$ (see Sec. 2.1) we already defined specific sets P,
i.e. sets where all p⃗j fulfill certain properties. As we are interested
in the “best representative” of P, we define a central representative
of P based on a measure of similarity or dissimilarity.</p>
      <p><sup>11</sup> For reasons of simplicity we neglect that connected variants may differ in
more than one characteristic as they are inseparable due to the rule set.</p>
      <p>Definition 6. Central representative r⃗P and deviation Δ⃗P: r⃗P is the
product vector of a product space P which has the overall minimal
dissimilarity to all p⃗j ∈ P considering a measure M. Furthermore,
we define the deviation Δ⃗P to be the vector of the individual
deviations Δi of assigned values per characteristic ki (see Figure 2).</p>
      <p>Simplified, one could say r⃗P is the average product of P regarding
the measure M or, more specifically, the one that minimizes the
dissimilarity to all p⃗j ∈ P. The deviations Δi can be defined in multiple ways.
We detail this in Section 3.3. We note that, based on this definition, it
is not necessary that r⃗P ∈ P. Furthermore, as several solutions may
have the same aggregated distance to the p⃗j ∈ P based on M,
the central representative r⃗P may not be unique. We sketch how a
measure M can be defined below.</p>
      <p>[Figure 2: Central representative r⃗P of a set of product vectors P and its deviation.]</p>
      <p>
M could be either a measure of similarity or dissimilarity. Although
M can be defined arbitrarily, e.g. based on ∑, ∏, min, max or some
complex aggregation function, we stick to a specific distance-based
measure, and thus to dissimilarity, for reasons of simplicity. For future
research a promising link is given by case-based reasoning (CBR), as
the notion of similarity is central to this approach (e.g. [9, 16]).
Nevertheless, although CBR has been applied to product configuration,
to our knowledge specific product similarities have not been
extensively investigated in the literature; exceptions are [
        <xref ref-type="bibr" rid="ref12 ref20 ref21">12, 21, 20</xref>
        ].
Aspects of similarity have been studied in the context of CSP (e.g. [5, 7]),
resulting in the need for Euclidean distance measures from a practical
perspective. In the remainder of this section we summarize aspects of
similarity measures relevant to our approach.
      </p>
      <p>A Euclidean distance measure δ for some entities o, p and q is
reflexive: δ(p, p) = 0, symmetric: δ(p, q) = δ(q, p), and transitive in the
sense of the triangle inequality: δ(o, q) ≤ δ(o, p) + δ(p, q). For reasons
of simplicity, we will talk of distance in the remainder of this paper.</p>
      <p>In order to define a distance measure M consider a variant space,
e.g. S, and a subset thereof, e.g. S$ (S$ ⊆ S). This implies that p⃗ ∈ S
and q⃗ ∈ S$ contain the same characteristics kx with x ∈ {0, …, N − 1}
in the same order. First, we need a distance between values of the
same characteristic x for all x ∈ {0, …, N − 1}, for example:
<disp-formula><label>(1)</label><tex-math>\delta_x(k_x^p, k_x^q) = \left| k_x^p - k_x^q \right|</tex-math></disp-formula>
with <inline-formula><tex-math>k_x^p</tex-math></inline-formula> denoting the value of the x-th characteristic of product
vector p⃗, and <inline-formula><tex-math>k_x^q</tex-math></inline-formula> that of q⃗. Depending on the type of scale of
the characteristic (i.e. nominal, ordinal, interval or ratio scale)
certain calculations may not be possible, e.g. subtraction or addition on
a nominal scale is not reasonable. On a nominal scale only the equality
of values can be determined, i.e. whether two values are the same or
not. If a level of similarity is required, at least an ordinal scale for the
values must be available, i.e. a linear order of the values that allows the
definition of a median. For an interval or ratio scale a mean can be defined.</p>
      <p>This results in a vector of distances per characteristic:
<disp-formula><label>(2)</label><tex-math>\vec{\delta}(\vec{p}, \vec{q}) = \begin{bmatrix} \delta_0(k_0^p, k_0^q) \\ \vdots \\ \delta_{N-1}(k_{N-1}^p, k_{N-1}^q) \end{bmatrix} = \begin{bmatrix} d_0 \\ \vdots \\ d_{N-1} \end{bmatrix}</tex-math></disp-formula>
The next step is to aggregate these individual distances into a
single distance value describing the distance between two product
vectors. It needs to be reflected that not all characteristics are equally
important. Therefore a weighting factor wi needs to be integrated
for each characteristic. If characteristic ki should not be considered,
the corresponding wi is set to zero. Furthermore, the
distances for individual characteristics may not all have the same range, so
one characteristic may dominate the others; therefore a
normalizing factor vi is necessary. For example, consider a distance vector
with N = 3 where d0 represents a binary distance (d0 ∈ {0, 1}),
d1 a distance between zero and five (d1 ∈ [0, 5]), and d2
a distance between zero and one thousand (d2 ∈ [0, 1000]).
In most cases d2 would dominate d1, which in turn
dominates d0. Therefore, it is important that all value ranges of the ki
are normalized, e.g. to values between zero and one. This results in a
distance between two product vectors p⃗ and q⃗:</p>
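      <p>
<disp-formula><label>(3)</label><tex-math>\delta(\vec{p}, \vec{q}) = \frac{1}{N} \sum_{i=0}^{N-1} w_i v_i d_i</tex-math></disp-formula>
      </p>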
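      <p>The two steps, a per-characteristic distance that depends on the scale type and the weighted, normalized aggregation, can be sketched as follows. The function names, the rank difference used for ordinal values, the example product vectors and the chosen weights and normalization factors are assumptions made for illustration only.</p>
      <preformat><![CDATA[
```python
def value_distance(a, b, scale, order=None):
    """Distance between two values of a single characteristic."""
    if scale == "nominal":
        return 0.0 if a == b else 1.0           # only equality is meaningful
    if scale == "ordinal":
        # Assumption: use the rank difference within the given linear order.
        return float(abs(order.index(a) - order.index(b)))
    if scale in ("interval", "ratio"):
        return float(abs(a - b))                # absolute difference, Eq. (1)
    raise ValueError(f"unknown scale: {scale}")


def aggregate_distance(d, w, v):
    """Eq. (3): mean of the weighted, normalized per-characteristic distances."""
    if not (len(d) == len(w) == len(v)):
        raise ValueError("d, w and v must have the same length")
    return sum(wi * vi * di for wi, vi, di in zip(w, v, d)) / len(d)


# Hypothetical product vectors: (deck material, wheel hardness, deck length in mm).
p = ("maple", "soft", 800)
q = ("bamboo", "hard", 810)
hardness = ["soft", "medium", "hard"]

d = [
    value_distance(p[0], q[0], "nominal"),
    value_distance(p[1], q[1], "ordinal", hardness),
    value_distance(p[2], q[2], "ratio"),
]
v = [1.0, 1 / 2, 1 / 100]   # assumed maximal distances: 1, 2 and 100
w = [1.0, 1.0, 1.0]         # assumed equal importance

print(d)                            # [1.0, 2.0, 10.0]
print(aggregate_distance(d, w, v))  # (1.0 + 1.0 + 0.1) / 3
```
]]></preformat>
      <p>Setting a weight to zero removes the corresponding characteristic from the comparison, as described above.</p>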
      <p>We give a schematic impression of a distance between two
product vectors r⃗S and r⃗S$ in Figure 3. It still remains
open, however, how central representatives like r⃗S and r⃗S$ can be determined
based on δ.</p>
      <p>[Figure 3: Schematic distance between the central representatives r⃗S and r⃗S$ of the variant spaces S and S$.]</p>
      <p>
We defined the central representative r⃗P as a variant which minimizes
the overall dissimilarity (cf. Definition 6). Furthermore, it is not a
requirement that r⃗P is itself an element of P. Consider these two
definitions of central representatives of S$.</p>
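      <p>
<disp-formula><label>(4)</label><tex-math>\vec{r}_{S^{\$}} = \operatorname*{argmin}_{\vec{r} \in S^{\$}} \sum_{\vec{p}_i \in S^{\$}} \delta(\vec{r}, \vec{p}_i)</tex-math></disp-formula>
<disp-formula><label>(5)</label><tex-math>\vec{r}_{S^{\$}} = \operatorname*{argmin}_{\vec{r} \in S} \sum_{\vec{p}_i \in S^{\$}} \delta(\vec{r}, \vec{p}_i)</tex-math></disp-formula>
In the first case (Eq. 4) r⃗ has been sold itself, as r⃗ ∈ S$, whereas
in the second case (Eq. 5) r⃗ is any technically feasible variant
(r⃗ ∈ S). One could even drop the requirement that the representative
is technically feasible and select r⃗ ∈ S∅ (cf. Sec. 2.3).</p>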
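      <p>Both definitions differ only in the set from which the candidate is drawn. The following sketch makes this explicit; the simplified distance function (all weights and normalization factors set to one), the numeric example data and the function names are assumptions for illustration.</p>
      <preformat><![CDATA[
```python
def distance(p, q):
    """Stand-in for Eq. (3) with w_i = v_i = 1 on numeric characteristics."""
    return sum(abs(a - b) for a, b in zip(p, q)) / len(p)


def central_representative(candidates, members):
    """Return the candidate with minimal summed distance to all members."""
    return min(candidates, key=lambda r: sum(distance(r, p) for p in members))


# Hypothetical data: (deck width in inches, wheel diameter in mm).
sold = [(8.0, 52), (8.25, 56), (8.5, 54)]
valid = sold + [(8.25, 54), (8.0, 54)]

r_sold = central_representative(sold, sold)    # Eq. (4): r must have been sold
r_valid = central_representative(valid, sold)  # Eq. (5): r only has to be valid
print(r_sold, r_valid)
```
]]></preformat>
      <p>With this toy data the representative drawn from the valid variants has never been sold itself, which illustrates that a central representative need not be an element of the set it represents.</p>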
      <p>In conjunction with the central representative it is also of interest
‘how large’ or ‘how widespread’ the set is which it represents. For
this we need a notion of deviation, diameter, or variance. For now,
we stick with the notion of average deviation per characteristic (Δi)
over all p⃗j ∈ P, as it suffices for our needs.</p>
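      <p>
<disp-formula><label>(6)</label><tex-math>\Delta_i = \frac{1}{P} \sum_{j=0}^{P-1} \delta_i(r_i, k_i^j)</tex-math></disp-formula>
Here ri is the value of characteristic ki in the representative and
<inline-formula><tex-math>k_i^j</tex-math></inline-formula> its value in p⃗j.
Then Δ⃗ = [Δ0, …, ΔN−1] denotes the vector of all deviations per
characteristic.</p>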
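      <p>A minimal sketch of the average deviation per characteristic, assuming numeric characteristics and the absolute difference of Eq. (1) as per-characteristic distance; the function name and the data are hypothetical.</p>
      <preformat><![CDATA[
```python
def average_deviation(representative, members):
    """Mean absolute difference to the representative, per characteristic."""
    count = len(members)
    return [
        sum(abs(r_i - member[i]) for member in members) / count
        for i, r_i in enumerate(representative)
    ]


# Hypothetical data: (deck width in inches, wheel diameter in mm).
sold = [(8.0, 52), (8.25, 56), (8.5, 54)]
deviation = average_deviation((8.25, 54), sold)
print(deviation)
```
]]></preformat>
      <p>The result contains one entry per characteristic, so each entry can be compared against its own threshold.</p>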
      <p>It is not beneficial if a central representative covers a ‘too wide
range’ of variants, i.e. if one or several Δi are rather high for some
characteristics ki, as it would then give little help for portfolio
optimization, especially if the members of the set of product vectors are
not distributed uniformly. Consider the case depicted in Figure 4.
Products were sold in two rather distant regions of the variant space.
Considering them as one set would lead to a representative which
does not reflect the situation at hand (orange space). Instead, we need to look
for separate subsets, i.e. clusters, to arrive at the result depicted
by the two separate regions S0$ and S1$ (light blue). As we have
defined a central representative and a deviation thereof, various cluster
analysis methods are applicable, e.g. centroid-based or density-based
clustering. For an overview of existing clustering methods we refer
to e.g. [13, 17, 23]. The adequate selection of a clustering method
will be crucial for the successful application of the proposed
approach.</p>
      <p>For pragmatic reasons we restrict the clustering
parameters (assuming a clustering method is given) to a maximum
deviation per characteristic and a minimum number of members per
cluster. Therefore, a vector of thresholds τ⃗ = [τ0, …, τN−1] for the
corresponding characteristics ki and a threshold τ# for the minimum number
of members need to be given.</p>
      <p>[Figure 4. Labels recoverable from the figure: S, S$, S0$, S1$, r⃗S, r⃗S$, r⃗S0$, r⃗S1$.]</p>
      <p>
We summarize how to find adequate representatives for
a given set of product vectors P (e.g. sold variants S$) out of
another given set of product vectors Q (e.g. the overall variant space
S) in Algorithm 1. In the beginning only the single cluster P exists,
for which the central representative r⃗P and the deviation σ⃗P are calculated.
If any deviation σi is above its defined threshold τi, P
needs to be split into two clusters.<sup>12</sup> In the next iteration at least two
clusters need to be considered. At some point clusters with only few
members (&lt; #) are computed. We exclude these clusters from
further consideration in this iteration. We continue increasing the
number of clusters until we obtain a set of clusters in which each
central representative has every deviation per characteristic
at or below the given threshold (∀i: σi ≤ τi). Note that we increase the
number of clusters iteratively and restart the cluster splitting from the
original set P on purpose. Otherwise the order in which the
pi ∈ P are considered might have an effect and would thus lead to different results
if the pi are presented in a different order.</p>
      <p>Input: P, Q, τ⃗, M, #
Result: R ∶= list of central representatives for P out of Q
no_of_clusters ∶= 1;
S ∶= {P};
R ∶= calculate list of representatives rj from Q for all sj ∈ S based on M;
Σ ∶= calculate list of all deviations σ⃗j for corresponding rj and sj based on M;
while ∃ i, j with σij ∈ Σ and σij &gt; τi for any sj ∈ S do
    no_of_clusters ∶= no_of_clusters + 1;
    S ∶= clusterSplitting(P, M, no_of_clusters);
    delete all sj from S where |sj| &lt; #;
    R ∶= calculate list of representatives rj from Q for all sj ∈ S based on M;
    Σ ∶= calculate list of all deviations σ⃗j for corresponding rj and sj based on M;
end</p>
      <p>Algorithm 1: Algorithmic sketch for deducing central
representatives out of the variant space Q based on the variants given by the
variant space P.</p>
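      <p>The loop of Algorithm 1 can be sketched in runnable form as follows. This is an illustration under several assumptions, not the implementation behind the sketch: the measure is taken to be a sum of absolute differences, the representative is the candidate from Q with the smallest total distance to the cluster, and cluster_splitting is a simple stand-in (farthest-point seeds with nearest-seed assignment) for whatever clustering method is chosen. All function names are hypothetical, and the stop condition on the number of clusters is an added safeguard.</p>
      <preformat><![CDATA[
```python
from typing import Sequence

Vector = tuple[float, ...]


def dist(a: Vector, b: Vector) -> float:
    """Assumed measure: sum of absolute differences per characteristic."""
    return sum(abs(x - y) for x, y in zip(a, b))


def representative(cluster: Sequence[Vector], candidates: Sequence[Vector]) -> Vector:
    """Pick the candidate from Q with the smallest total distance to the cluster."""
    return min(candidates, key=lambda q: sum(dist(q, p) for p in cluster))


def deviations(rep: Vector, cluster: Sequence[Vector]) -> list[float]:
    """Average deviation per characteristic of the cluster around rep."""
    return [
        sum(abs(rep[i] - p[i]) for p in cluster) / len(cluster)
        for i in range(len(rep))
    ]


def cluster_splitting(P: Sequence[Vector], k: int) -> list[list[Vector]]:
    """Stand-in splitter: farthest-point seeds, then nearest-seed assignment."""
    seeds = [P[0]]
    while len(seeds) < k:
        seeds.append(max(P, key=lambda p: min(dist(p, s) for s in seeds)))
    clusters: list[list[Vector]] = [[] for _ in seeds]
    for p in P:
        nearest = min(range(k), key=lambda j: dist(p, seeds[j]))
        clusters[nearest].append(p)
    return clusters


def central_representatives(P, Q, thresholds, min_members):
    """Increase the number of clusters until all deviations meet their thresholds."""
    if not P:
        return []
    k = 1
    clusters = [list(P)]
    while True:
        reps = [representative(c, Q) for c in clusters]
        devs = [deviations(r, c) for r, c in zip(reps, clusters)]
        within = all(d <= t for dv in devs for d, t in zip(dv, thresholds))
        if within or k >= len(P):  # second condition: added safeguard
            return list(zip(reps, clusters))
        k += 1
        # Always re-split the original set P, never an intermediate cluster,
        # and drop clusters with fewer than min_members members.
        clusters = [
            c for c in cluster_splitting(P, k) if len(c) >= max(1, min_members)
        ]


# Toy data (invented): sold variants in two distant regions of a 2D variant space.
P = [(1.0, 1.0), (1.0, 2.0), (2.0, 1.0), (9.0, 9.0), (9.0, 10.0), (10.0, 9.0)]
Q = [(float(x), float(y)) for x in range(11) for y in range(11)]
result = central_representatives(P, Q, thresholds=(1.0, 1.0), min_members=2)
```
]]></preformat>
      <p>With the toy data a single cluster exceeds the thresholds, so the set is split once, and the two resulting clusters are represented by (1.0, 1.0) and (9.0, 9.0). If all clusters fall below the minimum size, this sketch returns an empty list.</p>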
    </sec>
    <sec id="sec-8">
      <title>Pragmatic considerations</title>
      <p>Not all characteristics of product vectors must be considered, as
relevant information might be covered by other characteristics (Sec. 4.1).
In general, data provided by companies needs some preparation, as
it is often inconsistent in the naming of characteristics and
values (Sec. 4.2). We consider the temporal restriction of
data and how observations over time can be derived (Sec. 4.3).
Before Algorithm 1 can be applied, a value ordering and weighting factors
for each characteristic must be available (Sec. 4.4).</p>
    </sec>
    <sec id="sec-9">
      <title>Content-related evaluation</title>
      <p>
        To support a business question, a content-related focus on the data is
necessary. Simplified, two levels of content-related constraints can be
distinguished. First, the context of each variant (p⃗i) can be considered.
Context can be defined from different perspectives, e.g. in which shop
or region the variant has been generated, by whom, whether it has
been sold, only offered, or never even offered (cf. S, SO, S$ in Sec.
2.1), or for which application or domain it was bought, if
this information is available. Second, the relevance of each
characteristic should be checked, as considering all characteristics,
for example the color of the
trucks or some non-visible strings on some component, may
obscure relevant information. Chizi and
Maimon state that a focus on relevant characteristics has several
advantages [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. For example, removing irrelevant characteristics
improves efficiency, and results are more conclusive and easier to
interpret due to the focus on key features. Nevertheless, too
limited a choice of characteristics leads to information loss and reduces
the quality of the results. For further information on feature selection
methods we refer to [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ]. If a characteristic is considered irrelevant
for the evaluation at hand, wi (cf. Eq. 3) should be set to zero in the
calculations. For all characteristics with wi &gt; 0 the relative relevance
needs to be considered very carefully, as slight changes may lead to
significant changes in the classification of the data. For example, if the
results are intended for adapting standard products, a slight change in
the parameters might lead to a different variant.
      </p>
      <p><sup>12</sup> How this is actually done depends on the clustering algorithm chosen.</p>
    </sec>
    <sec id="sec-10">
      <title>Data preparation</title>
      <p>
        Practice shows that master data within companies is often not
coordinated. In general, this leads to multiple characteristics
containing the same information, potentially represented differently, e.g.
using different text strings, numbers, or units. As products
are subject to permanent change, the inconsistency of the data increases
over time. To ease and automate analysis in the long run,
data synchronization is indispensable. Nevertheless, for the data
at hand, data cleansing is essential to prevent bad decisions based on
bad analysis results [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Maletic and Marcus describe data preparation as
a multistep procedure comprising (1) definition of error types, (2)
finding instances of these errors, and (3) correcting them [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. They
emphasize that each of these steps is a complex task in itself.
      </p>
      <p>
        To give an idea of the effort required, we present a
non-exhaustive list of error types in (master) data below. A
common error type stems from differing notations or
representations, i.e. characteristics and values holding the same information
but spelled differently. These errors often arise from
inconsistent usage of blanks, hyphens, prefixes, suffixes, or
abbreviations. Different units may also be used, e.g. due to different intended
usage. Characteristics holding complex, i.e. combined,
information are problematic as well, because further processing might be
limited. A common example is a combined string representation of
length, width, and height (sometimes without a given unit) instead of
individual numerical characteristics for each of them. A tricky
type of error comprises misleading value specifications, e.g. frame
sizes denoted by numerical values that have to be interpreted in
a specific manner, so that naive calculation is not possible. Consider
frame sizes 5, 8, and 12, which denote three consecutive frame sizes.
The physical difference in size cannot be calculated from these
values; instead, other data such as length, width, and height of certain
components need to be considered. Furthermore, the conceptual distance
cannot be calculated from these ’values’ either: as the categories are
consecutive, the distance between neighboring sizes is 1 in both cases, not 3 and 4.
To prevent trimming of leading zeros, such terms may even be stored as strings.
Eliminating errors of this type requires very specific semantic
knowledge, which makes these errors hard not only to spot but
also to correct. For further information on data cleansing and
data quality we refer to [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ].
      </p>
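      <p>Three of the error types above can be illustrated with a small cleansing sketch. The label spellings, the dimension string format, and the ordered list of frame size codes are invented for illustration, and all function names are hypothetical. Units are assumed to be consistent already, which real master data will often violate.</p>
      <preformat><![CDATA[
```python
import re

# Hypothetical linear order of frame size codes as they might occur in master data.
FRAME_SIZE_ORDER = ["05", "08", "12"]


def normalise_label(text: str) -> str:
    """Unify spelling variants caused by case, blanks, hyphens and underscores."""
    return re.sub(r"[\s\-_]+", "", text).lower()


def split_dimensions(text: str) -> tuple[float, float, float]:
    """Split a combined 'length x width x height' string into three numbers."""
    parts = re.findall(r"\d+(?:\.\d+)?", text)
    if len(parts) != 3:
        raise ValueError(f"expected three numbers in {text!r}")
    length, width, height = (float(p) for p in parts)
    return length, width, height


def frame_size_rank(code: str) -> int:
    """Map a frame size code to its position in the assumed linear order."""
    # zfill restores a leading zero that may have been trimmed on import.
    return FRAME_SIZE_ORDER.index(code.zfill(2))


def conceptual_distance(code_a: str, code_b: str) -> int:
    """Distance in steps of the linear order, not in numeric code values."""
    return abs(frame_size_rank(code_a) - frame_size_rank(code_b))


same_label = normalise_label("Frame-Size") == normalise_label("frame size")
dimensions = split_dimensions("120x80 x 45.5")
step_5_to_8 = conceptual_distance("5", "08")
step_8_to_12 = conceptual_distance("08", "12")
```
]]></preformat>
      <p>Here both neighboring frame size pairs have conceptual distance 1, whereas naive subtraction of the codes would give 3 and 4.</p>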
      <p>As a result of data preparation we obtain a set P of product vectors
p⃗i with consistent characteristics [k0, k1, …, kN−1], i.e. comparable
information is stored in the same characteristic and with the same value
representation for every product variant.</p>
    </sec>
    <sec id="sec-11">
      <title>Temporal evaluation</title>
      <p>Products are subject to permanent change. They are designed,
developed, sold, and refined, potentially several times. Such refinements
and changes in market expectations may result in changes of
central representatives. Therefore, from both a technical
and a sales perspective, it is not reasonable to consider outdated data,
which calls for methods from time series analysis.
Furthermore, as sales numbers for the products of interest may vary
significantly over time, considering single time points (or only rather
small time intervals) may yield different results for each of these
time points.</p>
      <p>
        One applicable method for generating smoothed results is
the sliding window approach (SWA), see for example [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The
basic idea is to evaluate overlapping intervals, so-called windows, to
get smoother and more consistent results. We depict the relevant
parameters of the SWA in Figure 5. Let d denote the overall period under
review (one year in the given example). The window size is denoted
by w (three months) with w ≪ d, and the corresponding step size by
s (one month) with s ≤ w. The analysis is then performed on the data in each
window separately.
      </p>
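      <p>The window generation can be sketched as follows, using the parameters of the example (d of twelve months, w = 3, s = 1). The function name sliding_windows is hypothetical, and the sketch only enumerates the windows; the analysis per window is left out.</p>
      <preformat><![CDATA[
```python
from typing import Iterator, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(periods: Sequence[T], window: int, step: int) -> Iterator[Sequence[T]]:
    """Yield overlapping windows of `window` periods, advancing by `step`."""
    if not 0 < step <= window <= len(periods):
        raise ValueError("require 0 < step <= window <= number of periods")
    for start in range(0, len(periods) - window + 1, step):
        yield periods[start:start + window]


# Overall period d: one year, given as twelve monthly periods.
months = ["jan", "feb", "mar", "apr", "may", "jun",
          "jul", "aug", "sep", "oct", "nov", "dec"]
windows = list(sliding_windows(months, window=3, step=1))
```
]]></preformat>
      <p>For the twelve months this yields ten windows, from jan to mar up to oct to dec. Each window would then be passed to the analysis separately.</p>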
      <p>The choice of specific values for d, w, and s is crucial and
must be considered carefully, especially if conclusions about future
developments are drawn. For example, if d is chosen too small, the
corresponding data set may be too small to generate significant
results. Statistical or learning methods can support a reasonable choice,
e.g. [19].</p>
      <p>Algorithm 1 can be extended so that not only the set P at a
single time point is considered, but subsequent sets, i.e. subsequent
windows. On this basis, developments of the central representatives
and their corresponding deviations can be observed: how they
’wander around’ and how the number of clusters increases or decreases.</p>
      <p>[Figure 5: parameters of the sliding window approach over the months jan to dec: overall period d, window size w, and step size s.]</p>
    </sec>
    <sec id="sec-12">
      <title>Weighting factors and value ordering</title>
      <p>The approach relies heavily on the definition of the measure
M, which contains the per-characteristic distances and the overall distance; the latter in turn contains
weighting factors wi for each characteristic. First experiments have shown
that distance measures on nominal data influence the
results significantly, as the distance can only be either zero or one. A
rather low weighting factor for these characteristics compared to the
others may be a solution, but this must be evaluated in future work.
For now we tend to ignore these characteristics, as dissimilarity is in
most cases reflected in other characteristics as well. Our intuition,
so far without proof, is that similar effects may occur when
integrating ordinal scale data with interval and ratio scale data. For
interval and ratio scale data a distance is given naturally,
assuming the characteristic is not misinterpreted as such while actually being only on an
ordinal scale (cf. Sec. 4.2). For ordinal data this is not the case; a
linear ordering has to be defined manually. Although an ordering of
terms like “basic”, “advanced”, “expert”, and “professional” might
seem trivial at first, it is a tricky, currently
manual, time-consuming, and thus error-prone task. Looking at
the terms “expert” and “professional”, the question is whether
“expert” comes before or after “professional”, or whether they are equal in the end, as they
relate to completely different aspects of the product. It may be
possible to define a reasonable distance between such terms,
i.e. how far “basic” is from “advanced”,
“advanced” from “expert”, and so forth. We refrain from this, as the
resulting costs would not be in a reasonable cost-benefit relation for
an industrial company. For a start an equidistant conceptual distance
measure should suffice, i.e. all neighboring terms in a
linear order have the same distance.</p>
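      <p>The interplay of nominal distances, an equidistant ordinal distance, and weighting factors can be illustrated with a short sketch. It assumes a weighted sum as aggregation and scales the ordinal distance to the interval [0, 1]; both choices, the ordering of the levels, and all names are assumptions for illustration and are not taken from the definition of M.</p>
      <preformat><![CDATA[
```python
# Assumed manual linear ordering of an ordinal characteristic.
LEVELS = ["basic", "advanced", "expert", "professional"]


def nominal_distance(a: str, b: str) -> float:
    """Nominal scale: values are either equal (0) or different (1)."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a: str, b: str, order=LEVELS) -> float:
    """Equidistant ordinal distance: rank difference scaled to [0, 1]."""
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)


def weighted_distance(p, q, distances, weights) -> float:
    """Weighted sum of per-characteristic distances (assumed aggregation)."""
    return sum(w * d(a, b) for a, b, d, w in zip(p, q, distances, weights))


# Two toy product vectors: (color, level).
p = ("red", "basic")
q = ("blue", "expert")
distances = (nominal_distance, ordinal_distance)

with_color = weighted_distance(p, q, distances, weights=(1.0, 1.0))
without_color = weighted_distance(p, q, distances, weights=(0.0, 1.0))
```
]]></preformat>
      <p>With equal weights the nominal characteristic contributes its full distance of 1 and dominates the result (5/3 in total), while setting its weight to zero leaves only the ordinal contribution of 2/3. This mirrors the observation that nominal characteristics influence the results strongly unless they are down-weighted or ignored.</p>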
      <p>In business intelligence it is common to consider not only the
number of sold units but also, for example, profit or the number of sold units per quote.
On the one hand, a pragmatic way
without changing the algorithm is to modify the original set by
reducing or multiplying the number of equal product vectors in P. On
the other hand, an additional weighting factor per pi could be
introduced, which would be much more efficient regarding the run-time of
the algorithm.</p>
    </sec>
    <sec id="sec-13">
      <title>Summary and Outlook</title>
      <p>To support portfolio management for multi-variant products we
examined definitions of ’standard’ for discrete and multi-variant
products. To distinguish our notion from these definitions we introduced the term
central representative of a variant space. We derived an algorithmic
sketch based on a measure M to calculate representatives for clusters
of reasonable size. Finally, we discussed the tasks necessary before the
algorithm can be applied to real data.</p>
      <p>As the work on central representatives for a variant space is at an
early stage, many tasks and questions remain open. The
straightforward next step is to experiment with large-scale real data instead of
a few small toy examples. Furthermore, the determination of the weighting
factors wi is a challenging task. We need to investigate to what
extent learning methods, either supervised or unsupervised, may ease
this task. Once real data is available it will be worthwhile to
reconsider alternative definitions of distance functions, e.g. by
investigating the impact of choosing ∏, min, max, or some other function
as aggregation operator. In theory, more than one central
representative may exist for the same set. If this case also occurs with real data,
we need to investigate how to deal with it.</p>
    </sec>
    <sec id="sec-14">
      <title>Acknowledgement</title>
      <p>We thank the anonymous reviewers for critically reading the
manuscript and providing helpful comments for its clarification and
improvement.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Nils</given-names>
            <surname>Boysen</surname>
          </string-name>
          , Variantenfließfertigung, volume
          <volume>49</volume>
          ,
          Deutscher Universitätsverlag,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>M.</given-names>
            <surname>Buchholz</surname>
          </string-name>
          , Theorie der Variantenvielfalt:
          <article-title>Ein produktions- und absatzwirtschaftliches Erklärungsmodell</article-title>
          , SpringerLink: Bücher, Gabler Verlag,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Barak</given-names>
            <surname>Chizi</surname>
          </string-name>
          and Oded Maimon, '
          <article-title>Dimension reduction and feature selection', in Data Mining and Knowledge Discovery Handbook</article-title>
          , 2nd ed., eds.,
          <source>Oded Maimon and Lior Rokach</source>
          ,
          <fpage>83</fpage>
          -
          <lpage>100</lpage>
          , Springer, (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Bjørn</given-names>
            <surname>Christensen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thomas D.</given-names>
            <surname>Brunoe</surname>
          </string-name>
          , '
          <article-title>Product configuration in the eto and capital goods industry: A literature review and challenges'</article-title>
          ,
          <source>in Customization 4</source>
          .0, eds.,
          <string-name>
            <surname>Stephan</surname>
            <given-names>Hankammer</given-names>
          </string-name>
          , Kjeld Nielsen, Frank T. Piller, Günther Schuh, and Ning Wang, pp.
          <fpage>423</fpage>
          -
          <lpage>438</lpage>
          , Cham, (
          <year>2018</year>
          ). Springer International Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Jean-François Condotta</surname>
          </string-name>
          , Souhila Kaci, Pierre Marquis, and Nicolas Schwind, '
          <article-title>A syntactical approach to qualitative constraint networks merging', in Logic for Programming</article-title>
          ,
          <source>Artificial Intelligence, and Reasoning - 17th International Conference, LPAR-17</source>
          , Yogyakarta, Indonesia,
          <source>October 10-15</source>
          ,
          <year>2010</year>
          . Proceedings, eds., Christian G. Fermüller and Andrei Voronkov
          , volume
          <volume>6397</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>233</fpage>
          -
          <lpage>247</lpage>
          . Springer, (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Robert</given-names>
            <surname>Cooper</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Scott</given-names>
            <surname>Edgett</surname>
          </string-name>
          , and Elko Kleinschmidt, '
          <article-title>Portfolio management - fundamental to new product success'</article-title>
          ,
          <source>The PDMA Toolbook for New Product Development, (01</source>
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Frank</given-names>
            <surname>Dylla</surname>
          </string-name>
          , Jan Oliver Wallgrün, and Jasper van de Ven, '
          <article-title>Merging qualitative information: Rationality and complexity'</article-title>
          ,
          <source>in QUAC2015: Workshop on Qualitative Spatial and Temporal Reasoning: Computational Complexity and Algorithms</source>
          , (
          <year>September 2015</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Felfernig</surname>
          </string-name>
          , Lothar Hotz, Claire Bagley, and Juha Tiihonen,
          <source>Knowledge-based Configuration:</source>
          From Research to Business Cases, Morgan Kaufmann Publishers Inc., San Francisco, CA, USA,
          1st edn.,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Gavin</given-names>
            <surname>Finnie</surname>
          </string-name>
          and Zhaohao Sun, '
          <article-title>Similarity and metrics in case-based reasoning'</article-title>
          ,
          <source>Information Technology papers</source>
          ,
          <volume>17</volume>
          , (03
          <year>2002</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Yupeng</surname>
            <given-names>Hu</given-names>
          </string-name>
          , Cun Ji, Ming Jing, Yiming Ding, Shuo Kuai, and
          <string-name>
            <given-names>Xueqing</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>'A continuous segmentation algorithm for streaming time series'</article-title>
          , in Collaborate Computing: Networking, Applications and Worksharing - 12th International Conference, CollaborateCom
          <year>2016</year>
          , Beijing, China,
          <source>November 10-11</source>
          ,
          <year>2016</year>
          , Proceedings, eds.,
          <source>Shangguang Wang and Ao Zhou</source>
          , volume
          <volume>201</volume>
          <source>of Lecture Notes of the Institute for Computer Sciences, Social Informatics and Telecommunications Engineering</source>
          , pp.
          <fpage>140</fpage>
          -
          <lpage>151</lpage>
          . Springer, (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Jonathan I.</given-names>
            <surname>Maletic</surname>
          </string-name>
          and Andrian Marcus,
          <article-title>Data Cleansing: A Prelude to Knowledge Discovery</article-title>
          ,
          <fpage>19</fpage>
          -
          <lpage>32</lpage>
          ,
          Springer US
          ,
          <volume>07</volume>
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Hiroya</surname>
            <given-names>Inakoshi</given-names>
          </string-name>
          , Seishi Okamoto, Yuiko Ohta, and Nobuhiro Yugami, '
          <article-title>Effective decision support for product configuration by using CBR'</article-title>
          ,
          <source>in International Conference on Case-Based Reasoning</source>
          , (
          <volume>01</volume>
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Leonard</given-names>
            <surname>Kaufman and Peter J. Rousseeuw</surname>
          </string-name>
          ,
          <article-title>Finding Groups in Data: An Introduction to Cluster Analysis</article-title>
          , John Wiley &amp; Sons,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Lukasz</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Kurgan</surname>
          </string-name>
          and Petr Musilek, '
          <article-title>A survey of knowledge discovery and data mining process models'</article-title>
          ,
          <source>Knowl. Eng. Rev.</source>
          ,
          <volume>21</volume>
          (
          <issue>1</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          , (
          <year>March 2006</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Ohbyung</surname>
            <given-names>Kwon</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>Namyeon</given-names>
            <surname>Lee</surname>
          </string-name>
          , and Bongsik Shin, '
          <article-title>Data quality management, data usage experience and acquisition intention of big data analytics'</article-title>
          ,
          <source>International Journal of Information Management</source>
          ,
          <volume>34</volume>
          (
          <issue>3</issue>
          ),
          <fpage>387</fpage>
          -
          <lpage>394</lpage>
          , (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Michael</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Richter</surname>
          </string-name>
          and
          <string-name>
            <surname>Rosina O. Weber</surname>
          </string-name>
          ,
          <source>Case-Based Reasoning - A Textbook</source>
          , Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Lior</surname>
            <given-names>Rokach</given-names>
          </string-name>
          , '
          <article-title>A survey of clustering algorithms', in Data Mining and Knowledge Discovery Handbook</article-title>
          , 2nd ed., eds.,
          <source>Oded Maimon and Lior Rokach</source>
          ,
          <fpage>269</fpage>
          -
          <lpage>298</lpage>
          , Springer, (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>D</given-names>
            <surname>Sabin</surname>
          </string-name>
          and
          <string-name>
            <given-names>R</given-names>
            <surname>Weigel</surname>
          </string-name>
          , '
          <article-title>Product configuration frameworks-a survey'</article-title>
          ,
          <source>Intelligent Systems and their Applications</source>
          , IEEE,
          <volume>13</volume>
          ,
          <fpage>42</fpage>
          -
          <lpage>49</lpage>
          , (08
          <year>1998</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Hela</given-names>
            <surname>Sfar</surname>
          </string-name>
          and Amel Bouzeghoub, '
          <article-title>Dynamic streaming sensor data segmentation for smart environment applications'</article-title>
          ,
          <source>in Neural Information Processing - 25th International Conference, ICONIP</source>
          <year>2018</year>
          ,
          <string-name>
            <given-names>Siem</given-names>
            <surname>Reap</surname>
          </string-name>
          , Cambodia,
          <source>December 13-16</source>
          ,
          <year>2018</year>
          , Proceedings, Part VI, eds., Long Cheng, Andrew
          <string-name>
            <surname>Chi-Sing Leung</surname>
          </string-name>
          , and Seiichi Ozawa, volume
          <volume>11306</volume>
          of Lecture Notes in Computer Science, pp.
          <fpage>67</fpage>
          -
          <lpage>77</lpage>
          . Springer, (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <surname>Sara</surname>
            <given-names>Shafiee</given-names>
          </string-name>
          , Katrin Kristjansdottir, and Lars Hvam, '
          <article-title>Automatic identification of similarities across products to improve the configuration process in eto companies'</article-title>
          ,
          <source>International Journal of Industrial Engineering and Management</source>
          ,
          <volume>8</volume>
          (
          <issue>3</issue>
          ),
          <fpage>167</fpage>
          -
          <lpage>176</lpage>
          , (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Hwai-En</surname>
            <given-names>Tseng</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chien-Chen Chang</surname>
          </string-name>
          , and
          <string-name>
            <surname>Shu-Hsuan</surname>
            <given-names>Chang</given-names>
          </string-name>
          , '
          <article-title>Applying case-based reasoning for product configuration in mass customization environments'</article-title>
          ,
          <source>Expert Syst. Appl.</source>
          ,
          <volume>29</volume>
          (
          <issue>4</issue>
          ),
          <fpage>913</fpage>
          -
          <lpage>925</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <surname>Cen</surname>
            <given-names>Wan</given-names>
          </string-name>
          ,
          <article-title>Hierarchical Feature Selection for Knowledge Discovery</article-title>
          ,
          <source>Advanced Information and Knowledge Processing</source>
          , Springer International Publishing,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Rui</given-names>
            <surname>Xu and Donald C. Wunsch</surname>
          </string-name>
          <string-name>
            <surname>II</surname>
          </string-name>
          , '
          <article-title>Survey of clustering algorithms'</article-title>
          ,
          <source>IEEE Trans. Neural Networks</source>
          ,
          <volume>16</volume>
          (
          <issue>3</issue>
          ),
          <fpage>645</fpage>
          -
          <lpage>678</lpage>
          , (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <surname>Marcus</surname>
            <given-names>Zwirner</given-names>
          </string-name>
          , '
          <article-title>Datenbereinigung zielgerichtet eingesetzt zur permanenten Datenqualitätssteigerung'</article-title>
          , in
          <source>Daten- und Informationsqualität: Auf dem Weg zur Information Excellence</source>
          , chapter
          <volume>6</volume>
          ,
          <fpage>101</fpage>
          -
          <lpage>120</lpage>
          , Springer Fachmedien Wiesbaden, (
          <volume>06</volume>
          <year>2018</year>
          ).
          <article-title>(in German)</article-title>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>