=Paper= {{Paper |id=Vol-2220/13_CONFWS18_paper_29 |storemode=property |title=Quasi-finite Domains: Dealing with the Infinite in Mass Customization |pdfUrl=https://ceur-ws.org/Vol-2220/13_CONFWS18_paper_29.pdf |volume=Vol-2220 |authors=Albert Haag |dblpUrl=https://dblp.org/rec/conf/confws/Haag18 }} ==Quasi-finite Domains: Dealing with the Infinite in Mass Customization== https://ceur-ws.org/Vol-2220/13_CONFWS18_paper_29.pdf
    Quasi-Finite Domains: Dealing with the Infinite in Mass
                       Customization
                                                                          Albert Haag1


Abstract. In this paper we propose to relax finiteness in relational                used in assembling the variant3 .
tables and tabular constraints in a controlled way. We preserve the                    MC adds customization to this setting by placing the emphasis
syntactic representation of a row in a table as a tuple of symbols.                 on individualization, i.e. there will be many variants of a product
Some of these symbols refer to an atomic value as usual. Others,                    and the business is prepared to produce only a single unit of each
which we call quasi-finite symbols (QF-symbols), refer to infinite                  one on demand (lot size one). Accordingly, the product properties
subsets of an underlying infinite domain. Practical examples for QF-                that distinguish the variants are central and potentially numerous4 .
symbols are references to (uncountable) real-valued intervals and                   In other work [13] we discuss the MC setting in more detail and
wildcards representing countably infinite sets. Our goal is to provide              show that compression of variant tables, tables listing combinations
a simple and smooth extension of the tabular paradigm, predomi-                     of product features, is a key element in managing the exponential
nant in business, that is compatible with compression of the table                  explosion of the number of variants caused by the increase in the
to c-tuples [14] or to a variant decomposition diagram [11], and is                 number of customization choices, which production technology
amenable to constraint processing, such as local propagation.                       now enables. Here, we propose to add to the expressivity of
   The approach is based on organizing the QF-symbols pertaining                    variant tables by presenting a quasi-finite (QF) framework that al-
to each product property in a specialization relation [8, 9]. A spe-                lows dealing with infinite sets of choices within the tabular paradigm.
cialization relation is a partial ordering that expresses specificity of
meaning. A QF-symbol can be ignored in the presence of a more                          If the domains of all descriptive properties are finite, then the num-
special one.                                                                        ber of variants is finite as well. Leaving the reference to the under-
   To ensure that the sets represented by two distinct QF-symbols                   lying generic MC product aside, each variant is defined by a value
pertaining to the same domain are disjoint, we further require that it              assignment to the properties, which we represent as a relational tuple
must be possible to represent the intersection and set-differences of               (r-tuple). If the number of offered variants is not large, these r-tuples
QF-symbols. In order to be able to remove duplicates implicated by                  can be maintained as rows in a database table or spreadsheet, which
a disjunction of QF-symbols from result sets of queries, we require                 then acts as a product model comprised of a single tabular constraint.
that it is possible to represent their normalized set-union.                        If desired or needed, this overall variant table can conceptually be
   QF-symbols may refer to any objects as long as the above require-                split into smaller tables that together form a constraint satisfaction
ments are met, e.g. regular expressions (unary predicates), rectangles              problem (CSP). Each CSP variable corresponds to a product prop-
(geometric shapes), etc.                                                            erty and each CSP solution to an offered product variant5 . We refer
                                                                                    to any tabular constraint on product properties as a variant table.
                                                                                       Variant tables are a form of modeling that is very acceptable to
1      Introduction                                                                 a business. Their downside is that they may not scale with a grow-
                                                                                    ing number of choices for individualization. However, we show in
This work expands on a common theme: that data in tabular form is a
                                                                                    [13] that expected regularities in the product variants will allow a
natural, non-proprietary medium for communicating between inter-
                                                                                    compressed form of the table to scale. Here, our choice of the com-
related business processes within an enterprise, as well as between
                                                                                    pressed form is a variant decomposition diagram (VDD) [10, 11],
enterprises. We focus on mass customization (MC), which we take to
                                                                                    and the associated c-tuples, a term adopted from [14] and used there
be mass production with a lot size of one. A non-configurable prod-
                                                                                    for a Cartesian product of sets of values.
uct, amenable to mass production, can have product variants2 . For
                                                                                       However, infinite domains occur in MC practice, and neither ta-
example, mass produced ballpoint pens come in several colors, but
                                                                                    bles of r-tuples nor classic CSP approaches allow infinite sets. In
are otherwise identical. The offered colors are in a one-to-one cor-
                                                                                    this paper we propose to relax finiteness in variant tables, and by ex-
respondence with the manufactured ballpoint variants. The business
                                                                                    tension in the associated constraint processing, in a controlled way.
attributes are maintained once for the generic ballpoint pen. Only one
                                                                                    3 The SAP tables relevant for configuration are listed in [6], Appendix A
generic bill of materials that covers all possibilities needs to be main-
                                                                                    4 For simplicity of exposition, we disregard the possibility of needing to
tained. For each variant the value of an additional product property,
                                                                                      deal with variant structures, i.e. variants that have variants as parts. Our
color, is needed to determine which ink filling and matching cap is                   approach here addresses tabular data in general and could be extended to
                                                                                      variant structures if needed.
1    Product Management GmbH, Germany, email: albert@product-                       5 In practice, product models are not limited to use only tabular constraints.
    management-haag.de                                                                However, the reasoning here shows that product variants could be exclu-
2 We use SAP terminology pertaining to the handling of products and product           sively expressed in tables in the finite case. The single overall table listing
    variants, citing [6] as a general reference. A brief sketch of the history of     all variants can be seen the the result of compiling the product model, e.g.
    the SAP Variant Configurator is given in [7]                                      to a decision diagram [2].
Our goal is to provide a simple and smooth extension of the tabular                As stated, the goal of this work is to smoothly extend the tabular
paradigm that retains its acceptance in business, and, particularly, al-        paradigm, not to compete with other dedicated problem solving
lows compression to a VDD and c-tuples. We preserve the syntactic               approaches, and we do not make any such comparisons here.
representation of a row in a table as a tuple of symbols, while allow-          The quasi-finite (QF) approach has not yet been tried in the field.
ing some of these, which we call quasi-finite symbols (QF-symbols),             Therefore, we cannot present results. Given that the VDD processing
to refer to infinite sets. A symbol for a real-valued interval, which           remains syntactically alike to the finite case, and given our positive
is uncountably infinite by definition, is an example of a QF-symbol.            experiences with specialization relations in other endeavors, we
A wildcard symbol that refers to an infinite domain is also a QF-               are confident that performance is not the issue. Instead, it will be
symbol. In contrast, a wildcard for a finite domain is just an alias for        a primary concern to establish usefulness in practice and evaluate
the finite set of values. We treat this as “syntactic sugar” and equiva-        acceptance by the business community.
lent to the expanded set of values.
   The approach we take here is illustrated by example in Section 4                 The paper is structured as follows:
and based on the following ideas6 :
                                                                                • We summarize a database approach to configuration in Section 2
• Each property domain is defined by a finite set of symbols:                     and the topic of compression to VDDs and c-tuples in Section 3.
                                                                                • We illustrate all ideas using an extensive example based on an MC
   – a finite domain by its values, which we refer to as r-symbols,
                                                                                  T-shirt in Section 4.
   – an infinite domain by one or more (disjoint) QF-symbols.                   • Constructing VDDs from QF-tuples is akin to constructing them
• We represent a value assignment to the product properties as a                  from c-tuples. This topic is beyond the scope of this paper. How-
  tuple of symbols. If the tuple contains only r-symbols, it is an r-             ever, we summarize the basic problem of ensuring disjoint c-tuples
  tuple. If it also contains QF-symbols, we call it a QF-tuple. Both              (QF-tuples) in Section 5.
  r-tuples and QF-tuples are interpreted a special cases of a c-tuple,          • We look at the motivating examples of QF-symbols and how they
  where an r-symbol in the tuple is treated as a singleton set.                   meet our requirements in Section 6 in some detail.
• We adapt the concept of a specialization relation from [8, 9] to              • We discuss queries to variant tables with QF-symbols in Section
  QF-symbols. When queries or constraint solving need to consider                 7.
  two different QF-symbols for the same property simultaneously,                • We present our ideas on specialization relations and their relation
  they can ignore both symbols and focus instead on the more spe-                 to constraint processing in Section 8. In particular we show that
  cial symbol for their set intersection.                                         local propagation works seamlessly.
• Only a finite number of QF-symbols is needed, which can be de-                • We also believe that using QF-symbols (and perhaps c-tuples in
  rived in advance from the product model (e.g. the property do-                  general) directly in the definition of a product variant has business
  mains and the variant tables)7 .                                                benefits, which we discuss in Section 9.
• Compression to a VDD, and through that to c-tuples, can be done               • We provide a summary and an outlook in Section 10.
  as in the finite case, if we can ensure certain requirements are met.
                                                                                2      Configuration in the Database Paradigm
   One difference between a QF-symbol and an r-symbol is that the
former still allows choice, i.e. it can be specialized or restricted            The easiest MC business setting is when the business offering is a
further when required. The consequence is that some constraints                 small finite set of product variants actually represented in extensional
may need to be formulated in non-tabular form, e.g. to express that             form in a relational database table or spreadsheet. Even when this is
for two real-valued properties length and width it should hold that:            not possible, due to the size such a table would have, tabular con-
length ≥ width. These constraints can be seen as inter-property                 straints can be used to define the valid variants.
predicates. Whereas we discuss unary intra-property predicates as                  The extensional form of a tabular constraint naturally supports var-
QF-symbols, we will not deal with other inter-property predicates in            ious data queries such as (1) and (2), here formulated in SQL, which
this paper, except to note in passing that restricting real-valued in-          are the most relevant for configuration as discussed in [11]8 .
tervals with numeric linear (in)equalities is an established technique             The query in (1) returns a result set of all variants matching the
(see Section 8.2) that can be smoothly integrated with our intended             user’s criteria9 . The k product properties are denoted by v1 , . . . , vk .
processing.                                                                     The variant table is denoted as hvtabi. hRj i denotes a subset of the
   We show how configuration queries over variant tables with QF-               domain Dj for product property vj . The values of interest to a user
symbols can be meaningfully supported. We also believe that the                 when configuring can be communicated in the WHERE clause.
concept of specialization relations is an important bridge to con-
                                                                                 SELECT * FROM hvtabi
straint processing in general. The idea of defining a specialization
relation via the subset-relation can be inverted: given a set of sym-                          WHERE hv1 i IN hR1 i AND . . . hvk i IN hRk i; (1)
bols from the column of a table that correspond to elements of a                  The query in (2) returns the domain restriction for property vj
partial order, such that a unique greatest successor and a unique least         under the WHERE clause.
common predecessor exists for any two elements, these symbols can
be treated in a like manner to QF-symbols for purposes of queries                SELECT DISTINCT hvj i FROM hvtabi
and constraint processing, if we are willing to interpret the partial                          WHERE hv1 i IN hR1 i AND . . . hvk i IN hRk i; (2)
order as a specialization relation.
                                                                                8 While the approach here may be extended to cover further SQL queries, this
6 This extends the simple processing of real-valued intervals and wildcards         is beyond the scope of this paper.
  proposed in [10, 11] for a set-labeled VDD.                                   9 In the SQL syntax, an IN term in the WHERE clause need not be specified
7 If further QF-symbols are generated dynamically externally, the special-          where no restriction is intended. However, for purposes of representing a
  ization relation will have to be extended dynamically. Nevertheless, at any       query condition as a c-tuple (see Section 3), we will substitute Rj = Dj
  given time a finite number of symbols will be needed.                             for an omitted IN term
  These queries can also be done to further filter the result sets of           must denote disjoint sets, in order to allow a unique decision for a
previous queries (see [11, 10]).                                                node, given a value assignment.
                                                                                   Nodes in an l-chain that all have a common HI-link represent the
   To sum up: tabular constraints in extensional form can be evalu-             disjunction of their value assignments and could be merged into one
ated using database queries. In [11] we have shown that this extends            set-labeled node. In [10, 11] we introduced a VDD with set-labeled
to tables represented as VDDs in a way that also guarantees the effi-           nodes, where a node was labeled with a finite set of r-symbols rep-
ciency of the queries. We now have to show here how to handle the               resenting a disjunction of value assignments. Since any node labeled
queries (1) and particularly (2) in conjunction with QF-symbols.                with such a finite set can be re-expanded into an l-chain of regular
                                                                                VDD nodes nodes that assign the symbols one at a time, we do not
                                                                                propose to use VDDs with set-labeled nodes in practice. We use them
3     C-Tuples, Table Compression, and Decision                                 here in Section 4 to simplify the exposition.
      Diagrams                                                                     A VDD is functionally equivalent to the extensional form of the
The discussion of compression in this section is illustrated with               table it represents from the perspective of the queries (1) and (2)
examples using a simple T-shirt in Section 4.                                   relevant for configuration, see [11]. The extension of these queries
                                                                                to include QF-symbols is the topic of Section 7. A VDD can also
   In the finite case a variant can be represented as an r-tuple. If we         support counting the number of tuples in a table or a result set of a
substitute sets for values in this tuple, the tuple is no longer relational,    query and access a tuple directly by its position in the table/result set.
but represents the Cartesian set of all r-tuples that can be formed as
combinations using values from the sets. We call such a Cartesian               4     T-Shirt Example
tuple a c-tuple10 . As a tuple we denote it by C = hC1 , C2 , . . . , Ck i,
where Cj ⊂ Dj and Dj is the domain of the product property vj ∈                 4.1    Classic Finite T-Shirt Variants
{v1 , . . . , vk }. As a Cartesian set it would be written as C = C1 ×
C2 × . . . × Ck .                                                               In [10] the concepts of representing a variant table using a VDD are
   In the context of the above definition, we don’t care whether an             illustrated using an example of a simple T-Shirt. We use this exam-
element Cj of a c-tuple is finite or infinite. Note that the set of r-          ple here both to illustrate the concepts discussed so far, and also to
tuples represented by a c-tuple is uncountable if one of the sets in the        illustrate the proposed extension to infinite sets.
c-tuple refers to a real-valued interval.                                          The simple T-shirt has the three properties Imprint (v1 ), Size (v2 ),
   The WHERE clause with the k IN operators in (1) and (2) itself               and Color (v3 ) with the finite domains:
describes a c-tuple hR1 , . . . , Rk i, which expresses the set of values
the user (the problem solving agent) believes in. We will refer to this         • {M IB(Men in Black), ST W (Save the Whales)}
c-tuple as the query condition. We will allow a query condition to be           • {L(Large), M (Medium), S(Small)}
any c-tuple from our variant domain.                                            • {Black, Blue, Red, W hite}

                                                                                   Only 11 variants are valid due to constraints that state that M IB
   C-tuples offer a way to compress tables. For example, if the set of
                                                                                implies Black and ST W implies ¬S(Small). Table 1 is the exten-
all variants is totally unconstrained, this can be represented by a sin-
                                                                                sional form of the variant table, which is small enough to be used as
gle c-tuple, which is the Cartesian product of the product domains.
                                                                                the only and definitive representation of the variants for the purposes
With constraints, there will be more c-tuples, but often a c-tuple rep-
                                                                                of both business and configuration. It encodes the underlying CSP as
resentation is much more compact than the extensional form [12].
                                                                                a single tabular constraint.
For this reason, c-tuples are already used both formally and infor-
                                                                                   The query (1) can be used to filter the variants to the set match-
mally in configuration practice.
                                                                                ing any given selection criteria (query condition) hR! , . . . , Rk i. For
   In the case of finite domains, a set of c-tuples can be further com-
                                                                                example, if the user needs a small (S) sized T-shirt, there is only one
pressed to a decision diagram (DD). We use the form of a Variant
                                                                                solution (the first row in Table 1). Alternatively, if a Red T-shirt is de-
Decomposition Diagram (VDD). As introduced in [10, 11], a VDD is
                                                                                sired, there are two variants that satisfy this (eighth and ninth rows),
a binary rooted Directed Acyclic Graph (DAG), where each node has
                                                                                and the domains are restricted as follows: Imprint ∈ {ST W },
a label denoting the assignment of a property to a value (r-symbol).
                                                                                Size ∈ {M edium, Large}, and Color ∈ {Red} by applying the
Here we will allow QF-symbols in node labels as well. Each node has
                                                                                query (2) for each property in turn.
two emanating links, HI and LO, which we characterize as follows
given a fixed ordering of the product properties: v1 , . . . , vk :11
                                                                                                        Table 1.   Simple T-shirt
• the HI-link of a node points to a node for the next product property
  vj+1 or to the terminal sink > (true) from last column nodes.                                         Imprint     Size    Color
• the LO-link points to an alternate value assignment for the same                                      MIB         S       Black
                                                                                                        MIB         M       Black
  product property vj or to the terminal sink ⊥ (false).
                                                                                                        MIB         L       Black
                                                                                                        STW         M       Black
 We will call a chain of nodes linked via LO-links an l-chain. If                                       STW         L       Black
more than one QF-symbol appears in an l-chain, the QF-symbols                                           STW         M       White
                                                                                                        STW         L       White
10 We adapt the term from [14], which investigates direct compression to                                STW         M       Red
    c-tuples.                                                                                           STW         L       Red
11 Under these assumptions, a multi-valued decision diagram (MDD), a more                               STW         M       Blue
    widely known form of a DD [3, 1], can be mapped to a VDD. This is further                           STW         L       Blue
    detailed in [11]
   Figure 1 depicts a VDD with set-labeled nodes for Table 1. The
                                                                                                      Table 4.     Split c-tuples for simple T-shirt with infinite domains
HI-links in each path from the root to the sink > in the VDD in
Figure 1 define a c-tuple. The set of all c-tuples that can be formed is
disjoint and is a way to represent Table 1 in compressed form. Table                                  Scale          Imprint             Size                     Color
                                                                                                     1.0             MIB                 S;M;L                    Black
2 lists the two c-tuples needed to represent the 11 variants.                                        1.0             STW                 M;L       Black;White;Red;Blue
                                                                                                     [1.0]           himg-filenamei      S;M;L                   White
                                                                                                     [0.5, 1.0)      himg-filenamei      S;M;L                   White

                          1:(3, MIB)|7

                                                                                                   We now discuss how to construct a VDD with set-labeled nodes
                                                                                                for these c-tuples. Figure 2 shows the result12 .
                   2:(3, STW)|6

                                                                                                1. We construct a root node ν1 labeled hv1 , [1.0]i starting with the
                                                                                                   first c-tuple. This node will be used for both the first and second
           12:(2, [Large, Medium])|5                      10:(2, [Large, Medium, Small])|3
                                                                                                   c-tuples in Table 3.

                    11:(1, [Black, Blue, Red, White])|4                        6:(1, Black)|2   2. We construct the l-chain (chain of LO-links) for the root node:
                                                                                                   • We pointed out in Section 3 that nodes linked in an l-chain need
                                    F                            T
                                                                                                     to have disjoint set labels. This is illustrated here. It will be a
                                                                                                     problem if we label ν2 with [0.5, 1.0] (third c-tuple), because
                                                                                                     then the value Scale = 1.0 does not allow deciding uniquely
                                                                                                     for either ν1 or ν2 . So we split [0.5, 1.0] into the two disjoint
             Figure 1. VDD of T-shirt with set-labeled nodes
                                                                                                     c-tuples.

                                                                                                           [1.0]             himg-filenamei      {S, M, L}         W hite
                                                                                                           [0.5, 1.0)        himg-filenamei      {S, M, L}         W hite
                       Table 2.         C-tuples for simple T-shirt
                                                                                                   • Table 4 shows all the c-tuples to be handled in constructing
                                                                                                     the VDD. The new third c-tuple is covered by ν1 . We label the
                   Imprint         Size          Color                                               second node ν2 , linked from ν1 via its LO-link, with the half-
                   MIB             S;M;L         Black                                               open interval [0.5, 1.0) from the fourth c-tuple. This handles
                   STW             M;L           Black;White;Red;Blue
                                                                                                     the first column.
                                                                                                3. We next process the rest of the three tuples in Table 4 that start
                                                                                                   with C1 := [1.0]. We create:
                                                                                                   • ν3 , the target for the HI-link of ν1 , labeled with hv2 , M IBi
4.2     T-Shirt with Infinite Domains
                                                                                                   • ν4 , the target for the LO-link of ν3 , labeled with hv2 , ST W i
To illustrate the use of infinite sets, we modify the example to allow                             • ν5 , the target for the LO-link of ν4 , labeled with
an arbitrary user-provided image as an imprint on a white T-shirt. An                                hv2 , himg-filenameii
image is identified at runtime via a file name. The file name must
refer to a processable graphic, which is taken to mean that only a                              4. It is straightforward to handle the third column for the above three
jpg or a tiff format can be accepted. Hence, the domain is the in-                                 c-tuples: nodes ν3 , ν4 , and ν5 have their HI-links pointing to
finite set of all legal file names that match the regular expression                               nodes ν6 , ν7 , and ν8 , respectively with the labels depicted in Fig-
himg-filenamei = ∗.jpg| ∗ .tiff.                                                                   ure 2:
   We also add a property Scale to capture a factor to be used to scale                            • The first of the c-tuples allows only the color Black (node ν9 ).
the image printed on the T-shirt. For the vintage prints MIB and STW                               • The second allows all colors (node ν10 ), and
we require Scale = 1. For the user-provided images, the scale can
be arbitrarily chosen by the user as a floating point number in the                                • the third allows only the color White (node ν11 ).
range 0.5 to 1 (the interval [0.5, 1.0]).                                                          This completely handles the first three c-tuples.
   Table 3 lists the c-tuples needed to describe this setting. The prod-
uct property Scale has here been placed as the first property v1 . The                          5. It is now trivial to finish the VDD. Node ν2 still needs to be pro-
other properties are now v2 (Imprint), v3 (Size), and v4 (Color).                                  cessed with respect to the last (fourth) c-tuple. But the columns
                                                                                                   two to four are identical to those in the third c-tuple. Node ν5 was
        Table 3.       C-tuples for simple T-shirt with infinite domains                           already constructed for this.

                                                                                                   We note that we skirted the issue that the product property Im-
       Scale           Imprint                   Size            Color                          print (v2 ) allows both values from a finite list, e.g. {M IB, ST W },
      1.0              MIB                       S;M;L           Black
      1.0              STW                       M;L             Black;White;Red;Blue           as well as arbitrary “additional” values (himg-filenamei). This is not
      [0.5, 1.0]       himg-filenamei            S;M;L           White                          12 To reduce the size needed to display the graph, the terminal sink ⊥ has
                                                                                                   been omitted. Conceptually, it terminates all chains of LO-links. Also, the
                                                                                                   nodes ν1 , ν2 , . . . νn are identified by “n1”, “n2”, . . . “nn”
                                                                                                The result set intersected with the external condition is then:
                                                                                                           h[0.5, 1.0], {my-img.jpg}, {S, M, L}, W hitei
                                   n1:(1, [1.0])

                                                                                                   If the query formulates the additional restriction Scale ∈
                                                                                                [0.25, 0.75], then the result set intersected with the external condi-
                  n2:(1, [0.5, 1.0))               n3:(2, MIB)                                  tion is:
                                                                                                          h[0.5, 0.75], {my-img.jpg}, {S, M, L}, W hitei
                                                   n4:(2, STW)              n6:(3, [L, M, S])

                                                                                                5   Excursion on the Construction of VDDs from
                                                                                                    C-Tuples
              n5:(2, )                   n7:(3, [L, M])
                                                                                                A c-tuple C can be decomposed into its head (the first element C1 )
                                                                                                and its tail T, which is also a c-tuple. We denote this by C := C1 |T:
   n8:(3, [L, M, S])                   n10:(4, [Black, Blue, Red, White])
                                                                                                         C := hC1 , . . . , Ck i = C1 |hC2 , . . . , Ck i = C1 |T     (3)
                                                                                                   When constructing a (partial) VDD from a list of c-tuples
                n11:(4, White)                                          n9:(4, Black)           C1 , . . . Cm an l-chain for the head (root) node is constructed us-
                                                                                                ing the first elements C11 , C21 , . . . , Cm1 . As discussed in Section 3
                                                                                                and evident from the example in Section 4.2 these elements must be
                                                         T                                      disjoint.
                                                                                                   If there are two c-tuples Ci , Ci0 with the same tail, i.e.
                                                                                                                Ci = Ci1 |T      and    Ci0 1 = Ci0 1 |T
         Figure 2. VDD of T-shirt with non-finite set-labeled nodes
                                                                                                then their first elements must be merged to yield one c-tuple
                                                                                                                         C0 = (Ci1 ∪ Ci0 1 )|T
uncommon in practice where a “standard” solution is modeled with
a predefined finite domain, but additional values are allowed (see                                We can ensure disjointness of any other pair of c-tuples Ci , Ci0
[6]). The sets {M IB}, {ST W } and himg-filenamei are effectively                               with differing tails
treated as disjoint by the VDD due to the constructed l-chain of nodes                                         Ci = Ci1 |Ti      and   Ci0 1 = Ci0 1 |Ti0
ν3 , ν4 , and ν5 ). This could be formally ensured by augmenting the
QF element himg-filenamei to himg-filenamei ∩ ¬{M IB, ST W }                                    by replacing them with the three c-tuples Ca , Cb , Cc in (4) (a c-tuple
(see Section 6).                                                                                with an empty element is considered empty and can be disregarded):
Lastly, we informally discuss some exemplary queries. QF queries
are the subject of Section 7. The query to Table 1 for small (S) T-                                                Ca = (Ci1 \ Ci0 1 )|Ti
Shirts yielded a result set consisting of one r-tuple (the first row). The                                         Cb = (Ci0 1 \ Ci1 )|Ti0
domains for the three product properties were restricted to {M IB},                                                 Cc = (Ci1 ∩ Ci0 1 )|(Ti ∪ Ti0 )                   (4)
{S}, and {Black}. The same query condition against Table 4 yields
three c-tuples (the first, third and fourth c-tuple). Each of these c-                             As the example in Section 4.2 shows, the c-tuple heads show up
tuples in the result set must be intersected with the query condi-                              directly as labels of set-labeled nodes. We already stated that a set-
tion to eliminate the sizes medium (M) and large (L) that are in                                labeled node labeled with a finite set of symbols (r-symbols or QF-
the c-tuples but excluded by the query condition. Consequently, the                             symbols) can be expanded to an l-chain of regular VDD nodes.
domains for the four product properties are restricted to [0.5, 1.0]
{M IB, himg-filenamei}, {S}, and {Black, W hite}13 .                                            6   Operations with Non-Finite Elements in
   Similarly, the query to Table 1 for Red T-Shirts yielded a result set                            C-Tuples
consisting of two r-tuples (the eighth and ninth row). Against Table 4
the result set consists of one (the second) c-tuple. After intersection                         As the discussion in Section 5 and the example in Section 4 make
with the query condition the four product properties are restricted to                          clear, it will be necessary to both split and combine c-tuples when
[1.0] {ST W }, {M, L}, and {Red}.                                                               constructing a VDD and result sets. Therefore, we need the follow-
   Instead, if the query condition simply specifies a file name for a                           ing operations on c-tuple elements Cij , Ci0 j pertaining to the same
particular image, my-img.jpg, then the last two c-tuples would be the                           product property vj :
result set. They agree completely except in the first column. As there                          • set intersection: Cij ∩ Ci0 j
is no need for the split here (as there was when constructing the orig-                         • set union: Cij ∪ Ci0 j
inal VDD), the two tuples could be combined into one14 :                                        • negation with respect to the overall domain: ¬Cij := Dj \ Cij
                                                                                                • set difference: Cij \ Ci0 j = Cij ∩ ¬Ci0 j
             h[0.5, 1.0], himg-filenamei, {S, M, L}, W hitei
                                                                                                   For finite sets this is a given. For QF-symbols that are used in the
13 Formed by collecting all symbols occurring for each column and calculat-
                                                                                                labels of VDD nodes, we must ensure that these operations are well
   ing the the union. The result of the union of the two intervals is normalized
   (see Section 6)                                                                              defined and fit in our QF framework.
14 This reduction is actually required where we want the tuples in a result set                    In the following subsections we look at this in detail for the infinite
   to be distinct, i.e. to have been normalized.                                                elements we propose to add:
• Real-valued intervals                                                               A unary predicate can be represented by its name (a symbol),
• Unconstrained countably infinite sets                                            which serves as the QF-symbol identifying it. In the example in Sec-
• Sets of exclusions, particularly finite exclusion sets.                          tion 4.2, we used the notation himg-filenamei to refer to a regular
                                                                                   expression for legal file names.
  Where the domain underlying negation needs to be made clear we
                                                                                      The set operations translate into logical operations for predicates.
will denote negation as:
                                                                                   The union of two infinite sets qualified by predicates π1 and π2 is just
                                      D
                           ¬C := C        =D\C                                     a set qualified with the disjunction π1 ∨ π2 . Similarly, intersection
                                                                                   translates to π1 ∧ π2 , and negation to ¬π1 .
                                                                                      Again, we require normalization to reduce a complex logical
6.1     Real-Valued Intervals and the Xnumeric                                     expression by removing any redundant elements. It has yet to be
        Datatype                                                                   determined what works best in practice here. From a theoretical
We denote a real-valued interval using conventional mathematical                   view, we might require a disjunctive normal form (DNF). The
notation, e.g. [a, b) for a half-open interval with a closed lower bound           overall predicate could then be represented as a list (finite set) of
a and an open upper bound b. This is the set of all real numbers x                 conjunctions. A set-labeled node for such a list can be expanded to
such that x >= a ∧ x < b. We allow lower and upper infinity,                       an l-chain of nodes, as for xnumerics. Each such node would be
denoted by − inf and + inf, with open bounds. A single real number                 labeled by a conjunction of predicates, which would be treated as an
x can be encoded as a singleton interval [x]. All other interval bounds            indivisible QF-symbol.
can be open or closed
   We define an xnumeric to be a finite list of real-valued intervals                 We must also deal with set unions between finite sets and count-
representing the union of its elements in a normalized form. Normal-               ably infinite sets. In the example in Section 4, the standard imprints
ized means that the intervals in the list are disjoint, separable, and in          for the T-shirt formed a finite set {M IB, ST W }, but “additional
ascending order, e.g. the set of intervals {[0.5, 1.0), [1.0]} is disjoint         values” were then allowed, which were specified by the QF-symbol
and ascending, but it is not separable. In normalized form it is just              himg-filenamei. The domain of the property imprint is just the union
[0.5, 1.0]. (Remark: The interval {[0.5, 1.0), (1.0, 2.0]} is separable            of these sets. Generally, the domain D for a product property with a
and thus normalized, because its two intervals are separated by the                non-xnumeric datatype is D = {F, π} := F ∪ π, where F is a finite
“gap” of the singleton interval [1.0].)                                            set of values, π a predicate representing an countable infinite set, and
   For xnumerics it is straightforward to ensure normalization. First,             both F and π respect the datatype assigned to the product property.16
any intervals that are non-disjoint or not separable can be merged                    The set-operations then become:
into one interval. Since the remaining intervals are disjoint, they can
be ordered. Hence the union of two xnumeric is just the set union fol-             • set intersection: {F, hπi} ∩ {F 0 , hπ 0 i} = {F ∩ F 0 , hπ ∧ π 0 i}
lowed by normalization. The intersection is just the list of pairwise              • set union: {F, hπi} ∪ {F 0 , hπ 0 i} = {F ∪ F 0 , hπ ∨ π 0 i}
                                                                                                                 hπi
intersections. Because the xnumeric is ordered due to normalization,               • negation: ¬{F, hπi} := F         ∩ ¬hπi
this operation is efficient in the sense that it is not necessary to actu-         • set difference: {F, hπi} \ {F 0 , hπ 0 i} = {F 00 , ¬F 0 , hπi \ hπ 0 i}
ally intersect all pairs.                                                             – where F 00 is the finite set F \ F 0 ∪ F \ hπ 0 i, and
   The set union of two intervals is not necessarily again an interval,                                              hπi
hence we need the concept of an xnumeric.                                             – the finite set ¬F 0 = F 0      is an exclusion set of all values in
   The unconstrained xnumeric is the interval (− inf, + inf). The                       F 0 that lie in hπi (see Section 6.3).
negation of an xnumeric is the set difference to this unconstrained
set. It is formed by inverting the finite number of “gaps” between the             6.3    Exclusions and Exclusion Sets
intervals in the xnumeric. For example
                                                                                   An exclusion of a value x from a property domain D is a way of stat-
    ¬{[0.5, 1.0), (1.0, 2.0]} = {(− inf, 0.5), [1.0], (2.0, + inf)}                ing D\{x}. It means that x is considered to be invalid, which we will
   Remark: a finite set of real number values can be represented as                denote by ¬x. For real-valued domains, exclusions can be directly
an xnumeric using singleton intervals. All interaction with finite sets            formulated as xnumerics, e.g., {(− inf, x)(x, + inf)} would exclude
is covered by the above operations defined for xnumerics.                          the real number x. For a finite domain or an xnumeric domain, we can
   An xnumeric is a list of QF-symbols (intervals) representing their              simply positively represent the set D \ {x}. For a countably infinite
set union. A set-labeled node for an xnumeric can be expanded to an                domain, we need further expressiveness. Given a countably infinite
l-chain of nodes with interval labels.                                             domain D for a product property and a finite set of values E ⊂ D,
                                                                                                                              D
                                                                                   we introduce an exclusion set ¬E := E := D \ E. An exclusion
                                                                                   set ¬E can be merged with a unary predicate π by removing any val-
6.2     Countably Infinite Sets and Domains                                        ues from E that do not satisfy the predicate π, i.e. ¬E ∩ hπi ⊂ ¬E
Examples of countably infinite domains are the list of all integers or             is a reduced exclusion set. In order to keep the exposition simple, we
all strings. This requires that each product property is associated with           will ignore this reduction and denote ¬E ∩ hπi also by ¬E.
an immutable datatype. We consider a domain D or any C ⊂ D to be                      For two exclusion sets ¬E, ¬E 0 , the required set operations are
qualified by a unary predicate (condition) that filters out disallowed             inverted:
values at run-time (e.g. a regular expression for a string). Any value
fulfilling the predicate (e.g. any string matching the regular expres-             • set intersection: ¬E ∩ ¬E 0 = ¬(E ∪ E 0 )
sion) is an acceptable value. Examples for qualifying predicates for               • set union: ¬E ∪ ¬E 0 = ¬(E ∩ E 0 )
an integer datatype are: positive p, even p or odd p.15                            • negation: ¬¬E = D \ (D \ ¬E) = E
15 In the absence of more specialized predicates, htruei is taken as the default   16 Ideally, F and π will be disjoint. Either F or π can be empty. We define
  predicate.                                                                         the predicate hfalsei to represent the empty set.
• set difference: ¬E \ ¬E 0 = E 0 \ E                                           • There is a unique symbol ⊥ (false) that is a special of all other
                                                                                  symbols. This is a QF-symbol for the empty set.
   Finite exclusion sets are needed in order to meet our require-               • For any two symbols in the PO, there exists a unique symbol for a
ments of negation of finite sets against infinite domains. The                    greatest common successor/special (the “intersection”).
concept can also be extended to infinite exclusion sets. Indeed, a              • For any two symbols in the PO, there exists a unique symbol for a
negated unary predicate corresponds to such a set. For example, if                least common predecessor/general (the “normalized union”)17 .
the set of all prime numbers is represented by the unary predicate              • There is a unique top-level symbol Ω that represents the entire
hprime pi, the ¬hprime pi represents exclusion of all prime integers.             domain. For any symbol φ in the PO, there exists a symbol ¬φ in
                                                                                  the PO, such that the least common predecessor of φ and ¬φ is Ω
  In either case, a reference to an exclusion set is treated as a QF-             and the greatest common successor is ⊥. ¬φ denotes the negation
symbol. For example, for a predicate π, ¬π is the symbol represent-               of φ.18
ing the exclusion of all values in π.
                                                                                For example, we could arrange images in a PO and declare it a spe-
                                                                                cialization relation, paying some attention to fulfill the requirements
7     Queries on Quasi-Finite VDDs                                              in the spirit of the intended PS.
                                                                                   Specialization relations provide some conceptual and practical
In the classic finite case, the result set R of the query (1) is a finite set
                                                                                benefits:
of r-tuples. In the QF framework, it is a finite set of QF-tuples that
may contain both value symbols and QF-symbols. A QF-symbol in                   • They can be pre-calculated and stored in a graph. This may be
the result set must be specialized to conform to the the query condi-             more performant than calculating intersections and unions on the
tion, e.g. by set intersection with the query condition. Problem solv-            fly.
ing (PS) must expect the remaining degree of non-determinism.                   • They provide a general concept to adapt PS to QF-symbols: PS
   The query (2) contains the keyword DISTINCT. This means any                    must simply be prepared to specialize the assignment of a QF-
duplicates must be removed from the result set for the particular col-            symbol to a CSP variable.
umn (property). We see replacing QF-symbols by their normalized                 • They generalize to other objects, e.g. shapes, images, taxonomies,
union akin to removing duplicates. Therefore, the symbols in the re-              etc.
sult set, both QF-symbols and r-symbols, must be replaced by their
normalized union, which ensures also that remaining symbols are
pairwise disjoint.                                                              8.2     Local Propagation with QF-Symbols
                                                                                If a product model contains multiple constraints, local propagation
8     Constraint Processing with Quasi-Finite Symbols                           [4] can be used to restrict the domains of the product properties to a
                                                                                state of arc consistency. Any domain restriction of a product prop-
8.1    Specialization Relations                                                 erty is propagated to all constraints that reference the same product
                                                                                property. The process continues until no further restrictions are pos-
Given two QF-symbols φ1 , φ2 for the same CSP variable, we regard
                                                                                sible. For a tabular constraint, the query (2) can be used to determine
φ2 to be more special than φ1 if φ2 denotes a subset of φ1 . This
                                                                                the domain restrictions, which are then propagated (see Section 7).
leads to a partial ordering (PO) of the symbols that occur in the vari-
                                                                                   The QF framework fits nicely in this scheme. A c-tuple com-
ant tables, which we call a specialization relation, introduced and
                                                                                prised of finite sets of symbols that may include the normalized union
motivated for another context in [9]. Generally, a specialization rela-
                                                                                of QF-symbols may be used as a query condition. The domain re-
tion on a set of facts expresses specificity of meaning, characterized
                                                                                strictions that result from the query (2) with this query condition
by the following three properties:
                                                                                may again contain the normalized union of QF-symbols. The c-tuple
• Problem solving (PS) need not consider an otherwise valid fact                formed from the resulting domain restriction for each column can be
  in the presence of a more special one (procedural-subsumption                 smoothly propagated to other constraints.
  property). This property requires the acquiescence of PS.                        If the product model is entirely made-up of tabular constraints, the
• A fact is logically implied by any of its specializations (semantic-          local propagation of QF-symbols is covered by our approach. It is
  compatibility property).                                                      also straightforward to include constraints representing numeric lin-
• Negation inverts specialization (symmetry-under-negation prop-                ear (in)equalities when propagating real-valued intervals.19 Extend-
  erty).                                                                        ing the propagation of QF-symbols yet further is a topic of future
                                                                                work.
   The facts we deal with in this paper are assignments of r-symbols
and QF-symbols to a CSP variable. The PS we consider consists of
                                                                                8.3     General Constraint Solving
queries to the table and constraint processing, particularly local prop-
agation of constraints. From the perspective of queries we addition-            Extending existing problem solvers to deal with QF-symbols, will
ally need to be able to aggregate the result sets into a normalized             require an analysis of the particular methods employed. However, a
form, e.g. delete duplicate r-symbols, replace two QF-symbols by a              17 We had noted that the union of two QF-symbols need not itself be a QF-
more general one representing their union, etc. We have shown in                   symbol. Here, however, it is an advantage to be able to have a least com-
Section 6 that the QF-symbols we primarily envision meet these re-                 mon predecessor/general representing the union as part of the PO. A more
quirements.                                                                        detailed treatment of specialization relations is deferred to a discussion us-
   We can also turn the reasoning around and define a PO of symbols                ing practical examples when they arise.
                                                                                18 We need “negation” primarily to ensure a “set-difference” operation to
pertaining to the same property domain as a specialization relation if             be able to split two symbols into a disjoint triple of symbols as in (4) in
we can show that it has the above properties and if we also guarantee              Section 5.
                                                                                19 This is implemented in the SAP product configurators ([6]).
the following:
main common idea is that constraint problem solving will make use         of MC business. This also includes evaluating the need of integrating
of the concept of specialization relations. Instead of exploring the      with other types of constraints, such as linear (in)equality constraints,
validity of a simple value assignment, an assignment of a variable        and the business value of indeterminism in product variants.
to a QF-symbol can be specialized. When considering such an as-
signment and a constraint on the same variable, the greatest common
                                                                          ACKNOWLEDGEMENTS
special must be substituted in the assignment. A forced specializa-
tion to the empty set would invalidate an assignment. The solutions       I would like to thank my daughter Laura and the reviewers for their
found by constraint solving may contain (specialized) QF-symbols,         comments, which helped improve this paper considerably.
i.e. exhibit a degree of non-determinism that cannot be avoided and
is to be expected.
                                                                          REFERENCES
                                                                           [1] Jérôme Amilhastre, Hélène Fargier, Alexandre Niveau, and Cédric
9    Indeterminism in Variants and Sub-Variants                                Pralet, ‘Compiling csps: A complexity map of (non-deterministic) mul-
                                                                               tivalued decision diagrams’, International Journal on Artificial Intelli-
A product variant is classically defined as an r-tuple, a value assign-        gence Tools, 23(4), (2014).
ment to each of its product properties, but this is neither ideal nor      [2] Henrik Reif Andersen, Tarik Hadzic, John N. Hooker, and Peter Tiede-
sufficient in practice. Some degree of indeterminism in a variant is           mann, ‘A constraint store based on multivalued decision diagrams’, In
                                                                               Bessiere [5], pp. 118–132.
needed when a variant is to be further specialized in a later business     [3] Rüdiger Berndt, Peter Bazan, Kai-Steffen Jens Hielscher, Reinhard
process (e.g., at the customer’s site). For example, a pump may be             German, and Martin Lukasiewycz, ‘Multi-valued decision diagrams for
sold with a connection that fits several different sizes of hoses. The         the verification of consistency in automotive product data’, in 2012 12th
end customer may have to make a manual adjustment for the partic-              International Conference on Quality Software, Xi’an, Shaanxi, China,
ular hose they want to attach by cutting off a part of the provided            August 27-29, 2012, eds., Antony Tang and Henry Muccini, pp. 189–
                                                                               192. IEEE, (2012).
connector. The pump being sold to the customer by the business is          [4] C. Bessiere, ‘Constraint propagation’, in Handbook of Constraint Pro-
a variant of their MC “pump” product that allows further individual-           gramming, eds., F. Rossi, P. van Beek, and T. Walsh, chapter 3, Elsevier,
ization at the customer’s site.                                                (2006).
   The number of sub-variants can be infinite, e.g. for the frequency      [5] Christian Bessiere, ed. Principles and Practice of Constraint Program-
                                                                               ming - CP 2007, 13th International Conference, CP 2007, Providence,
a radio receiver may be tuned to. As built, the property Frequency             RI, USA, September 23-27, 2007, Proceedings, volume 4741 of Lecture
would be described by a list of real-valued intervals for possible re-         Notes in Computer Science. Springer, 2007.
ception bands, e.g. {[7.2, 7.45], [9.4, 9.9], [11.6, 12.1]} (MHz).         [6] U. Blumöhr, M. Münch, and M. Ukalovic, Variant Configuration with
   Allowing a variant to be defined by a c-tuple solves the above              SAP, second edition, SAP Press, Galileo Press, 2012.
problem. However, c-tuples that define variants must be distin-            [7] A. Haag, ‘Chapter 27 - Product Configuration in SAP: A Retrospec-
                                                                               tive’, in Knowledge-Based Configuration, eds., Alexander Felfernig,
guished from those that are merely the by-product of compression.              Lothar Hotz, Claire Bagley, and Juha Tiihonen, 319 – 337, Morgan
This would be an open MC business topic.                                       Kaufmann, Boston, (2014).
                                                                           [8] Albert Haag, ‘Konzepte zur praktischen handhabbarkeit einer atms-
                                                                               basierten problemlösung’, in Das PLAKON-Buch, Ein Expertensys-
10    Summary and Outlook                                                      temkern für Planungs- und Konfigurierungsaufgaben in technischen
                                                                               Domänen, eds., Roman Cunis, Andreas Günter, and Helmut Strecker,
This work extends a common theme: that data in tabular form is not             volume 266 of Informatik-Fachberichte, 212–237, Springer, (1991).
only natural for modeling variants, but also a natural, non-proprietary    [9] Albert Haag, The ATMS - an assumption based problem solving ar-
medium for communicating between interrelated business processes               chitecture utilizing specialization relations, Ph.D. dissertation, Kaiser-
                                                                               slautern University of Technology, Germany, 1995.
within an enterprise, as well as between enterprises. Compression of      [10] Albert Haag, ‘Column oriented compilation of variant tables’, in Pro-
tables is essential in MC for letting variant tables scale with a grow-        ceedings of the 17th International Configuration Workshop, Vienna,
ing number of choices, which production technology now enables                 Austria, September 10-11, 2015., eds., Juha Tiihonen, Andreas A.
[13]. C-tuples are a transparent yet powerful form of compression              Falkner, and Tomas Axling, volume 1453 of CEUR Workshop Proceed-
                                                                               ings, pp. 89–96. CEUR-WS.org, (2015).
that is transparent and upon which non-proprietary exchange formats
                                                                          [11] Albert Haag, ‘Managing variants of a personalized product’, Journal of
can be based. This is addressed in other work, e.g. [13]. Here, we             Intelligent Information Systems, 1–28, (2016).
propose to add to the expressivity of variant tables by presenting a      [12] Albert Haag, ‘Assessing the complexity expressed in a variant table’,
quasi-finite (QF) framework that allows dealing with infinite sets of          in Proceedings of the 19th International Configuration Workshop, La
choices within the tabular paradigm.                                           Defense, France, September 14-15, 2017., pp. 20–27, (2017).
                                                                          [13] Albert Haag and Laura Haag, ‘Empowering the use of variant tables in
   We believe the QF framework presented here meets these expec-               mass customization’, in Proceedings of the MCP-CE 2018 conference,
tations. The main idea is that problem solving will deal with the              Novi Sad, Serbia, September 19-21, 2018., p. (Submitted), (2018).
QF-symbols representing infinite sets via specialization relations. In-   [14] G. Katsirelos and T. Walsh, ‘A compression algorithm for large arity
stead of exploring the validity of a value assignment of a value (fea-         extensional constraints’, In Bessiere [5], pp. 379–393.
ture) to a variable (product property), the assignment to a QF-symbol
can be specialized. A forced specialization to the empty set would in-
validate an assignment.
   The discussed techniques involving c-tuples and VDDs using QF-
symbols has not yet been deployed in practice. The individual ingre-
dients: c-tuples, VDDs, and the management of partial orders (POs)
for specialization relations have all been applied with positive re-
sults. As already mentioned, the question of how to define practical
normalization of predicates is open. However, we believe that the
primary open issue is to verify that it actually meets the expectations