<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Zero-Defect Manufacturing: Machine Selection through Unsupervised Learning in the Printing Industry</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>National Technical University of Athens</institution>
          ,
          <addr-line>Patission Complex 42, Patission str, 10682 Athens</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>National and Kapodistrian University of Athens</institution>
          ,
          <addr-line>Psachna, Evia, 34400</addr-line>
          ,
          <country country="GR">Greece</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Zero-Defect Manufacturing (ZDM), one of the main concepts of Industry 4.0, is especially critical in the offset printing industry, since it is associated with production enhancement and environmental footprint reduction. This work proposes a Machine Learning (ML) clustering-based approach to determine hidden order attributes that can be used to define a beneficial machine selection policy for incoming orders in terms of fault reduction and production enhancement. Three clustering methods (k-means, agglomerative hierarchical clustering and density-based spatial clustering, DBSCAN) are modified in order to reveal the hidden order features that have a significant impact on the number of defective pieces. First, the ML framework of the clustering methods is presented, mainly covering the fine-tuning of the learning parameters. Then, the trained ML models are compared in terms of their performance on unseen data to evaluate the machine selection process. The evaluation outcomes demonstrate the ability of the clustering ML framework to support a proactive machine selection policy that reduces printing defects.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine learning</kwd>
        <kwd>Industry 4.0</kwd>
        <kwd>zero-defect-manufacturing</kwd>
        <kwd>unsupervised learning</kwd>
        <kwd>clustering</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        The primary target of Industry 4.0 is to improve conventional production methods by
combining innovative data technologies from both physical and digital contexts [
        <xref ref-type="bibr" rid="ref1 ref2">1, 2</xref>
        ]. This
transformation will enable the manufacturing production to move from the state of a posteriori
management to the state of timely prediction of optimal resource and process management,
optimizing the quality of the product and the usage of raw materials, while also minimizing the
production chain defects [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. The concept of Zero-Defect Manufacturing (ZDM) has therefore been
adopted by the majority of the stakeholders operating in the manufacturing domain, not only due to
the effective cost reduction in their production chain, but also due to the reduction of their
environmental footprint [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. For these purposes, the whole industrial field is currently moving beyond
reactive resource management towards proactive and predictive solutions, necessitating the
establishment of Artificial Intelligence (AI)-assisted solutions. In the context of Industry 4.0,
AI-based techniques and Machine Learning (ML) methods are used as the primary enablers of
self-optimization and automation in the manufacturing process, and they also provide fault detection and
real-time decision-making functionalities towards ZDM [
        <xref ref-type="bibr" rid="ref1 ref2 ref3 ref4">1–4</xref>
        ].
      </p>
      <p>
        By leveraging proactive and predictive measures in the industrial production chain, product quality
can be effectively maximized and the cost associated with defects can be eliminated. AI/ML-assisted
solutions have therefore been developed for typical manufacturing applications such as fault
detection, predictive maintenance, optimization of the manufacturing process and machine
configuration parameters and enhancement of the energy savings [
        <xref ref-type="bibr" rid="ref3 ref5">3, 5</xref>
        ]. With the use of ML
algorithms, defective products can be predicted in advance and machine configuration
parameters can be linked to fault occurrence [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ].
      </p>
      <p>Offset printing is one of the most widely used printing processes, accommodating many
types of printing jobs, including newspapers, magazines, brochures, labels, books, and many others.
The main issues that remain unsolved are: (i) the number and diversity of order
characteristics do not allow for an easy standardization of the processes; (ii)
deficiencies are typically observed during the quality control of the final product, thus leaving no
room for corrective actions; (iii) existing rule-based optimization methods operate in an a posteriori
manner instead of predicting the optimal resource management in a timely way; (iv) the printing industry
exhibits a significant environmental footprint, since the manufacturing process involves extensive
usage of raw materials (water, paper, ink, aluminum), with defective products contributing the largest
part.</p>
      <p>In the present exploratory work, we propose three modified Unsupervised Learning (UL)
algorithms in order to facilitate the standardization of ML-empowered methods in the offset printing
industry. Specifically, the work contributes: (i) unsupervised learning-driven modeling of each machine with
the goal of revealing the cluster with the minimum defects per order; (ii) investigation of the hidden
attributes of machines to optimize the machine selection policy; (iii) construction of machine-specific
clustering models using three well-established algorithms, namely k-means, agglomerative
hierarchical clustering and density-based spatial clustering (DBSCAN); (iv) a constraint-dependent clustering
approach based on pre-defined functions; (v) quantitative validation of the developed models
against the existing machine selection policy.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Dataset Description</title>
      <p>The offset printing process consists of three phases, namely pre-press, press and post-press. In
each of these phases, several (raw, organic, chemical and recycled) materials are used,
including paper, water, ink, aluminum and alcohol solutions, which have a direct impact not only on the
environmental footprint but also on the economic growth. This paper exploits a subset of a historical
dataset obtained from the press stage of an offset printing company during the last two years. Data
collection was performed at the single-order level for five operating printing machines, meaning that
the features of a given order were recorded along with the associated machine ID. Specifically, the
dataset contains 10K entries per machine (50K entries in total), with each entry corresponding to one historical order.</p>
      <p>The collected features for each order are: (i) Quantity: the number of pieces requested in a particular
order. Indicative values range from 100 to 1000 pieces, depending on the order type; (ii) Quality:
the paper quality requested in a particular order. Quality is a categorical variable that takes the values
‘Velvet’, ‘Uncoated’ or ‘Illustration/Gloss’. Note that ‘Velvet’ is the most frequently requested paper
quality (57%), followed by ‘Uncoated’ (26%) and ‘Gloss’ (17%); (iii) Color: the color requirements of a
particular order. Color is also a categorical variable that takes the values ‘Color’ (typical 4-color printing,
88%), ‘Color+’ (4+1 color printing, 10%) or ‘B&amp;W’ (grayscale printing, 2%); (iv) Ink: the ink level
required for each piece of a particular order. Typical ink values vary between 0.1 and 1 g; (v) Type:
the requested outcome type of a particular order. Type is a categorical variable with values ‘Book’
(30%), ‘Poster’ (30%) or ‘Journal’ (40%); (vi) Accuracy: the ratio between the accurately printed
pieces and the quantity of the order. Accuracy is a scalar variable ranging from 0 to 1 and reflects the
share of non-defective pieces in the order (1 corresponds to zero defective pieces).</p>
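      <p>As a minimal sketch of how such order records could be turned into the 6-dimensional samples used by the clustering models, the snippet below encodes each order numerically and rescales every feature to [0, 1]. The ordinal codes, the dictionary field names and the min-max scaling are illustrative assumptions; the encoding actually applied to the dataset is not specified here.</p>
      <preformat><![CDATA[
```python
# Hypothetical ordinal codes: the encoding of the categorical fields is an assumption.
QUALITY = {"Velvet": 0, "Uncoated": 1, "Illustration/Gloss": 2}
COLOR = {"B&W": 0, "Color": 1, "Color+": 2}
TYPE = {"Book": 0, "Poster": 1, "Journal": 2}


def encode_order(order):
    """Map one order record (a dict) to a 6-dimensional numeric vector."""
    return [
        float(order["quantity"]),
        float(QUALITY[order["quality"]]),
        float(COLOR[order["color"]]),
        float(order["ink"]),
        float(TYPE[order["type"]]),
        float(order["accuracy"]),
    ]


def min_max_scale(rows):
    """Rescale every column to [0, 1] so that no single feature dominates distances."""
    columns = list(zip(*rows))
    lows = [min(c) for c in columns]
    highs = [max(c) for c in columns]
    return [
        [(v - lo) / (hi - lo) if hi > lo else 0.0 for v, lo, hi in zip(row, lows, highs)]
        for row in rows
    ]


# Two made-up orders, used only to exercise the functions.
orders = [
    {"quantity": 100, "quality": "Velvet", "color": "Color",
     "ink": 0.1, "type": "Book", "accuracy": 0.95},
    {"quantity": 1000, "quality": "Uncoated", "color": "B&W",
     "ink": 1.0, "type": "Journal", "accuracy": 0.80},
]
X = min_max_scale([encode_order(o) for o in orders])
```
]]></preformat>
      <p>With only two toy orders, every scaled feature collapses to 0 or 1; on a full per-machine dataset the same functions would yield one 6-dimensional row per order.</p>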
    </sec>
    <sec id="sec-4">
      <title>2.2. Machine Selection through Unsupervised Learning and Constraint Clustering</title>
    </sec>
    <sec id="sec-5">
      <p>The modeling process follows the basic assumption that one or multiple order features are
associated with enhanced accuracy levels. Given the variability in the number and shape of the
clusters produced by each algorithm, we defined an objective function to fix the algorithms’
hyper-parameters (the number of clusters for k-means and agglomerative clustering, and the minimum
number of points within a radius ε for DBSCAN). The Accuracy Discrimination Score (ADS) is used as
the objective function:
        <disp-formula>
          <label>(1)</label>
          <tex-math><![CDATA[\mathrm{ADS} = \frac{\max_{i}\left[A_i\right] + S}{M}]]></tex-math>
        </disp-formula>
where M is the number of clusters exceeding the accuracy threshold, A<sub>i</sub> is the within-cluster accuracy
score of such a cluster i (threshold of 90%) and S is the silhouette score over the k clusters (the ADS value is zero when M =
0). The ADS aims to jointly maximize the within-cluster accuracy score and minimize the
number of clusters that exhibit accuracy levels of 90%, enabling the determination of
constraint-clustering models.</p>
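      <p>The objective described above can be sketched in a few lines of Python. The sketch assumes that the within-cluster accuracy score is the mean Accuracy of the orders in a cluster and that the silhouette score has already been computed elsewhere; the function and argument names are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict


def accuracy_discrimination_score(labels, accuracies, silhouette, threshold=0.9):
    """Return (best within-cluster accuracy + silhouette) / count, or 0.0.

    `count` is the number of clusters whose mean accuracy reaches the
    threshold (assumption: "within-cluster accuracy" means the cluster mean).
    """
    per_cluster = defaultdict(list)
    for label, acc in zip(labels, accuracies):
        per_cluster[label].append(acc)
    means = [sum(v) / len(v) for v in per_cluster.values()]
    good = [m for m in means if m >= threshold]
    if not good:
        return 0.0  # no cluster reaches the threshold
    return (max(good) + silhouette) / len(good)


# Toy labelling: cluster 0 is highly accurate, cluster 1 is not.
score = accuracy_discrimination_score(
    labels=[0, 0, 1, 1], accuracies=[1.0, 1.0, 0.5, 0.5], silhouette=0.5
)
```
]]></preformat>
      <p>Dividing by the number of qualifying clusters rewards labellings in which the high-accuracy orders concentrate in a single cluster rather than being spread over several.</p>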
    </sec>
    <sec id="sec-6">
      <title>3. Simulation Results</title>
    </sec>
    <sec id="sec-7">
      <title>3.1. Hyperparameter Tuning</title>
      <p>K-Means. The 10K-order dataset collected from each machine is provided to the algorithm using a varying
number of clusters k (1 to 100). K-means iteratively assigns the data samples to k clusters, aiming to
minimize the within-cluster variance (the 6-dimensional squared distance between each sample and its
cluster centroid). Figure 1 shows the ADS relative to k for each individual machine. Evidently, the ADS
is maximized for Machines 1-3 and 5 with a relatively low number of total clusters k, while the dataset
obtained from Machine 4 requires a considerably larger k = 88 in order to identify at least one cluster with
an accuracy level above 90%.</p>
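      <p>A hedged sketch of the sweep over k is given below: a plain Lloyd-style k-means is run for each candidate k, and the k with the highest score is kept. The deterministic farthest-point initialisation, the mean-accuracy reading of the within-cluster score, the 2-dimensional toy samples and all function names are assumptions made for illustration, not the configuration used in the experiments.</p>
      <preformat><![CDATA[
```python
import math
from collections import defaultdict


def kmeans(points, k, iters=50):
    """Lloyd's algorithm with deterministic farthest-point initialisation."""
    centroids = [points[0]]
    while len(centroids) < k:
        centroids.append(max(points, key=lambda p: min(math.dist(p, c) for c in centroids)))
    labels = [0] * len(points)
    for _ in range(iters):
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        updated = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            # Keep the old centroid if a cluster happens to become empty.
            updated.append([sum(c) / len(members) for c in zip(*members)] if members else centroids[j])
        if updated == centroids:
            break
        centroids = updated
    return labels


def silhouette(points, labels):
    """Mean silhouette coefficient; 0.0 when fewer than two clusters exist."""
    clusters = defaultdict(list)
    for p, lab in zip(points, labels):
        clusters[lab].append(p)
    if len(clusters) < 2:
        return 0.0
    total = 0.0
    for p, lab in zip(points, labels):
        own = clusters[lab]
        if len(own) == 1:
            continue  # a singleton contributes a silhouette of 0
        a = sum(math.dist(p, q) for q in own) / (len(own) - 1)
        b = min(sum(math.dist(p, q) for q in other) / len(other)
                for key, other in clusters.items() if key != lab)
        denom = max(a, b)
        total += (b - a) / denom if denom > 0 else 0.0
    return total / len(points)


def ads(labels, accuracies, sil, threshold=0.9):
    """(best within-cluster mean accuracy + silhouette) / number of qualifying clusters."""
    groups = defaultdict(list)
    for lab, acc in zip(labels, accuracies):
        groups[lab].append(acc)
    good = [m for m in (sum(v) / len(v) for v in groups.values()) if m >= threshold]
    return (max(good) + sil) / len(good) if good else 0.0


def best_k(points, accuracies, k_values):
    """Score every candidate k and return the best one with all scores."""
    scores = {}
    for k in k_values:
        labels = kmeans(points, k)
        scores[k] = ads(labels, accuracies, silhouette(points, labels))
    return max(scores, key=scores.get), scores


# Toy samples: [ink, accuracy]; the first three orders are the accurate ones.
points = [[0.2, 0.95], [0.25, 0.96], [0.22, 0.94], [0.8, 0.6], [0.85, 0.62], [0.82, 0.58]]
accuracies = [p[1] for p in points]
k_opt, scores = best_k(points, accuracies, range(1, 4))
```
]]></preformat>
      <p>On these toy samples the sweep over k = 1 to 3 selects k = 2: a single cluster stays below the 90% threshold, while a third cluster only lowers the silhouette term.</p>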
      <p>Agglomerative Hierarchical Clustering. The hierarchical clustering algorithm initially treats
each data sample as its own cluster. Then, depending on the distance between the samples,
adjacent samples in the 6-dimensional space are iteratively grouped together until the defined number
of clusters k is reached. As with the k-means algorithm, the ADS for varying k (1 to 100) is
shown in Figure 2, along with the number of clusters k exhibiting the maximum ADS value for each
machine.</p>
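      <p>The bottom-up merging can be illustrated with a naive implementation. The linkage criterion is not stated in the text, so average linkage is assumed here; the function name and the toy samples are likewise hypothetical, and the cubic running time makes this suitable for small examples only.</p>
      <preformat><![CDATA[
```python
import math


def agglomerative(points, k):
    """Naive bottom-up clustering with average linkage (toy sizes only)."""
    clusters = [[i] for i in range(len(points))]  # every sample starts alone

    def linkage(a, b):
        # Average pairwise distance between the members of two clusters.
        return sum(math.dist(points[i], points[j]) for i in a for j in b) / (len(a) * len(b))

    while len(clusters) > k:
        # Find the closest pair of clusters and merge the second into the first.
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda ij: linkage(clusters[ij[0]], clusters[ij[1]]),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)

    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for idx in members:
            labels[idx] = label
    return labels


# Toy samples: [ink, accuracy], forming two well-separated groups.
points = [[0.2, 0.95], [0.25, 0.96], [0.22, 0.94], [0.8, 0.6], [0.85, 0.62], [0.82, 0.58]]
labels = agglomerative(points, 2)
```
]]></preformat>
      <p>Stopping the merges at different values of k yields the labelings that would then be scored with the ADS, in the same way as for k-means.</p>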
      <p>DBSCAN. The density-based clustering algorithm is suitable for more complex clusters, e.g. when
dense data areas are nested. The algorithm identifies core points in the data samples, which are used to
establish clusters depending on the minimum number of neighbouring data points N within a radius ε.
The parameters N and ε are therefore jointly varied in order to identify the (N, ε) pair that
maximizes the ADS. Figure 3 depicts the ADS as a function of (N, ε) as a surface plot for each
individual machine.</p>
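      <p>The joint search over (N, ε) can be sketched as a small grid search around a textbook DBSCAN. In this illustration a point counts itself among its neighbours, noise points are ignored when scoring, and the scorer is a simplified stand-in for the ADS that omits the silhouette term; these choices, the function names and the toy samples are all assumptions.</p>
      <preformat><![CDATA[
```python
import math
from collections import defaultdict

NOISE = -1


def dbscan(points, eps, min_pts):
    """Plain DBSCAN: one label per point, with -1 marking noise."""
    n = len(points)
    neighbours = [
        [j for j in range(n) if math.dist(points[i], points[j]) <= eps] for i in range(n)
    ]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if len(neighbours[i]) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point reached from a core point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_pts:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


def simplified_score(labels, accuracies, threshold=0.9):
    """Stand-in for the ADS without the silhouette term; noise is ignored."""
    groups = defaultdict(list)
    for lab, acc in zip(labels, accuracies):
        if lab != NOISE:
            groups[lab].append(acc)
    good = [m for m in (sum(v) / len(v) for v in groups.values()) if m >= threshold]
    return max(good) / len(good) if good else 0.0


def grid_search(points, eps_values, min_pts_values, score):
    """Return the (eps, min_pts) pair whose labelling maximises score(labels)."""
    return max(
        ((eps, n) for eps in eps_values for n in min_pts_values),
        key=lambda pair: score(dbscan(points, pair[0], pair[1])),
    )


# Toy samples: [ink, accuracy]; the last point is an isolated outlier.
points = [[0.2, 0.95], [0.25, 0.96], [0.22, 0.94],
          [0.8, 0.6], [0.85, 0.62], [0.82, 0.58], [0.5, 0.1]]
accuracies = [p[1] for p in points]
best_eps, best_min_pts = grid_search(
    points, [0.03, 0.1], [2, 3], lambda labels: simplified_score(labels, accuracies)
)
```
]]></preformat>
      <p>Unlike k-means and agglomerative clustering, the number of clusters is not fixed in advance here; it follows from the chosen (N, ε) pair, which is why the two parameters are searched jointly.</p>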
    </sec>
    <sec id="sec-8">
      <title>3.2. Validation Results and Machine Labeling</title>
      <p>A validation dataset containing 100 unseen orders per machine, each exhibiting an accuracy
level above 90%, was used to verify the performance of the pre-trained models. The performance
metric for each machine was calculated as the ratio between the number of validation samples
grouped within the best-accuracy cluster and the total number of validation samples. For
benchmarking purposes, Figure 4 depicts the performance of the three clustering models along with a
round-robin machine selection policy. All metrics are expressed relative to the ground-truth performance
(Relative Performance Gain, RPG) obtained with the existing machine selection policy (a rule-based
approach, primarily exploiting the specifications of the machines’ manufacturers).</p>
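      <p>The validation metric can be sketched as follows. How unseen orders are assigned to clusters is not detailed in the text, so nearest-centroid assignment is assumed here, and the RPG is assumed to be the plain ratio between a model's metric and that of the existing policy; the centroids, validation samples and baseline value are made up for illustration.</p>
      <preformat><![CDATA[
```python
import math


def hit_rate(validation, centroids, best_cluster):
    """Share of validation orders whose nearest centroid is the best-accuracy cluster."""
    hits = sum(
        1
        for order in validation
        if min(range(len(centroids)), key=lambda j: math.dist(order, centroids[j])) == best_cluster
    )
    return hits / len(validation)


def relative_performance_gain(model_rate, baseline_rate):
    """RPG as a plain ratio (assumption); 1.0 means parity with the existing policy."""
    return model_rate / baseline_rate


# Hypothetical cluster centroids in a 2-D [ink, accuracy] space; cluster 0 is the accurate one.
centroids = [[0.2, 0.95], [0.8, 0.6]]
# Made-up validation orders, all with accuracy above 0.9.
validation = [[0.21, 0.93], [0.3, 0.92], [0.7, 0.91], [0.25, 0.97]]

rate = hit_rate(validation, centroids, best_cluster=0)
rpg = relative_performance_gain(rate, baseline_rate=0.5)
```
]]></preformat>
      <p>Here three of the four validation orders fall into the best-accuracy cluster, so the model would be rated above a baseline that reaches 0.5 on the same metric.</p>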
      <p>As evident from Figure 4, k-means outperforms the other clustering models for Machines 1 and
3, implying that the corresponding training datasets can be clustered following geometric, centroid-based criteria. In
contrast, the datasets from Machines 2 and 5 formed density-based groups that isolate the best-accuracy
clusters, so that DBSCAN shows a beneficial RPG. Finally, Machine 4 did not reveal any
notable RPG score, suggesting that there are no gains in using clustering methods for proactive
machine selection on this machine. Note that an RPG value of 1 denotes that a particular model performs equivalently
to the currently used policy.</p>
      <p>The presented ML clustering methods can be used to further analyze the features of the data
samples that form clusters with enhanced accuracy scores and to determine the hidden order attributes
for each machine. To this end, a machine/feature labeling can be established to support a beneficial machine
selection policy (new orders are assigned to the printing machine showing suitable feature labels),
which in turn will contribute to the enhancement of production efficiency, the minimization of
defective products and the reduction of the company’s environmental footprint.</p>
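      <p>One simple way to derive such feature labels, offered only as an illustrative assumption, is to report the most frequent categorical values among the orders of a machine's best-accuracy cluster. The field names, the toy orders and the function below are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter


def dominant_attributes(orders, labels, best_cluster, fields=("quality", "color", "type")):
    """Most frequent categorical value per field among the best cluster's orders."""
    members = [o for o, lab in zip(orders, labels) if lab == best_cluster]
    if not members:
        return {}  # nothing was assigned to the requested cluster
    return {f: Counter(o[f] for o in members).most_common(1)[0][0] for f in fields}


# Made-up orders with the cluster label each one received.
orders = [
    {"quality": "Velvet", "color": "Color", "type": "Book"},
    {"quality": "Velvet", "color": "Color", "type": "Poster"},
    {"quality": "Velvet", "color": "Color+", "type": "Book"},
    {"quality": "Uncoated", "color": "B&W", "type": "Journal"},
]
labels = [0, 0, 0, 1]
profile = dominant_attributes(orders, labels, best_cluster=0)
```
]]></preformat>
      <p>A new order could then be routed to the machine whose profile matches its requested quality, color and type most closely.</p>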
    </sec>
    <sec id="sec-9">
      <title>4. Acknowledgement</title>
      <p>This work has been partially supported by the project Offspring, under the open call of ZDMP
project, funded by the European Commission under Grant Agreement number 825631 through the
Horizon 2020 program.</p>
    </sec>
    <sec id="sec-10">
      <title>5. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>A.</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E. T.</given-names>
            <surname>Michailidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Nomikos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Trakadas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Hatziefremidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Voliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Zahariadis</surname>
          </string-name>
          ,
          <article-title>Tackling faults in the industry 4.0 era - a survey of machine-learning solutions and key aspects</article-title>
          ,
          <source>Sensors</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <elocation-id>109</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/s20010109</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y. S.</given-names>
            <surname>Chuo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. H.</given-names>
            <surname>Mun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I. W.</given-names>
            <surname>Noh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Rezvani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. W.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <article-title>Artificial intelligence enabled smart machining and machine tools</article-title>
          ,
          <source>Journal of Mechanical Science and Technology</source>
          <volume>36</volume>
          (
          <year>2022</year>
          )
          <fpage>1</fpage>
          -
          <lpage>23</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s12206-021-1201-0</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z. M.</given-names>
            <surname>Çınar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. Abdussalam</given-names>
            <surname>Nuhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zeeshan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Korhan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Asmael</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Safaei</surname>
          </string-name>
          ,
          <article-title>Machine learning in predictive maintenance towards sustainable smart manufacturing in Industry 4.0</article-title>
          ,
          <source>Sustainability</source>
          <volume>12</volume>
          (
          <year>2020</year>
          )
          <elocation-id>8211</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/su12198211</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>P.</given-names>
            <surname>Trakadas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Simoens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Gkonis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sarakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. P.</given-names>
            <surname>Ramallo-González</surname>
          </string-name>
          ,
          <string-name>
            <given-names>… P.</given-names>
            <surname>Karkazis</surname>
          </string-name>
          ,
          <article-title>An artificial intelligence-based collaboration approach in industrial IoT manufacturing: Key concepts, architectural extensions and potential applications</article-title>
          ,
          <source>Sensors</source>
          <volume>20</volume>
          (
          <year>2020</year>
          )
          <elocation-id>5480</elocation-id>
          . doi:
          <pub-id pub-id-type="doi">10.3390/s20195480</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Angelopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. E.</given-names>
            <surname>Giannopoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. C.</given-names>
            <surname>Kapsalis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. T.</given-names>
            <surname>Spantideas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Sarakis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Voliotis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Trakadas</surname>
          </string-name>
          ,
          <article-title>Impact of Classifiers to Drift Detection Method: A Comparison</article-title>
          , in: International Conference on Engineering Applications of Neural Networks, Springer, Cham,
          <year>2021</year>
          , pp.
          <fpage>399</fpage>
          -
          <lpage>410</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>D. P.</given-names>
            <surname>Penumuru</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Muthuswamy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Karumbu</surname>
          </string-name>
          ,
          <article-title>Identification and classification of materials using machine vision and machine learning in the context of Industry 4.0</article-title>
          ,
          <source>Journal of Intelligent Manufacturing</source>
          <volume>31</volume>
          (
          <year>2020</year>
          )
          <fpage>1229</fpage>
          -
          <lpage>1241</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s10845-019-01508-6</pub-id>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Silvén</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Niskanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Kauppinen</surname>
          </string-name>
          ,
          <article-title>Wood inspection with non-supervised clustering</article-title>
          ,
          <source>Machine Vision and Applications</source>
          <volume>13</volume>
          (
          <year>2003</year>
          )
          <fpage>275</fpage>
          -
          <lpage>285</lpage>
          . doi:
          <pub-id pub-id-type="doi">10.1007/s00138-002-0084-z</pub-id>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>