<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extraction of Classification Rules from Sequences of Crystal Growth Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Radek Buša</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yann Dauxais</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Ecklebe</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Natasha Dropka</string-name>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martin Holeňa</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff5">5</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Faculty of Information Technology, Czech Technical University</institution>
          ,
          <addr-line>Thákurova 9, Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Institute of Computer Science, Czech Academy of Sciences</institution>
          ,
          <addr-line>Prague</addr-line>
          ,
          <country country="CZ">Czech Republic</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Institute of Control Theory, TU Dresden</institution>
          ,
          <addr-line>Georg-Schumann-Str. 7a, Dresden</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>KU Leuven</institution>
          ,
          <addr-line>Celestijnenlaan 200a, Leuven</addr-line>
          ,
          <country country="BE">Belgium</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Leibniz-Institut für Kristallzüchtung</institution>
          ,
          <addr-line>Max-Born Str. 2, Berlin</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff5">
          <label>5</label>
          <institution>Leibniz Institute for Catalysis</institution>
          ,
          <addr-line>Albert-Einstein Str. 29a, Rostock</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper presents a generalization of discriminant chronicles mining, a data mining method that extracts classification rules for the classification of sequences of events. The generalization is motivated by the objective of extracting classification rules from crystal growth data, for which the original method needs to be extended to events described by vectors of attributes and to real-valued attributes. The paper incorporates both extensions into the theoretical fundamentals of the original method and describes a corresponding modification of a system for discriminant chronicles mining, which was developed three years ago to implement the original method. Finally, an application of the generalized method, using the modified system, to data from the growth of GaAs crystals by the vertical gradient freeze method is briefly sketched.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        This paper deals with data mining of crystal growth data,
obtained either experimentally or from simulations. Such
data records the crystal growth process, its performance,
and conditions in the melt, such as temperatures in various
control points or the power of heaters, or parameters of the
magnetic fields influencing melt convection [
        <xref ref-type="bibr" rid="ref6 ref7">6, 7</xref>
        ]. In
particular, we consider the common situation in which the
performance data indicates whether the crystal growth process
can be classified as satisfactory according to a given
criterion, e.g., the shape or position of the
solid/liquid interface. Hence, the primary data mining approach to
such data is the extraction of classification rules.
      </p>
      <p>
        Although a plethora of methods for classification rules
extraction exist [
        <xref ref-type="bibr" rid="ref10 ref11">10, 11</xref>
        ], most of them cannot be used for
our data. The reason is that crystal growth proceeds
sequentially, hence the data is inherently sequential.
Therefore, we have chosen a rule extraction method
that extracts classification rules for the classification of
sequences of events, which was proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. It is called
“discriminant chronicles mining” because it was originally
developed for events described with attributes conveying a
temporal meaning. However, it cannot be directly applied
to crystal growth data, due to the following two
restrictions:
        <list list-type="simple">
          <list-item>
            <label>(i)</label>
            <p>The events in [
              <xref ref-type="bibr" rid="ref3">3</xref>
              ] are described with scalar values of
the temporal attribute. On the other hand, members of
sequences of crystal growth data, which we will for
simplicity also call events, are described with vectors
of attribute values.</p>
          </list-item>
          <list-item>
            <label>(ii)</label>
            <p>The temporal attribute describing events in [
              <xref ref-type="bibr" rid="ref3">3</xref>
              ] has a
finite number of values, thus it can be represented
by a finite subset of integers. On the other hand,
attributes describing events in sequences of crystal
growth data are real-valued.</p>
          </list-item>
        </list>
      </p>
      <p>
        Therefore, we have extended the method from [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] to
sequences of events described by real-valued vector
attributes. This extension is the main contribution of the
paper.
      </p>
      <p>
        The next section briefly recalls the original method
proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. Its extension removing restrictions (i) and (ii)
above is described in Section 3. Finally, an application of
the proposed method to crystal growth data is sketched in
Section 4.
      </p>
    </sec>
    <sec id="sec-3">
      <title>Classification Rules Extraction with Discriminant Chronicles</title>
      <p>
        Let $\mathbb{E}$ be a finite set, the elements of which are called event
types, and let $\mathbb{T}$ be an arbitrary subset of the extended reals,
$\mathbb{T} \subseteq \overline{\mathbb{R}}$. For a multiset of $m$ event types, the rather unusual
notation $\{\!\{e_1, \dots, e_m\}\!\}$ has been introduced in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], and a
couple $(e, t) \in \mathbb{E} \times \mathbb{T}$ is called an event.
      </p>
      <p>
        Assume further that some ordering $\leq_{\mathbb{E}}$ is imposed on the
event types in the application domain. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], where
temporal relationships are investigated, a total ordering
corresponding to their order of occurrence is considered. For
the rule extraction method proposed in Section 3,
however, the weaker concept of a partial ordering will be
sufficient, with semantics tailored to a particular
application (see Section 4 for the real-world application
considered in this paper). For $e_1, e_2 \in \mathbb{E}$ and $t^-, t^+ \in \overline{\mathbb{R}}$ such
that $e_1 \leq_{\mathbb{E}} e_2$ and $t^- \leq t^+$, a temporal constraint is a tuple
$(e_1, e_2, t^-, t^+)$, also denoted $e_1[t^-, t^+]e_2$. The semantics
of such a temporal constraint is as follows: the difference
between the timestamp $t_2$ of an event $(e_2, t_2)$ of type $e_2$
and the timestamp $t_1$ of an event $(e_1, t_1)$ of type $e_1$
fulfills $t^- \leq t_2 - t_1 \leq t^+$. A temporal constraint $e_1[t^-, t^+]e_2$
is said to be satisfied by a couple of events $((e, t), (e', t'))$ if
$e_1 = e$, $e_2 = e'$ and $t' - t \in [t^-, t^+]$. Because requiring
two events to be separated by one fixed duration is too strict
for most applications, the simplest way to represent
temporal constraints is by duration intervals. Such an
interval can be interpreted as two constraints defining the
lowest and the highest accepted duration, respectively.
      </p>
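      <p>The satisfaction of a temporal constraint by a couple of events can be illustrated with a minimal Python sketch. The names Event, TemporalConstraint and satisfies are hypothetical and chosen here only for illustration; they are not taken from [3] or from any existing implementation.</p>
      <preformat><![CDATA[
```python
from typing import NamedTuple


class Event(NamedTuple):
    etype: str    # event type e
    time: float   # timestamp t


class TemporalConstraint(NamedTuple):
    e1: str       # type of the first event
    e2: str       # type of the second event
    lo: float     # lower bound t^-
    hi: float     # upper bound t^+


def satisfies(c: TemporalConstraint, first: Event, second: Event) -> bool:
    """True if the couple (first, second) satisfies e1[lo, hi]e2."""
    if first.etype != c.e1 or second.etype != c.e2:
        return False
    return c.lo <= second.time - first.time <= c.hi


c = TemporalConstraint("A", "B", 2.0, 5.0)
ok = satisfies(c, Event("A", 1.0), Event("B", 4.0))        # difference 3.0
too_late = satisfies(c, Event("A", 1.0), Event("B", 9.0))  # difference 8.0

# A constraint with infinite bounds does not constrain anything.
free = TemporalConstraint("A", "B", float("-inf"), float("inf"))
unconstrained = satisfies(free, Event("A", 1.0), Event("B", 9.0))
```
]]></preformat>
      <p>In this sketch, the interval bounds play the role of the lowest and the highest accepted duration between the two events.</p>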
      <p>
        Using a set $\mathbb{E}$ of event types and a set $\mathcal{T}$ of
temporal constraints, two complementary concepts can be
introduced:
        <list list-type="simple">
          <list-item>
            <label>(i)</label>
            <p>If we are interested in finding events that pairwise
satisfy a given set $\mathcal{T}$ of temporal constraints, then
the concept of a simple temporal constraint network [
              <xref ref-type="bibr" rid="ref12">12</xref>
              ], alternatively called a simple temporal problem [
              <xref ref-type="bibr" rid="ref4">4</xref>
              ], is useful. It can be defined as the triple
              <disp-formula>
                <label>(1)</label>
                <tex-math><![CDATA[(\mathbb{E}, \leq_{\mathbb{E}}, \mathcal{T}), \quad \mathcal{T} \subseteq \{\, e[l, u]e' \mid e, e' \in \mathbb{E},\ e \leq_{\mathbb{E}} e' \,\}.]]></tex-math>
              </disp-formula>
            </p>
          </list-item>
          <list-item>
            <label>(ii)</label>
            <p>If we are interested in mining temporal constraints
from given sequences of events, then the concept of a
chronicle [
              <xref ref-type="bibr" rid="ref3 ref9">3, 9</xref>
              ] is useful. It can be defined as the couple
              <disp-formula>
                <label>(2)</label>
                <tex-math><![CDATA[(\mathcal{E}, \mathcal{T}), \quad \mathcal{E} = \{\!\{e_1, \dots, e_m\}\!\},\ e_i \in \mathbb{E}, \quad \mathcal{T} \subseteq \{\, e[l, u]e' \mid (\exists i, j = 1, \dots, m)\ i \neq j \;\&\; e_i \leq_{\mathbb{E}} e_j \;\&\; e = e_i \;\&\; e' = e_j \,\}.]]></tex-math>
              </disp-formula>
            </p>
          </list-item>
        </list>
        A chronicle is a temporal extension of episodes or
partial orders introduced in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], a type of pattern
dedicated to summarizing sequential data. Chronicles have
proven useful in applications where the
temporal dimension is needed to differentiate two
behaviors. Their first application was to
alarm log data in [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], where the temporal distance
between two alarm events is very important.
      </p>
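      <p>Observe that the constraint $e[-\infty, \infty]e'$ holds for any
$e, e' \in \mathbb{E}$, meaning that this constraint actually does
not constrain anything. If no constraint
in $\mathcal{T}$ constrains anything, the set $\mathcal{T}$ will be
denoted $\mathcal{T}_\infty$. Hence, $(\mathcal{E}, \mathcal{T}_\infty)$ is a chronicle, and of the
constraints $e[-\infty, \infty]e'$ in $\mathcal{T}_\infty$ it is only required that
        <disp-formula>
          <label>(3)</label>
          <tex-math><![CDATA[(\exists i, j = 1, \dots, m)\ i \neq j \;\&\; e_i \leq_{\mathbb{E}} e_j \;\&\; e = e_i \;\&\; e' = e_j.]]></tex-math>
        </disp-formula>
      </p>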
      <p>Because the research reported in this paper concerns
data mining, it relies on the latter concept, as well as on
several additional concepts concerning chronicles.</p>
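      <p>Let $m, n \in \mathbb{N}$, $2 \leq m \leq n$, let $C = (\{\!\{e'_1, \dots, e'_m\}\!\}, \mathcal{T})$ be a
chronicle and $s = ((e_1, t_1), \dots, (e_n, t_n))$ be a
sequence of events. For $k \in \mathbb{N}$, let $\hat{k}$ denote the set $\{1, \dots, k\}$. An occurrence of $C$ in $s$ is a subsequence
$\tilde{s} = ((e_{f(1)}, t_{f(1)}), \dots, (e_{f(m)}, t_{f(m)}))$ of $s$ such that:
        <list list-type="simple">
          <list-item><label>(i)</label><p>$f : \hat{m} \to \hat{n}$ is an injective function;</p></list-item>
          <list-item><label>(ii)</label><p>$e'_i = e_{f(i)}$, $i = 1, \dots, m$;</p></list-item>
          <list-item><label>(iii)</label><p>if $i \neq j$ and $e'_i \leq_{\mathbb{E}} e'_j$, then $t_{f(j)} - t_{f(i)} \in [a, b]$, where
$e'_i[a, b]e'_j \in \mathcal{T}$.</p></list-item>
        </list>
      </p>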
      <p>We say that C occurs in s if there exists at least one
occurrence of C in s.</p>
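      <p>The definition of an occurrence can be made concrete with a brute-force Python sketch that enumerates all injective maps from chronicle positions to sequence positions. It is only an illustration under simplifying assumptions: constraints are keyed by pairs of chronicle positions (i, j) instead of event types, a missing key means an unconstrained pair, and the function names are hypothetical. The enumeration is exponential and not meant as a mining algorithm.</p>
      <preformat><![CDATA[
```python
from itertools import permutations


def occurrences(chronicle_types, constraints, sequence):
    """Yield every injective map f, given as a tuple of indices into
    `sequence`, that is an occurrence of the chronicle.

    chronicle_types: list of event types e'_1, ..., e'_m
    constraints:     dict {(i, j): (a, b)} on chronicle positions (assumption)
    sequence:        list of (event_type, timestamp) pairs
    """
    m, n = len(chronicle_types), len(sequence)
    for f in permutations(range(n), m):          # all injective maps
        if any(sequence[f[i]][0] != chronicle_types[i] for i in range(m)):
            continue                             # condition (ii) violated
        if all(a <= sequence[f[j]][1] - sequence[f[i]][1] <= b
               for (i, j), (a, b) in constraints.items()):
            yield f                              # condition (iii) holds


def occurs(chronicle_types, constraints, sequence):
    """True if at least one occurrence exists."""
    return next(occurrences(chronicle_types, constraints, sequence), None) is not None


seq = [("A", 1.0), ("B", 3.0), ("A", 4.0), ("C", 9.0)]
types = ["A", "B", "C"]
cons = {(0, 1): (1.0, 3.0), (1, 2): (5.0, 7.0)}
found = list(occurrences(types, cons, seq))
```
]]></preformat>
      <p>In this toy sequence only the first event of type A can be matched, because the second one follows the event of type B and therefore violates the interval [1, 3].</p>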
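      <p>Let further $S$ be a set of sequences. The support of a
chronicle $C$ in $S$ is the number of sequences from $S$ in
which it occurs:
        <disp-formula>
          <label>(4)</label>
          <tex-math><![CDATA[\mathrm{supp}(C, S) = \#\{s \in S \mid C \text{ occurs in } s\}.]]></tex-math>
        </disp-formula>
If
        <disp-formula>
          <label>(5)</label>
          <tex-math><![CDATA[\mathrm{supp}(C, S) \geq s_{\min}]]></tex-math>
        </disp-formula>
for a given $s_{\min} &gt; 0$, or equivalently
        <disp-formula>
          <label>(6)</label>
          <tex-math><![CDATA[\frac{\mathrm{supp}(C, S)}{\#S} \geq f_{\min}]]></tex-math>
        </disp-formula>
for a given $f_{\min} = \frac{s_{\min}}{\#S}$, then $C$ is called frequent in $S$ on
the level $f_{\min}$.</p>
      <p>Finally, let $S^+$ and $S^-$ be two disjoint sets of sequences.
The growth rate of $C$ for $S^+$ with respect to $S^-$ is defined as
        <disp-formula>
          <label>(7)</label>
          <tex-math><![CDATA[g(C, S) = \begin{cases} \dfrac{\mathrm{supp}(C, S^+)}{\mathrm{supp}(C, S^-)} & \text{if } \mathrm{supp}(C, S^-) > 0, \\ +\infty & \text{if } \mathrm{supp}(C, S^-) = 0. \end{cases}]]></tex-math>
        </disp-formula>
      </p>
      <p>If $C$ is frequent and $g(C, S) \geq g_{\min}$ for a given minimal
growth rate $g_{\min} \geq 1$, then $C$ is called discriminant for $S^+$
with respect to $S^-$ on the level $g_{\min}$.</p>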
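      <p>Support, frequency and growth rate translate almost literally into code. The following Python sketch is an illustration only: a chronicle is abstracted as a predicate occurs_in(sequence), all function names are hypothetical, and frequency is checked in the positive set, which is an assumption because the text does not state in which set the chronicle has to be frequent.</p>
      <preformat><![CDATA[
```python
import math


def support(occurs_in, sequences):
    """Number of sequences in which the chronicle occurs."""
    return sum(1 for s in sequences if occurs_in(s))


def is_frequent(occurs_in, sequences, f_min):
    """Relative support of at least f_min."""
    return support(occurs_in, sequences) / len(sequences) >= f_min


def growth_rate(occurs_in, s_pos, s_neg):
    """Ratio of supports, +infinity if the negative support is zero."""
    neg = support(occurs_in, s_neg)
    if neg == 0:
        return math.inf
    return support(occurs_in, s_pos) / neg


def is_discriminant(occurs_in, s_pos, s_neg, f_min, g_min):
    # Assumption: "frequent" refers to the positive set s_pos.
    return (is_frequent(occurs_in, s_pos, f_min)
            and growth_rate(occurs_in, s_pos, s_neg) >= g_min)


def a_then_b(seq):
    """Toy occurrence predicate: an "A" followed later by a "B"."""
    return "A" in seq and "B" in seq[seq.index("A") + 1:]


s_pos = [["A", "B"], ["A", "C", "B"], ["B", "A"], ["C"]]
s_neg = [["A", "B"], ["C"], ["B"], ["C", "A"]]
rate = growth_rate(a_then_b, s_pos, s_neg)
```
]]></preformat>
      <p>Here the toy pattern occurs in two positive sequences and one negative sequence, so it is discriminant on the level 2 but not on the level 3.</p>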
      <p>
        The sequence sets $S^+$ and $S^-$ can be viewed as two
classes of their union $S = S^+ \cup S^-$. Hence, the
algorithm DCM for discriminant chronicles mining presented
in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] is actually a sophisticated algorithm for the extraction of
classification rules. Before searching for frequent chronicles
satisfying some temporal constraints, it searches for frequent
chronicles with the set of temporal constraints $\mathcal{T}_\infty$, which
is equivalent to the extraction of classification rules
without temporal constraints. To this end, any rule extraction
algorithm can be used. In the implementation of DCM in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], the algorithm Ripper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], based on the minimal description
length principle, has been employed.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Proposed Rules Extraction Method</title>
      <sec id="sec-4-1">
        <title>Discriminant Multi-dimensional Chronicles</title>
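        <p>Let $d, n \in \mathbb{N}$, $d, n \geq 2$. For a set $\mathbb{E}$ of event types
with a partial ordering $\leq_{\mathbb{E}}$, $\mathbb{T} \subseteq \overline{\mathbb{R}}^d$ and a label set
$\mathbb{L} = \{+, -\}$, an event is a couple $(e, \vec{t})$, where $e \in \mathbb{E}$, $\vec{t} \in \mathbb{T}$,
and a labelled sequence of events is defined as a tuple
$(SID, (e_1, \vec{t}_1), \dots, (e_n, \vec{t}_n), L)$, where $SID \in \mathbb{N}$ is a sequence
index, unique among all considered labelled sequences of
events, $(e_1, \vec{t}_1), \dots, (e_n, \vec{t}_n)$ are events, and $L \in \mathbb{L}$.</p>
        <p>Let $\vec{a} = (a_1, \dots, a_d)$, $\vec{b} = (b_1, \dots, b_d)$, $\vec{c} = (c_1, \dots, c_d) \in \mathbb{R}^d$,
with $b_i \geq a_i$, $i = 1, \dots, d$, and $R(\vec{a}, \vec{b}) = [a_1, b_1] \times
[a_2, b_2] \times \dots \times [a_d, b_d]$. The relation $\vec{c} \in R(\vec{a}, \vec{b}) \iff
\forall i \in \hat{d} : c_i \in [a_i, b_i]$ will be called the hyperrectangle test.
A hyperrectangle constraint is a tuple $(e_1, e_2, \vec{t}_1, \vec{t}_2)$, also
denoted $e_1[\![\vec{t}_1, \vec{t}_2]\!]e_2$, where $e_1, e_2 \in \mathbb{E}$ and $\vec{t}_1, \vec{t}_2 \in \overline{\mathbb{R}}^d$.
A hyperrectangle constraint $e_1[\![\vec{t}_1, \vec{t}_2]\!]e_2$ is said to be
satisfied by a couple of events $((e, \vec{t}), (e', \vec{t}'))$ if and only if
$e = e_1 \;\&amp;\; e' = e_2 \;\&amp;\; \vec{t}' - \vec{t} \in R(\vec{t}_1, \vec{t}_2)$.</p>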
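        <p>The hyperrectangle test and the satisfaction of a hyperrectangle constraint can be sketched in Python as follows. The function names and the numeric values are hypothetical and serve only as an illustration of the definitions above.</p>
        <preformat><![CDATA[
```python
def in_hyperrectangle(c, a, b):
    """Hyperrectangle test: every c_i lies in [a_i, b_i]."""
    if not (len(a) == len(b) == len(c)):
        raise ValueError("vectors must have the same dimension")
    return all(lo <= x <= hi for x, lo, hi in zip(c, a, b))


def satisfies_hr(constraint, first, second):
    """True if the couple of events satisfies e1[[t1, t2]]e2.

    constraint: (e1, e2, t1, t2) with t1, t2 vectors of bounds
    first, second: events given as (event_type, vector) pairs
    """
    e1, e2, t1, t2 = constraint
    (e, t), (e_prime, t_prime) = first, second
    if (e, e_prime) != (e1, e2):
        return False
    diff = [y - x for x, y in zip(t, t_prime)]   # t' - t, component-wise
    return in_hyperrectangle(diff, t1, t2)


# Illustrative 2-dimensional constraint and events (made-up numbers).
con = ("e1", "e2", (-31.0, -31.0), (-30.0, -30.0))
inside = satisfies_hr(con, ("e1", (900.0, 950.0)), ("e2", (869.5, 919.6)))
outside = satisfies_hr(con, ("e1", (900.0, 950.0)), ("e2", (869.5, 925.0)))
```
]]></preformat>
        <p>A single component outside its interval is enough for the test to fail, which is what distinguishes a hyperrectangle from a union of independent one-dimensional constraints.</p>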
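        <p>A multi-dimensional chronicle is a couple $(\mathcal{E}, \mathcal{T})$ such
that $\mathcal{E} = \{\!\{e_1, e_2, \dots, e_m\}\!\}$, $e_i \in \mathbb{E}$, $i \in \hat{m}$, is a multiset
of event types and $\mathcal{T} = \{e_1[\![\vec{t}_1, \vec{t}_2]\!]e_2 \mid e_1, e_2 \in \mathcal{E},\ e_1 \leq_{\mathbb{E}} e_2\}$
is a set of hyperrectangle constraints. If, in particular, all its
constraints are $e[\![(-\infty, \dots, -\infty), (\infty, \dots, \infty)]\!]e'$, i.e., they
do not constrain anything, then this $\mathcal{T}$ is again denoted
$\mathcal{T}_\infty$:
          <disp-formula>
            <label>(8)</label>
            <tex-math><![CDATA[\mathcal{T}_\infty = \{\, e[\![(-\infty, \dots, -\infty), (\infty, \dots, \infty)]\!]e' \mid (\exists i, j \in \hat{m})\ i \neq j \;\&\; e_i \leq_{\mathbb{E}} e_j \;\&\; e = e_i \;\&\; e' = e_j \,\}.]]></tex-math>
          </disp-formula>
        </p>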
        <p>Let $s = ((e_1, \vec{t}_1), \dots, (e_n, \vec{t}_n))$ be a sequence of events,
$m \in \hat{n}$ and $C = (\mathcal{E} = \{\!\{e'_1, e'_2, \dots, e'_m\}\!\}, \mathcal{T})$ be a
multi-dimensional chronicle. An occurrence of the
multi-dimensional chronicle $C$ in $s$ is a subsequence
$\tilde{s} = ((e_{f(1)}, \vec{t}_{f(1)}), (e_{f(2)}, \vec{t}_{f(2)}), \dots, (e_{f(m)}, \vec{t}_{f(m)}))$ such that
$f : \hat{m} \to \hat{n}$ is an injective function, $\forall i : e'_i = e_{f(i)}$, and if $i \neq j$,
then $\vec{t}_{f(j)} - \vec{t}_{f(i)} \in R(\vec{a}, \vec{b})$, where $e'_i[\![\vec{a}, \vec{b}]\!]e'_j \in \mathcal{T}$. A
multi-dimensional chronicle $C$ is said to occur in a sequence $s$ if
there exists at least one occurrence of $C$ in $s$.</p>
        <p>The support of a multi-dimensional chronicle $C$ in a
sequence set $S$ is again defined by (4), as for chronicles in
Section 2. Finally, the definitions of frequent
chronicles and of chronicles discriminant for one set of sequences
with respect to another also transfer to multi-dimensional
chronicles.</p>
      </sec>
      <sec id="sec-4-3">
        <title>Discriminant Multi-dimensional Chronicles Mining</title>
        <p>
          The DCM-MD algorithm shown in Listing 1 is a
modification of the DCM algorithm for discriminant chronicles
mining proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The main aspects of the
modification are the data model (scalar integer
values are replaced with vectors of real numbers) and a new
algorithm for mining discriminant hyperrectangle constraints (which
replaces the algorithm for mining discriminant temporal
constraints proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]). It operates on
multi-dimensional input data and multi-dimensional
chronicles, mining an incomplete set of discriminant
multi-dimensional chronicles, determined by the user-supplied
argument values $f_{\min}$ (fmin in the pseudocode) and $g_{\min}$
(gmin in the pseudocode).
        </p>
        <p>The branching statement in Listing 1 containing the
condition
supp(S+,{m,tinf}) &gt; (gmin*supp(S-,{m,tinf}))
is used to check whether a given frequent multiset without
any specific hyperrectangle constraints is already discriminant
(in the pseudocode, $\mathcal{T}_\infty$ is represented by the tinf
symbol). If the condition is true, the chronicle is added directly and no discriminant
temporal constraints are mined using the extractDC(...)
function.</p>
        <preformat>DCM-MD(S+, S-, fmin, gmin):
  M := extractMultiSet(S+,fmin).  // M is a set of frequent multisets
  C := emptySet().                // C is a set of resulting discriminant
                                  // multi-dimensional chronicles
  for m of M:
    if supp(S+,{m,tinf}) &gt; (gmin * supp(S-,{m,tinf})):
      C.add({m,tinf}).            // adds a discriminant chronicle
                                  // without temporal constraints
    else:
      for t of extractDC(S+,S-,m,fmin,gmin):
        C.add({m,t}).             // adds a discriminant chronicle
                                  // with temporal constraints
  return C.</preformat>
        <p>Listing 1: DCM-MD pseudocode</p>
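        <p>The control flow of Listing 1 can also be written as a short Python sketch. The three helper steps are passed in as functions; the toy stand-ins below (toy_multisets, toy_supp, toy_extract_dc) are hypothetical placeholders that ignore the attribute vectors and do not reproduce extractMultiSet or extractDC from the actual implementation.</p>
        <preformat><![CDATA[
```python
T_INF = "tinf"  # stands for the unconstraining constraint set


def dcm_md(s_pos, s_neg, f_min, g_min, extract_multisets, supp, extract_dc):
    """Skeleton of DCM-MD following Listing 1."""
    chronicles = []
    for m in extract_multisets(s_pos, f_min):
        if supp(s_pos, (m, T_INF)) > g_min * supp(s_neg, (m, T_INF)):
            chronicles.append((m, T_INF))    # discriminant without constraints
        else:
            for t in extract_dc(s_pos, s_neg, m, f_min, g_min):
                chronicles.append((m, t))    # discriminant with constraints
    return chronicles


# Toy stand-ins (assumptions): sequences are plain lists of event types.
def toy_multisets(s_pos, f_min):
    types = sorted({e for s in s_pos for e in s})
    return [(e,) for e in types
            if sum(e in s for s in s_pos) / len(s_pos) >= f_min]


def toy_supp(sequences, chronicle):
    multiset, _constraints = chronicle
    return sum(all(e in s for e in multiset) for s in sequences)


def toy_extract_dc(s_pos, s_neg, m, f_min, g_min):
    return ["toy-constraints"]               # placeholder result


s_pos = [["A", "B"], ["A"], ["A", "C"]]
only_pos = dcm_md(s_pos, [["B"], ["C"]], 0.5, 2,
                  toy_multisets, toy_supp, toy_extract_dc)
mixed = dcm_md(s_pos, [["A", "B"], ["C"]], 0.5, 5,
               toy_multisets, toy_supp, toy_extract_dc)
```
]]></preformat>
        <p>In the first call the multiset containing only A never occurs in the negative set, so it is added without constraints; in the second call the growth condition fails and the constraint mining branch is taken.</p>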
        <p>The extractMultiSet(...) function extracts a set
of frequent multisets from a given sequence set and a
user-supplied minimal support threshold ($f_{\min}$). It applies a
regular frequent itemset mining algorithm where an event
type $a \in \mathbb{E}$ occurring $n$ times in a sequence is encoded
by $n$ items $I^a_1, I^a_2, \dots, I^a_n$. An intermediate frequent itemset
of size $m$, denoted $(I^{e_k}_{i_k})_{1 \leq k \leq m}$, is extracted from the
supplied sequence set and is further transformed into the
resulting multiset. The last phase of the algorithm
converts each frequent itemset $(I^{e_k}_{i_k})_{1 \leq k \leq m}$ to a
multiset containing the mutually different event types $e_k$, $k = 1, \dots, m$,
each of them exactly $i_k$ times.</p>
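        <p>The item encoding used by extractMultiSet(...) can be sketched in Python as follows. This is a sketch under assumptions: items are represented as pairs (event type, index), the function names encode and decode are hypothetical, and decode takes the highest index per event type, which also covers itemsets that contain several items of the same type.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def encode(sequence_types):
    """An event type a occurring n times yields items (a, 1), ..., (a, n)."""
    counts = Counter(sequence_types)
    return {(a, k) for a, n in counts.items() for k in range(1, n + 1)}


def decode(itemset):
    """Turn an itemset into a multiset: type e_k with multiplicity i_k."""
    highest = {}
    for a, k in itemset:
        highest[a] = max(k, highest.get(a, 0))
    return Counter(highest)


items = encode(["A", "B", "A", "C", "A"])
multiset = decode({("A", 2), ("B", 1)})
```
]]></preformat>
        <p>With this encoding, any standard frequent itemset miner can be run on the encoded sequences, since an itemset containing the item (A, 2) is supported exactly by the sequences in which A occurs at least twice.</p>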
        <p>
          The extractDC(...) function is used to mine
discriminant hyperrectangle constraints from a given frequent
multiset $\mathcal{E} = \{\!\{a_1, a_2, \dots, a_n\}\!\}$, disjoint sequence sets $S^+$
and $S^-$, and user-defined parameters $f_{\min}$ and $g_{\min}$.
The conceptual and implementation details of the
extraction of discriminant hyperrectangle constraints are
elaborated in [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Application to Crystal Growth Data</title>
      <p>The need for affordable high-quality semiconducting
crystals such as gallium arsenide (GaAs) is continuously
increasing, particularly for electronic and photovoltaic
applications. Although GaAs has a number of outstanding
physical properties, its production is hampered by
challenging process control due to its high melting
temperature (1238 °C) and the chemically aggressive environment.
In particular, in-situ measurements of process variables
(e.g., temperatures, velocities, concentrations) in the
GaAs have a high contamination potential and lead to
low crystal quality. Moreover, in-situ visual observation
of the crystal growth is not possible. Prediction of the
position of the crystallization front, i.e., the length of the grown
crystal after applying a certain growth recipe (i.e., temporal
profiles of the heater power), is therefore key information for
process monitoring.</p>
      <p>Here, we consider the vertical gradient freeze (VGF)
method for the growth of GaAs crystals. The VGF
method involves the progressive freezing of a melt from its lower end
upward by moving the desired temperature
gradient through a furnace via temporal changes of the heating power.
A one-dimensional model of VGF-GaAs growth is shown in
Figure 1.</p>
      <sec id="sec-5-1">
        <title>Used Data</title>
        <p>
          The implementation described above, which extends the
method proposed in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], has been applied to data gathered in
the German Research Foundation (DFG) project
“Model-based control and regulation of the VGF crystal growth
process using distributed parametric methods”. The data
records the position of the solid/liquid interface of GaAs
crystals grown by the vertical gradient freeze (VGF)
method, which involves progressive freezing of the melt from its lower end
upward by moving the temperature gradient in
a furnace, together with the evolution of temperatures at the
0th–4th quarters of the GaAs height. The data has been
obtained by solving the inverse problem for a simplified
one-dimensional model of the VGF process for different
desired growth rates as described in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ], using as input the
evolution of 2-dimensional vectors describing the heat flux
in and the heat flux out (Figure 1). All simulations were
run over 100 time points, among which the 5th, 10th, …,
95th and 100th will in the following serve as milestone
times.
        </p>
        <p>
          For an application of the method presented in Section 3,
event types and events have been defined as follows. The
2-dimensional inputs of the 500 numeric simulations
underlying [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] have been clustered into $k = 20$ clusters using
the Matlab implementation of the standard k-means
clustering algorithm. The centers of the resulting clusters are
listed in Table 1. An event type is the fact that the
input belongs to a particular cluster. For each numeric
simulation, an event type is recorded at every milestone
time. Consequently, the size of any multiset of event types
from one numeric simulation is at most 20. An event
is a pair $(e, T)$, where $e$ is an event type and $T \in \mathbb{R}^5$ is the
vector of temperatures obtained in the numeric simulation
at the milestone time when $e$ was recorded, provided
the position of the solid/liquid interface at the end of that
simulation was at least 17.25 cm. There were 255 such
simulations available, thus we have 255 event sequences
of length 20, due to the 20 milestone times. They were
divided into two disjoint sequence sets, $S^+$ defined by
          <disp-formula>
            <label>(9)</label>
            <tex-math><![CDATA[S^+ = \{((e_1, T_1), \dots, (e_{20}, T_{20})) \mid e_i, T_i,\ i = 1, \dots, 20, \text{ originated in a simulation ending with the position of the solid/liquid interface} > 25\ \text{cm}\},]]></tex-math>
          </disp-formula>
and $S^-$ containing the remaining sequences.
        </p>
        <p>
          The experimental setup aimed at a chronicle set $\mathcal{C}$
containing about 20 to 30 elements and including both chronicles
discriminant for $S^+$ with respect to $S^-$ and chronicles
discriminant for $S^-$ with respect to $S^+$. Each
chronicle $(\mathcal{E}, \mathcal{T}) \in \mathcal{C}$ should contain only a minimal number
of $\mathcal{T}_\infty$ constraints.
        </p>
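        <p>The construction of an event sequence from one simulation can be sketched in Python as follows. The sketch assumes that the cluster centers are already known and assigns each input to its nearest center; the centers, inputs and temperatures below are made-up toy values, not the values from Table 1 or from the simulations, and the function names are hypothetical.</p>
        <preformat><![CDATA[
```python
import math

MILESTONES = range(5, 101, 5)   # the 5th, 10th, ..., 100th time: 20 milestones


def nearest_cluster(x, centers):
    """Index of the cluster center closest to the input x."""
    return min(range(len(centers)), key=lambda k: math.dist(x, centers[k]))


def to_event_sequence(inputs, temperatures, centers):
    """Build the events (event type, temperature vector) at the milestones.

    inputs[t - 1] and temperatures[t - 1] belong to time t = 1, ..., 100.
    """
    return [(nearest_cluster(inputs[t - 1], centers), tuple(temperatures[t - 1]))
            for t in MILESTONES]


# Toy data: two centers instead of 20, linearly growing 2-dimensional inputs.
centers = [(0.0, 0.0), (10.4, 10.4)]
inputs = [(t / 10, t / 10) for t in range(1, 101)]
temperatures = [(float(t),) * 5 for t in range(1, 101)]
seq = to_event_sequence(inputs, temperatures, centers)
```
]]></preformat>
        <p>Each simulation thus yields one sequence of 20 events whose event types are cluster indices and whose attribute vectors are the 5 temperatures at the milestone times.</p>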
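        <p>Assume that $C = (\mathcal{E}, \mathcal{T})$ is a chronicle, $\mathcal{C}$ is a set
of chronicles and $t_s \in [0, 1]$. The chronicle specificity, denoted
$s(C)$, is defined as
          <disp-formula>
            <tex-math><![CDATA[s(C) = \frac{\#\{e[\![t, t']\!]e' \in \mathcal{T} \mid e[\![t, t']\!]e' \notin \mathcal{T}_\infty\}}{\#\mathcal{T}}.]]></tex-math>
          </disp-formula>
The chronicle set specific for a specificity threshold $t_s$,
denoted $s(\mathcal{C}, t_s)$, is defined as $s(\mathcal{C}, t_s) = \{C \mid C \in \mathcal{C} \;\&amp;\; s(C) \geq t_s\}$.</p>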
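        <p>Specificity and the filtering by a specificity threshold can be sketched in Python as follows. The representation of a chronicle as a pair (multiset, list of constraints) and all names are assumptions made for this illustration; they do not describe the chronicle_statgen tool.</p>
        <preformat><![CDATA[
```python
import math

INF = math.inf


def is_unconstraining(lo, hi):
    """True for a constraint with bounds (-inf, ..., -inf), (inf, ..., inf)."""
    return all(x == -INF for x in lo) and all(x == INF for x in hi)


def specificity(constraints):
    """Share of hyperrectangle constraints that constrain something."""
    if not constraints:
        raise ValueError("a chronicle needs at least one constraint")
    specific = sum(1 for (_e, _e2, lo, hi) in constraints
                   if not is_unconstraining(lo, hi))
    return specific / len(constraints)


def specific_chronicles(chronicles, t_s):
    """Chronicles whose specificity reaches the threshold t_s."""
    return [c for c in chronicles if specificity(c[1]) >= t_s]


free = ("e1", "e2", (-INF, -INF), (INF, INF))
tight = ("e1", "e2", (-31.0, -31.0), (-30.0, -30.0))
c1 = (("e1", "e2"), [tight, tight, free])           # specificity 2/3
c2 = (("e1", "e2"), [tight, tight, tight, free])    # specificity 3/4
kept = specific_chronicles([c1, c2], 0.7)
```
]]></preformat>
        <p>With the threshold 0.7, only the second toy chronicle is kept, because one third of the constraints of the first one do not constrain anything.</p>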
        <p>The metrics used for evaluating the suitability of the
parameters passed to the DC-PBC component are as follows.
$\#M$ is the size of the set of
frequent multisets, as introduced in the pseudocode of the
DCM-MD algorithm in Listing 1. $\#\mathcal{E}$ is the count of
distinct frequent multisets that occur in some
discriminant chronicle of the resulting chronicle set $\mathcal{C}$:
          <disp-formula>
            <tex-math><![CDATA[\#\mathcal{E} = \#\{\mathcal{E} \mid (\exists \mathcal{T} \text{, a set of hyperrectangle constraints})\ (\mathcal{E}, \mathcal{T}) \in \mathcal{C}\}.]]></tex-math>
          </disp-formula>
$\max s(\mathcal{C}) = \max\{s(C) \mid C \in \mathcal{C}\}$ is the maximal
specificity value found among the chronicles in $\mathcal{C}$. $\#s(\mathcal{C}, t_s)$
is the count of chronicles specific for $t_s$ found in $\mathcal{C}$.</p>
        <p>The following parameters were tuned:
          <list list-type="bullet">
            <list-item><p>$f_{\min}$, implemented by the --fmin parameter, representing the minimal support threshold;</p></list-item>
            <list-item><p>$g_{\min}$, implemented by the --gmin parameter, representing the minimal growth rate threshold of the DCM-MD algorithm as introduced in Listing 1;</p></list-item>
            <list-item><p>$\min(\#\mathcal{E})$, implemented by the --mincs parameter, representing the minimal chronicle event multiset size;</p></list-item>
            <list-item><p>$\max(\#\mathcal{E})$, implemented by the --maxcs parameter, representing the maximal chronicle event multiset size;</p></list-item>
            <list-item><p>$t_s$, representing the specificity threshold for a custom tool implemented for extracting specific discriminant chronicles from a set of discriminant chronicles.</p></list-item>
          </list>
        </p>
        <p>After evaluating the metrics for each parameter
tuning step, the argument values $f_{\min} = 0.1$, $g_{\min} = 5000$,
$\min(\#\mathcal{E}) = 2$, $\max(\#\mathcal{E}) = 5$, $t_s = 0.7$ proved sufficient for
retrieving a set of specific discriminant chronicles with the
desired properties.</p>
      </sec>
      <sec id="sec-5-2">
        <title>Examples of Extracted Rules</title>
        <p>The implementation of the generalized DC-PBC, available
at github.com/busarade-itat/md-dc-pbc, was
invoked on the data described above with the argument values
--mincs 2, --maxcs 5, --fmin 0.1, --gmin 5000.</p>
        <p>The resulting set of discriminant chronicles was
afterwards filtered to include only specific discriminant
chronicles. To this end, a tool chronicle_statgen available
at github.com/busarade-itat was invoked with
arguments --minspec 0.7, --vecsize 5.</p>
        <p>The final result is presented in Tables 2 and 3. It comprises
a total of 26 specific discriminant chronicles: 18 of them are
discriminant for $S^+$ with respect to $S^-$, the remaining 8 are
discriminant for $S^-$ with respect to $S^+$.</p>
        <p>The proposed method enables prediction of the conditions
for reaching a targeted crystal length by following the
differences among segments in the temporal profiles of
temperatures at characteristic points in the GaAs. If the same
approach is further applied to experimental
temperature profiles measured by thermocouples in the heaters
(outside of the melt and crystal), as in real experiments, it will
be possible to determine the moment of reaching the desired
crystal length without visual observation and without GaAs
contamination. At that moment, the crystal growth step
terminates and the cooling of the furnace starts. Such an
accurate prediction of the end of the solidification step will be
very beneficial for the process economy and the final
crystal quality.
</p>
        <p>Example constraint set of an extracted chronicle: $\{e_1[\![(-30.8, -30.8, -30.9, -31.0, -31.0), (-30.1, -30.2, -30.2, -30.3, -30.3)]\!]e_2\}$</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Conclusion</title>
      <p>
        The paper has presented a generalization of the method
for discriminant chronicles mining proposed in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. This
generalization has been motivated by the objective to
extract classification rules from crystal growth data,
bringing two additional problems not pertaining to the data to
which the original method had been applied: the events
are described with a vector of attributes instead of a
single scalar attribute, and the attributes are real-valued
instead of integer-valued. The theoretical fundamentals of
the method in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] have been extended to tackle those two
problems and the system for discriminant chronicles
mining based on [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] has been adapted to accommodate those
extensions, together with some additional implementation
improvements such as refactoring. As a proof of concept,
the presented generalization has been applied, using
the modified system, to real-world data with events
characterizing the heat fluxes for the growth of GaAs crystals
by the vertical gradient freeze method, and with a vector of 5
attributes recording the temperatures at different heights.
Although most of the hyperrectangles in Tables 2 and 3 are
not very restrictive, the extracted classification rules
nevertheless show that the proposed approach makes it possible to
assess whether the grown crystal will have a desired length
based solely on the temperature profiles. Regarding
future research, it would be interesting to assess how small
changes to the mined hyperrectangle constraints affect the
manufacturing process of the VGF-GaAs crystals.
      </p>
      <sec id="sec-6-1">
        <title>Acknowledgement</title>
        <p>The research reported in this paper has been supported by
the Czech Science Foundation (GAČR) grant 18-18080S.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Radek</given-names>
            <surname>Buša</surname>
          </string-name>
          .
          <article-title>Implementation of a generalized version of a system for discriminant chronicles mining</article-title>
          . Czech Technical University in Prague, Computing and Information Centre, Cham,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.W.</given-names>
            <surname>Cohen</surname>
          </string-name>
          .
          <article-title>Fast effective rule induction</article-title>
          . pages
          <fpage>115</fpage>
          -
          <lpage>123</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Yann</given-names>
            <surname>Dauxais</surname>
          </string-name>
          , Thomas Guyet, David Gross-Amblard, and
          <string-name>
            <given-names>André</given-names>
            <surname>Happe</surname>
          </string-name>
          .
          <source>Discriminant chronicles mining</source>
          .
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>R.</given-names>
            <surname>Dechter</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Meiri</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Pearl</surname>
          </string-name>
          .
          <article-title>Temporal constraint networks</article-title>
          .
          <source>Artificial Intelligence</source>
          ,
          <volume>49</volume>
          :
          <fpage>61</fpage>
          -
          <lpage>95</lpage>
          ,
          <year>1991</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Christophe</given-names>
            <surname>Dousson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thang Vu</given-names>
            <surname>Duong</surname>
          </string-name>
          .
          <article-title>Discovering chronicles with numerical time constraints from alarm logs for monitoring dynamic systems</article-title>
          .
          <source>In IJCAI</source>
          , pages
          <fpage>620</fpage>
          -
          <lpage>626</lpage>
          ,
          <year>1999</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dropka</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          .
          <article-title>Optimization of magnetically driven directional solidification of silicon using artificial neural networks and Gaussian process models</article-title>
          .
          <source>Journal of Crystal Growth</source>
          ,
          <volume>471</volume>
          :
          <fpage>53</fpage>
          -
          <lpage>61</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>N.</given-names>
            <surname>Dropka</surname>
          </string-name>
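          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Ecklebe</surname>
          </string-name>
          ,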
          <string-name>
            <given-names>C.</given-names>
            <surname>Frank-Rotsch</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.</given-names>
            <surname>Winkler</surname>
          </string-name>
          .
          <article-title>Fast forecasting of VGF crystal growth process by dynamic neural networks</article-title>
          .
          <source>Journal of Crystal Growth</source>
          ,
          <volume>521</volume>
          :
          <fpage>9</fpage>
          -
          <lpage>14</lpage>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Gemma C.</given-names>
            <surname>Garriga</surname>
          </string-name>
          .
          <article-title>Summarizing sequential data with closed partial orders</article-title>
          .
          In
          <source>SDM</source>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>M.</given-names>
            <surname>Ghallab</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nau</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P.</given-names>
            <surname>Traverso</surname>
          </string-name>
          .
          <source>Automated Planning and Acting</source>
          . Cambridge University Press, Cambridge,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>D.J.</given-names>
            <surname>Hand</surname>
          </string-name>
          .
          <source>Construction and Assessment of Classification Rules</source>
          . John Wiley and Sons, New York,
          <year>1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
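          <string-name>
            <given-names>M.</given-names>
            <surname>Holeňa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Pulc</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Kopp</surname>
          </string-name>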
          .
          <source>Classification Methods for Internet Applications</source>
          . Springer,
          <year>2020</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>H.C.</given-names>
            <surname>Lau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Ou</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Sim</surname>
          </string-name>
          .
          <article-title>Robust temporal constraint networks</article-title>
          .
          In
          <source>International Conference on Tools with Artificial Intelligence</source>
          , pages
          <fpage>82</fpage>
          -
          <lpage>88</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>