Extraction of Classification Rules from Sequences of Crystal Growth Data

Radek Buša¹, Yann Dauxais², Stefan Ecklebe³, Natasha Dropka⁴, Martin Holeňa⁵,⁶

¹ Faculty of Information Technology, Czech Technical University, Thákurova 9, Prague, Czech Republic
² KU Leuven, Celestijnenlaan 200a, Leuven, Belgium
³ Institute of Control Theory, TU Dresden, Georg-Schumann-Str. 7a, Dresden, Germany
⁴ Leibniz Institut für Kristallzüchtung, Max-Born-Str. 2, Berlin, Germany
⁵ Institute of Computer Science, Czech Academy of Sciences, Prague, Czech Republic
⁶ Leibniz Institute for Catalysis, Albert-Einstein-Str. 29a, Rostock, Germany

Abstract: The paper presents a generalization of a data mining method for the extraction of classification rules for sequences of events, which is called discriminant chronicles mining. The generalization is motivated by the objective to extract classification rules from crystal growth data, for which the original method needs to be extended to events described by vectors of attributes and to real-valued attributes. The paper incorporates both extensions into the theoretical fundamentals of the original method and describes a corresponding modification of a system for discriminant chronicles mining that was developed three years ago to implement the original method. Finally, an application of the generalized method, using the modified system, to data from the growth of GaAs crystals by the vertical gradient freeze method is briefly sketched.

1 Introduction

This paper deals with data mining of crystal growth data, obtained either experimentally or from simulations. Such data records the crystal growth process, its performance, and the conditions in the melt, such as temperatures at various control points, the power of heaters, or parameters of the magnetic fields influencing melt convection [6, 7]. In particular, we consider the common situation in which the performance data indicates whether the crystal growth process can be classified as satisfactory according to a given criterion, e.g., according to the shape or position of the solid/liquid interface. Hence, the primary data mining approach to such data is the extraction of classification rules.

Although a plethora of methods for classification rules extraction exist [10, 11], most of them cannot be used for our data. The reason is that crystal growth proceeds sequentially; hence, the data is inherently sequential. Therefore, we have chosen a specific rules extraction method, proposed in [3], that extracts classification rules for the classification of sequences of events. It is called "discriminant chronicles mining" because it was originally developed for events described with attributes conveying a temporal meaning. However, it cannot be directly applied to crystal growth data, due to the following two restrictions:

(i) The events in [3] are described with scalar values of the temporal attribute. On the other hand, members of sequences of crystal growth data, which for simplicity we will also call events, are described with vectors of attribute values.
(ii) The temporal attribute describing events in [3] has a finite number of values, thus it can be represented by a finite subset of integers. On the other hand, attributes describing events in sequences of crystal growth data are real-valued.

Therefore, we have extended the method from [3] to sequences of events described by real-valued vector attributes. This extension is the main contribution of the paper.

The next section briefly recalls the original method proposed in [3]. Its extension removing restrictions (i) and (ii) above is described in Section 3. Finally, an application of the proposed method to crystal growth data is sketched in Section 4.

Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

2 Classification Rules Extraction with Discriminant Chronicles

Let $\mathbb{E}$ be a finite set, the elements of which are called event types, and let $\mathbb{T}$ be an arbitrary subset of the extended reals, $\mathbb{T} \subset \bar{\mathbb{R}}$. For a multiset of $m$ event types, the rather unusual notation $\{\{e_1, \ldots, e_m\}\}$ has been introduced in [3], and a couple $(e, t) \in \mathbb{E} \times \mathbb{T}$ is called an event.

Assume further that some ordering is imposed on the event types in the application domain. In [3], where temporal relationships are investigated, a total ordering corresponding to their order of occurrence is considered. For the rules extraction method proposed in Section 3, however, the weaker concept of a partial ordering $\prec$ will be sufficient, with semantics tailored to a particular application (see Section 4 for the real-world application considered in this paper). For $e_1, e_2 \in \mathbb{E}$ and $t^-, t^+ \in \bar{\mathbb{R}}$ such that $e_1 \prec e_2$ and $t^- \le t^+$, a temporal constraint is a tuple $(e_1, e_2, t^-, t^+)$, also denoted $e_1[t^-, t^+]e_2$. The semantics of such a temporal constraint is as follows: the difference between the timestamp $t_2$ of an event $(e_2, t_2)$ of type $e_2$ and the timestamp $t_1$ of an event $(e_1, t_1)$ of type $e_1$ fulfills $t^- \le t_2 - t_1 \le t^+$. A temporal constraint $e_1[t^-, t^+]e_2$ is called satisfied by a couple of events $((e, t), (e', t'))$ if $e_1 = e$, $e_2 = e'$ and $t' - t \in [t^-, t^+]$. Because constraining two events to occur at a fixed time distance is too strict for most applications, the simplest way to represent temporal constraints is by duration intervals. These intervals can be interpreted as two constraints defining the lowest and the highest accepted duration, respectively.

Using a set $\mathbb{E}$ of event types and a set $\mathcal{T}$ of temporal constraints, two complementary concepts can be introduced:

(i) If we are interested in finding events that pairwise satisfy a given set $\mathcal{T}$ of temporal constraints, then the concept of a simple temporal constraint network [12], alternatively called a simple temporal problem [4], is useful, which can be defined as the triple

$$(\mathbb{E}, \prec, \mathcal{T}), \text{ where } \mathcal{T} \text{ is a set of temporal constraints } e[l, u]e' \text{ such that } e, e' \in \mathbb{E},\ e \prec e'. \tag{1}$$

(ii) If we are interested in mining temporal constraints from given sequences of events, then the concept of a chronicle [3, 9] is useful, which can be defined as the couple

$$(\mathcal{E}, \mathcal{T}), \text{ where } \mathcal{E} = \{\{e_1, \ldots, e_m\}\},\ e_i \in \mathbb{E}, \text{ and } \mathcal{T} \text{ is a set of temporal constraints } e[l, u]e' \text{ such that } (\exists i, j \in \{1, \ldots, m\})\ i \ne j \,\&\, e_i \prec e_j \,\&\, e = e_i \,\&\, e' = e_j. \tag{2}$$

A chronicle is a temporal extension of episodes or of the partial orders introduced in [8], a type of pattern dedicated to summarizing sequential data. Chronicles have proven their usefulness in applications where the temporal dimension is essential to differentiate two behaviors. Their first application was to alarm log data [5], where the temporal distance between two alarm events is very important.

Observe that the constraint $e[-\infty, \infty]e'$ holds for any $e, e' \in \mathbb{E}$, meaning that this constraint actually does not constrain anything. In case no constraint from $\mathcal{T}$ constrains anything, the set $\mathcal{T}$ will be denoted $\mathcal{T}_\infty$. Hence, $(\mathcal{E}, \mathcal{T}_\infty)$ is a chronicle for the set of temporal constraints $\mathcal{T}_\infty$, of whose constraints $e[-\infty, \infty]e'$ it is only required that

$$(\exists i, j \in \{1, \ldots, m\})\ i \ne j \,\&\, e_i \prec e_j \,\&\, e = e_i \,\&\, e' = e_j. \tag{3}$$

Because the research reported in this paper concerns data mining, it relies on the latter concept, as well as on several additional concepts concerning chronicles.

Let $m, n \in \mathbb{N}$, $2 \le m \le n$, let $C = (\{\{e'_1, \ldots, e'_m\}\}, \mathcal{T})$ be a chronicle and let $s = ((e_1, t_1), \ldots, (e_n, t_n))$ be a sequence of events. For $k \in \mathbb{N}$, denote $\hat{k} = \{1, \ldots, k\}$. An occurrence of $C$ in $s$ is a subsequence $\tilde{s} = ((e_{f(1)}, t_{f(1)}), \ldots, (e_{f(m)}, t_{f(m)}))$ of $s$ such that:

(i) $f : \hat{m} \to \hat{n}$ is an injective function;
(ii) $e'_i = e_{f(i)}$, $i = 1, \ldots, m$;
(iii) if $i \ne j$ and $e'_i \prec e'_j$, then $t_{f(j)} - t_{f(i)} \in [a, b]$, where $e'_i[a, b]e'_j \in \mathcal{T}$.

We say that $C$ occurs in $s$ if there exists at least one occurrence of $C$ in $s$.

Let further $\mathcal{S}$ be a set of sequences. The support of a chronicle $C$ in $\mathcal{S}$ is the number of sequences from $\mathcal{S}$ in which it occurs:

$$\mathrm{supp}(C, \mathcal{S}) = \#\{s \in \mathcal{S} \mid C \text{ occurs in } s\}. \tag{4}$$

If

$$\mathrm{supp}(C, \mathcal{S}) \ge \sigma_{\min} \tag{5}$$

for a given $\sigma_{\min} > 0$, or equivalently

$$\frac{\mathrm{supp}(C, \mathcal{S})}{\#\mathcal{S}} \ge f_{\min} \tag{6}$$

for a given $f_{\min} = \frac{\sigma_{\min}}{\#\mathcal{S}}$, then $C$ is called frequent in $\mathcal{S}$ on the level $f_{\min}$.

Finally, let $\mathcal{S}^+$ and $\mathcal{S}^-$ be two disjoint sets of sequences. The growth rate of $C$ for $\mathcal{S}^+$ with respect to $\mathcal{S}^-$ is defined as

$$g(C, \mathcal{S}) = \begin{cases} \dfrac{\mathrm{supp}(C, \mathcal{S}^+)}{\mathrm{supp}(C, \mathcal{S}^-)} & \text{if } \mathrm{supp}(C, \mathcal{S}^-) > 0, \\ +\infty & \text{if } \mathrm{supp}(C, \mathcal{S}^-) = 0. \end{cases} \tag{7}$$

If $C$ is frequent and $g(C, \mathcal{S}) \ge g_{\min}$ for a given minimal growth rate $g_{\min} \ge 1$, then $C$ is called discriminant for $\mathcal{S}^+$ with respect to $\mathcal{S}^-$ on the level $g_{\min}$.

The sequence sets $\mathcal{S}^+$ and $\mathcal{S}^-$ can be viewed as two classes of their union $\mathcal{S} = \mathcal{S}^+ \cup \mathcal{S}^-$. Hence, the algorithm DCM for discriminant chronicles mining presented in [3] is actually a sophisticated algorithm for the extraction of classification rules. Before searching frequent chronicles satisfying some temporal constraints, it searches frequent chronicles with the set of temporal constraints $\mathcal{T}_\infty$, which is equivalent to the extraction of classification rules without temporal constraints. To this end, any rules extraction algorithm can be used. In the implementation of DCM in [3], the algorithm Ripper [2], based on the minimum description length principle, has been employed.

3 Proposed Rules Extraction Method

3.1 Discriminant Multi-dimensional Chronicles

Let $d, n \in \mathbb{N}$, $d, n \ge 2$. For a set $\mathbb{E}$ of event types with a partial ordering $\prec$, $\mathbb{T} \subset \bar{\mathbb{R}}^d$ and a label set $\mathbb{L} = \{+, -\}$, an event is a couple $(e, \vec{t})$, where $e \in \mathbb{E}$ and $\vec{t} \in \mathbb{T}$, and a labelled sequence of events is defined as a tuple $(SID, (e_1, \vec{t}_1), \ldots, (e_n, \vec{t}_n), L)$, where $SID \in \mathbb{N}$ is a sequence index, unique among all considered labelled sequences of events, $(e_1, \vec{t}_1), \ldots, (e_n, \vec{t}_n)$ are events, and $L \in \mathbb{L}$.

Let $\vec{a} = (a_1, \ldots, a_d)$, $\vec{b} = (b_1, \ldots, b_d)$, $\vec{c} = (c_1, \ldots, c_d) \in \mathbb{R}^d$, with $b_i \ge a_i$, $i = 1, \ldots, d$, and let $R(\vec{a}, \vec{b}) = [a_1, b_1] \times [a_2, b_2] \times \ldots \times [a_d, b_d]$. The relation

$$\vec{c} \subseteq R(\vec{a}, \vec{b}) \iff \forall i \in \hat{d} : c_i \in [a_i, b_i]$$

will be called the hyperrectangle test.

A hyperrectangle constraint is a tuple $(e_1, e_2, \vec{t}_1, \vec{t}_2)$, also denoted $e_1[[\vec{t}_1, \vec{t}_2]]e_2$, where $e_1, e_2 \in \mathbb{E}$ and $\vec{t}_1, \vec{t}_2 \in \bar{\mathbb{R}}^d$. A hyperrectangle constraint $e_1[[\vec{t}_1, \vec{t}_2]]e_2$ is said to be satisfied by a couple of events $((e, \vec{t}), (e', \vec{t}'))$ if and only if $e = e_1 \,\&\, e' = e_2 \,\&\, \vec{t}' - \vec{t} \subseteq R(\vec{t}_1, \vec{t}_2)$.

A multi-dimensional chronicle is a couple $(\mathcal{E}, \mathcal{T})$ such that $\mathcal{E} = \{\{e_1, e_2, \ldots, e_m\}\}$, $e_i \in \mathbb{E}$, $i \in \hat{m}$, is a multiset of event types and $\mathcal{T} = \{e_1[[\vec{t}_1, \vec{t}_2]]e_2 \mid e_1, e_2 \in \mathcal{E},\ e_1 \prec e_2\}$ is a set of hyperrectangle constraints. If, in particular, all its constraints are $e[[(-\infty, \ldots, -\infty), (\infty, \ldots, \infty)]]e'$, i.e., they do not constrain anything, then this $\mathcal{T}$ is again denoted $\mathcal{T}_\infty$:

$$\mathcal{T}_\infty = \{e[[(-\infty, \ldots, -\infty), (\infty, \ldots, \infty)]]e' \mid (\exists i, j \in \hat{m})\ i \ne j \,\&\, e_i \prec e_j \,\&\, e = e_i \,\&\, e' = e_j\}. \tag{8}$$

Let $s = ((e_1, \vec{t}_1), \ldots, (e_n, \vec{t}_n))$ be a sequence of events, $m \in \hat{n}$, and let $C = (\mathcal{E} = \{\{e'_1, e'_2, \ldots, e'_m\}\}, \mathcal{T})$ be a multi-dimensional chronicle. An occurrence of the multi-dimensional chronicle $C$ in $s$ is a subsequence $\tilde{s} = ((e_{f(1)}, \vec{t}_{f(1)}), (e_{f(2)}, \vec{t}_{f(2)}), \ldots, (e_{f(m)}, \vec{t}_{f(m)}))$ such that $f : \hat{m} \to \hat{n}$ is an injective function, $\forall i : e'_i = e_{f(i)}$, and if $i \ne j$, then $\vec{t}_{f(j)} - \vec{t}_{f(i)} \subseteq R(\vec{a}, \vec{b})$, where $e'_i[[\vec{a}, \vec{b}]]e'_j \in \mathcal{T}$. A multi-dimensional chronicle $C$ is said to occur in a sequence $s$ if there exists at least one occurrence of $C$ in $s$.

The support of a multi-dimensional chronicle $C$ in a sequence set $\mathcal{S}$ is again defined by (4), as for chronicles in Section 2. Finally, the definitions of frequent chronicles and of chronicles discriminant for one set of sequences with respect to another also transfer to multi-dimensional chronicles.

3.2 Discriminant Multi-dimensional Chronicles Mining

The DCM-MD algorithm illustrated in Listing 1 is a modification of the DCM algorithm for discriminant chronicles mining proposed in [3]. The main aspects of the modification are the data model (substituting vectors of real numbers for scalar integer values) and a new algorithm for mining discriminant hyperrectangle constraints (substituting the algorithm for discriminant temporal constraints mining proposed in [3]). It operates with multi-dimensional input data and multi-dimensional chronicles, mining an incomplete set of discriminant multi-dimensional chronicles determined by the user-supplied argument values $f_{\min}$ (fmin in the pseudocode) and $g_{\min}$ (gmin in the pseudocode).

The branching statement in Listing 1 containing the condition

    supp(S+, {m, tinf}) > gmin * supp(S-, {m, tinf})

is used to check whether a given frequent multiset without any further specific hyperrectangle constraints is discriminant (in the pseudocode, $\mathcal{T}_\infty$ is represented by the tinf symbol). If the condition is true, no discriminant hyperrectangle constraints are mined using the extractDC(...) function.

    DCM-MD(S+, S-, fmin, gmin):
        M := extractMultiSet(S+, fmin)     // M is a set of frequent multisets
        C := emptySet()                    // C is the set of resulting discriminant
                                           // multi-dimensional chronicles
        for m of M:
            if supp(S+, {m, tinf}) > gmin * supp(S-, {m, tinf}):
                C.add({m, tinf})           // adds a discriminant chronicle
                                           // without hyperrectangle constraints
            else:
                for t of extractDC(S+, S-, m, fmin, gmin):
                    C.add({m, t})          // adds a discriminant chronicle
                                           // with hyperrectangle constraints
        return C

Listing 1: DCM-MD pseudocode

The extractMultiSet(...) function extracts a set of frequent multisets from a given sequence set and a user-supplied minimal support threshold ($f_{\min}$). It applies a regular frequent itemset mining algorithm, in which an event type $a \in \mathbb{E}$ occurring $n$ times in a sequence is encoded by $n$ items $I^a_1, I^a_2, \ldots, I^a_n$. An intermediate frequent itemset of size $m$, denoted $(I^{e_k}_{i_k})_{1 \le k \le m}$, is extracted from the supplied sequence set and is further transformed into the resulting multiset. The last phase of the algorithm converts each frequent itemset $(I^{e_k}_{i_k})_{1 \le k \le m}$ into a multiset containing mutually different event types $e_k$, $k = 1, \ldots, m$, each of them exactly $i_k$ times.

The extractDC(...) function is used to mine discriminant hyperrectangle constraints from a given frequent multiset $\mathcal{E} = \{\{a_1, a_2, \ldots, a_n\}\}$, disjoint sequence sets $\mathcal{S}^+$ and $\mathcal{S}^-$, and user-defined parameters $f_{\min}$ and $g_{\min}$. The exact conceptual and implementation details of the extraction of discriminant hyperrectangle constraints are elaborated in [1].

4 Application to Crystal Growth Data

The need for affordable high-quality semiconducting crystals such as gallium arsenide (GaAs) is continuously increasing, particularly for electronic and photovoltaic applications. Although GaAs has a number of outstanding physical properties, its production is hampered by challenging process control due to its high melting temperature (1238 °C) and the chemically aggressive environment. In particular, in-situ measurements of the process variables (e.g., temperatures, velocities, concentrations) in the GaAs have a high contamination potential and lead to low crystal quality. Moreover, in-situ visual observations of the crystal growth are not possible. Prediction of the position of the crystallization front, i.e., of the length of the grown crystal after the usage of a certain growth recipe (i.e.,
temporal profiles of the heating power), is key information for process monitoring.

Here, we consider the vertical gradient freeze (VGF) method for the growth of GaAs crystals. The VGF growth method involves the progressive freezing of the lower end of a melt upward by moving the desired temperature gradient in a furnace via a temporal change of the heating power. A 1-dimensional model of VGF-GaAs growth is shown in Figure 1.

Figure 1: Illustration explaining the used crystal growth data

4.1 Used Data

The above-described implementation extending the method proposed in [3] has been applied to data gathered in the German Research Foundation (DFG) project "Model-based control and regulation of the VGF crystal growth process using distributed parametric methods". The data records the position of the solid/liquid interface of GaAs crystals grown by the VGF method, together with the evolution of temperatures at the 0th–4th quarter of the GaAs height. The data have been obtained by solving the inverse problem for a simplified one-dimensional model of the VGF process for different desired growth rates, as described in [7], using as input the evolution of 2-dimensional vectors describing the heat flux in and the heat flux out (Figure 1). All simulations were performed for 100 time points, among which the 5th, 10th, ..., 95th and 100th will in the following serve as milestone times.

For an application of the method presented in Section 3, event types and events have been defined as follows. The 2-dimensional inputs of the 500 numeric simulations underlying [7] have been clustered into k = 20 clusters using the Matlab implementation of the standard k-means clustering algorithm. The centers of the resulting clusters are listed in Table 1. An event type then corresponds to the fact that the input belongs to a particular cluster. For each numeric simulation, an event type is recorded at every milestone time. Consequently, the size of any multiset of event types from one numeric simulation is at most 20. An event is a pair $(e, T)$, where $e$ is an event type and $T \in \mathbb{R}^5$ is the vector of temperatures obtained in the numeric simulation at the milestone time when $e$ was recorded, provided the position of the solid/liquid interface at the end of that simulation was at least 17.25 cm. There were 255 such simulations available; thus we have 255 event sequences of length 20, due to the 20 milestone times. They were divided into two disjoint sequence sets as follows:

$$\mathcal{S}^+ = \{((e_1, T_1), \ldots, (e_{20}, T_{20})) \mid e_i, T_i,\ i = 1, \ldots, 20, \text{ originated in a simulation ending with the position of the solid/liquid interface} > 25\ \text{cm}\}, \tag{9}$$

$$\mathcal{S}^- = \{((e_1, T_1), \ldots, (e_{20}, T_{20})) \mid e_i, T_i,\ i = 1, \ldots, 20, \text{ originated in a simulation ending with the position of the solid/liquid interface in } [17.25, 25]\ \text{cm}\}. \tag{10}$$

As to the number of sequences in both sets, $\#\mathcal{S}^+ = 90$ and $\#\mathcal{S}^- = 165$.

Finally, the considered partial ordering $\prec$ of event types is given by the order of occurrence of events of those types in any of the event sequences in $\mathcal{S}^+$ or in $\mathcal{S}^-$, i.e.,

$$e \prec e' \iff (\exists ((e_1, T_1), \ldots, (e_{20}, T_{20})) \in \mathcal{S}^+ \cup \mathcal{S}^-)(\exists i, j \in \{1, \ldots, 20\})\ i < j \,\&\, e = e_i \,\&\, e' = e_j. \tag{11}$$

Table 1: Centers $c$ of the 20 clusters in $\mathbb{R}^2$ defining event types. They were obtained through clustering the first 500 numeric simulations underlying [7] using the k-means algorithm in Matlab.

    Cluster   c1      c2    |  Cluster   c1      c2
    A         13400   8560  |  K         14300   8720
    B         12300   8690  |  L         11900   8650
    C         15200   8840  |  M         12700   8660
    D         13900   8590  |  N         12100   8670
    E         13600   8730  |  O         13100   8720
    F         14900   8800  |  P         13700   8550
    G         13200   8570  |  Q         14100   8710
    H         12900   8590  |  R         13300   8730
    I         13800   8740  |  S         14600   8760
    J         12500   8680  |  T         12900   8720

4.2 Experimental Setup

The experimental setup aimed at a chronicle set containing about 20–30 elements and including both chronicles discriminant for $\mathcal{S}^+$ with respect to $\mathcal{S}^-$ and chronicles discriminant for $\mathcal{S}^-$ with respect to $\mathcal{S}^+$. Each chronicle $(\mathcal{E}, \mathcal{T}) \in \mathcal{C}$ should contain only a minimal number of constraints from $\mathcal{T}_\infty$, i.e., of constraints that do not constrain anything.

Assume that $C = (\mathcal{E}, \mathcal{T})$ is a chronicle, $\mathcal{C}$ is a set of chronicles and $t_s \in [0, 1]$. The chronicle specificity, denoted $s(C)$, is defined as

$$s(C) = \frac{\#\{e[[\vec{t}, \vec{t}']]e' \in \mathcal{T} \mid e[[\vec{t}, \vec{t}']]e' \notin \mathcal{T}_\infty\}}{\#\mathcal{T}}.$$

The subset of $\mathcal{C}$ specific for a specificity threshold $t_s$, denoted $s(\mathcal{C}, t_s)$, is defined as $s(\mathcal{C}, t_s) = \{C \mid C \in \mathcal{C} \,\&\, s(C) \ge t_s\}$.

The metrics used for evaluating the suitability of the parameters passed to the DC-PBC component are described in the rest of this paragraph. $\#M$ is the size of the set of frequent multisets, as introduced in the pseudocode of the DCM-MD algorithm in Listing 1. $\#\mathcal{E}$ is the count of distinct frequent multisets that occurred in some discriminant chronicle of the resulting chronicle set $\mathcal{C}$: $\#\mathcal{E} = \#\{\mathcal{E} \mid (\exists \mathcal{T}, \text{ a set of hyperrectangle constraints})\ (\mathcal{E}, \mathcal{T}) \in \mathcal{C}\}$. $\max s(\mathcal{C}) = \max\{s(C) \mid C \in \mathcal{C}\}$ is the maximal specificity value found among the chronicles in $\mathcal{C}$. $\#s(\mathcal{C}, t_s)$ is the count of chronicles specific for $t_s$ found in $\mathcal{C}$.

The following parameters were tuned: $f_{\min}$, implemented by the --fmin parameter, representing the minimal support threshold; $g_{\min}$, implemented by the --gmin parameter, representing the minimal growth rate threshold of the DCM-MD algorithm as introduced in Listing 1; $\min(\#\mathcal{E})$, implemented by the --mincs parameter, representing the minimal size of the chronicle event multiset; $\max(\#\mathcal{E})$, implemented by the --maxcs parameter, representing the maximal size of the chronicle event multiset; and $t_s$, representing the specificity threshold for a custom tool implemented for extracting specific discriminant chronicles from a set of discriminant chronicles.

After evaluating the metrics for each parameter tuning step, the argument values $f_{\min} = 0.1$, $g_{\min} = 5000$, $\min(\#\mathcal{E}) = 2$, $\max(\#\mathcal{E}) = 5$ and $t_s = 0.7$ proved sufficient for retrieving a set of specific discriminant chronicles with the desired properties.

4.3 Examples of Extracted Rules

The implementation of the generalized DC-PBC, available at github.com/busarade-itat/md-dc-pbc, was invoked for the data described above with the argument values --mincs 2, --maxcs 5, --fmin 0.1, --gmin 5000. The resulting set of discriminant chronicles was afterwards filtered to include only specific discriminant chronicles. To this end, a tool chronicle_statgen, available at github.com/busarade-itat, was invoked with the arguments --minspec 0.7, --vecsize 5.

The final result is presented in Tables 2 and 3: a total of 26 specific discriminant chronicles, 18 of them discriminant for $\mathcal{S}^+$ with respect to $\mathcal{S}^-$ and the remaining 8 discriminant for $\mathcal{S}^-$ with respect to $\mathcal{S}^+$.

The proposed method enables prediction of the conditions for reaching a targeted crystal length by following the differences among segments in the temporal profiles of temperatures at characteristic points in the GaAs. If the same approach is further applied to the experimental temperature profiles measured by thermocouples in the heaters (outside of the melt and crystal), as in real experiments, it will be possible to determine the moment of reaching the desired crystal length without visual observation and without GaAs contamination. From that moment on, the crystal growth process step terminates and cooling down of the furnace starts. Such an accurate prediction of the end of the solidification step will be very beneficial for the process economy and the final crystal quality.
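The definitions of Section 3.1 (hyperrectangle test, occurrence of a multi-dimensional chronicle, support, growth rate) and the specificity of Section 4.2 can be made concrete with a small sketch. The following Python code is a minimal brute-force illustration, not the DC-PBC implementation: as an assumption of this sketch, a chronicle is represented by its tuple of event types plus a dictionary mapping position pairs (i, j) to constraint bounds, mirroring the e1, e2, e3 notation of Tables 2 and 3; all function names are hypothetical. Occurrence is checked by enumerating all injective maps f, which is exponential and only suitable for toy inputs.

```python
import itertools
import math
from typing import Dict, List, Sequence, Tuple

Vector = Tuple[float, ...]
Event = Tuple[str, Vector]  # (event type, attribute vector)
# Sketch-specific representation (an assumption, not the DC-PBC data model):
# (event types e'_1..e'_m, {(i, j): (lower, upper)}) with 0-based positions,
# where (lower, upper) are the bounds of the constraint e'_i [[lower, upper]] e'_j.
Chronicle = Tuple[Tuple[str, ...], Dict[Tuple[int, int], Tuple[Vector, Vector]]]

INF = math.inf


def in_hyperrectangle(c: Vector, lower: Vector, upper: Vector) -> bool:
    """Hyperrectangle test: every component c_i lies in [a_i, b_i]."""
    return all(a <= x <= b for x, a, b in zip(c, lower, upper))


def difference(t_later: Vector, t_earlier: Vector) -> Vector:
    """Component-wise difference t' - t of two attribute vectors."""
    return tuple(x - y for x, y in zip(t_later, t_earlier))


def occurs(chronicle: Chronicle, seq: Sequence[Event]) -> bool:
    """Try every injective map f from chronicle positions to sequence positions."""
    types, constraints = chronicle
    for f in itertools.permutations(range(len(seq)), len(types)):
        if any(seq[f[i]][0] != e for i, e in enumerate(types)):
            continue  # condition e'_i = e_f(i) violated
        if all(in_hyperrectangle(difference(seq[f[j]][1], seq[f[i]][1]), lo, hi)
               for (i, j), (lo, hi) in constraints.items()):
            return True
    return False


def support(chronicle: Chronicle, sequences: List[Sequence[Event]]) -> int:
    """Number of sequences in which the chronicle occurs, as in (4)."""
    return sum(1 for s in sequences if occurs(chronicle, s))


def growth_rate(chronicle: Chronicle, pos, neg) -> float:
    """Growth rate (7); infinite when the chronicle never occurs in neg."""
    s_neg = support(chronicle, neg)
    return INF if s_neg == 0 else support(chronicle, pos) / s_neg


def is_discriminant(chronicle: Chronicle, pos, neg, f_min: float, g_min: float) -> bool:
    """Frequent in pos on level f_min and growth rate at least g_min."""
    frequent = support(chronicle, pos) / len(pos) >= f_min
    return frequent and growth_rate(chronicle, pos, neg) >= g_min


def specificity(chronicle: Chronicle) -> float:
    """Share of constraints that are not in T_inf (0.0 if there are none)."""
    _, constraints = chronicle
    if not constraints:
        return 0.0
    trivial = sum(1 for lo, hi in constraints.values()
                  if all(a == -INF for a in lo) and all(b == INF for b in hi))
    return (len(constraints) - trivial) / len(constraints)


# Toy 2-dimensional example (made-up numbers, not data from the paper).
chron: Chronicle = (("A", "Q"), {(0, 1): ((-INF, -INF), (10.0, 10.0))})
s1 = [("A", (100.0, 200.0)), ("Q", (105.0, 209.0))]  # difference (5, 9): satisfied
s2 = [("A", (100.0, 200.0)), ("Q", (120.0, 205.0))]  # difference (20, 5): violated
s3 = [("A", (0.0, 0.0)), ("B", (1.0, 1.0))]          # no event of type Q
print(occurs(chron, s1), occurs(chron, s2))          # True False
print(growth_rate(chron, [s1], [s2, s3]))            # inf
```

Constraints are indexed by positions rather than by event types on purpose: multisets such as {{e1 = A, e2 = A, e3 = Q}} in Table 2 contain the same type twice, so a pair of types alone would not identify which two events a constraint refers to.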
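The item encoding used by extractMultiSet(...) in Section 3.2 can be illustrated in the same way. This is a hedged sketch under explicit assumptions: the brute-force enumeration below merely stands in for the regular frequent itemset mining algorithm mentioned in the text, the helper names are hypothetical, and the decoding rule (each event type is repeated as often as the highest index among its items) is our reading of the conversion step rather than a detail taken from [1]. The parameters min_size and max_size play the role of --mincs and --maxcs.

```python
import itertools
from collections import Counter
from typing import FrozenSet, Iterable, List, Sequence, Set, Tuple

Item = Tuple[str, int]  # (a, k) stands for the item I^a_k


def encode(event_types: Iterable[str]) -> FrozenSet[Item]:
    """An event type a occurring n times becomes the items I^a_1, ..., I^a_n."""
    counts = Counter(event_types)
    return frozenset((a, k) for a, n in counts.items() for k in range(1, n + 1))


def decode(itemset: Iterable[Item]) -> Counter:
    """Assumed rule: repeat each event type as often as its highest item index."""
    multiset: Counter = Counter()
    for a, k in itemset:
        multiset[a] = max(multiset[a], k)
    return multiset


def frequent_multisets(sequences: List[Sequence[str]], f_min: float,
                       min_size: int = 2, max_size: int = 5) -> Set[Tuple[str, ...]]:
    """Brute-force stand-in for a frequent itemset miner (exponential, toy data only)."""
    transactions = [encode(s) for s in sequences]
    universe = sorted(set().union(*transactions))
    result = set()
    # A decoded multiset is never smaller than its itemset, so itemsets with more
    # than max_size items can be skipped.
    for size in range(1, max_size + 1):
        for candidate in itertools.combinations(universe, size):
            items = frozenset(candidate)
            supp = sum(1 for t in transactions if items <= t)
            multiset = tuple(sorted(decode(items).elements()))
            if supp / len(transactions) >= f_min and min_size <= len(multiset) <= max_size:
                result.add(multiset)
    return result


seqs = [["A", "Q", "A"], ["A", "Q"], ["A", "A", "I"]]
print(sorted(encode(seqs[0])))                       # [('A', 1), ('A', 2), ('Q', 1)]
print(sorted(frequent_multisets(seqs, f_min=0.6)))  # [('A', 'A'), ('A', 'Q')]
```

The encoding turns multiset inclusion into plain set inclusion: a sequence contains the multiset {{A, A}} exactly when its itemset contains both I^A_1 and I^A_2, which is why an ordinary itemset miner can be reused.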
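The construction of event types and events in Section 4.1 can also be sketched. The following code rests on several assumptions that are ours, not the paper's: a 2-dimensional input is assigned to the cluster with the nearest centre (the usual k-means assignment rule), using the rounded centres of Table 1, so inputs close to a cluster boundary may be assigned differently than by the original Matlab clustering; the components (c1, c2) are taken in the same order as the input vector; and the 100 time points are indexed 1 to 100. The function names are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Cluster centres (c1, c2) from Table 1 (rounded values).
CENTRES: Dict[str, Tuple[float, float]] = {
    "A": (13400, 8560), "B": (12300, 8690), "C": (15200, 8840), "D": (13900, 8590),
    "E": (13600, 8730), "F": (14900, 8800), "G": (13200, 8570), "H": (12900, 8590),
    "I": (13800, 8740), "J": (12500, 8680), "K": (14300, 8720), "L": (11900, 8650),
    "M": (12700, 8660), "N": (12100, 8670), "O": (13100, 8720), "P": (13700, 8550),
    "Q": (14100, 8710), "R": (13300, 8730), "S": (14600, 8760), "T": (12900, 8720),
}


def event_type(x: Tuple[float, float]) -> str:
    """Assign a 2-dimensional input to the cluster with the nearest centre."""
    return min(CENTRES, key=lambda k: math.dist(x, CENTRES[k]))


def to_event_sequence(inputs: Sequence[Tuple[float, float]],
                      temperatures: Sequence[Tuple[float, ...]]) -> List[tuple]:
    """Events (e, T) at the milestone times 5, 10, ..., 100 (1-based time points)."""
    return [(event_type(inputs[t - 1]), tuple(temperatures[t - 1]))
            for t in range(5, 101, 5)]


print(event_type((14100, 8710)))  # Q
print(event_type((13410, 8565)))  # A
```

Each simulation thus yields a sequence of 20 events whose attribute vectors are the 5 temperatures, which is the input format assumed by the sketches of the chronicle definitions.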
Table 2: Resulting set of specific chronicles discriminant for S+ with respect to S−, rounded to 3 significant digits

    E(C) = {{e1 = A, e2 = A, e3 = Q}}    supp(C, S+) = 23, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (−33.7, −51.3, −68.9, −86.7, −105)]] e2
            e1 [[(−∞, −∞, −∞, −∞, −∞), (73.4, 73.5, 73.7, 73.8, 74.0)]] e3
            e2 [[(−∞, −∞, −∞, −∞, −∞), (159, 160, 161, 162, 180)]] e3

    E(C) = {{e1 = A, e2 = I, e3 = Q}}    supp(C, S+) = 10, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (15.5, 17.9, 18.0, 18.1, 18.2)]] e2
            e1 [[(43.1, 60.6, 73.7, 73.8, 74.0), (∞, ∞, ∞, ∞, ∞)]] e3
            e2 [[(−∞, −∞, −∞, −∞, −∞), (144, 145, 146, 146, 163)]] e3

    E(C) = {{e1 = A, e2 = I, e3 = Q}}    supp(C, S+) = 13, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (36.9, 37.0, 37.1, 37.1, 37.2)]] e2
            e1 [[(43.1, 60.6, 73.7, 73.8, 74.0), (∞, ∞, ∞, ∞, ∞)]] e3
            e2 [[(−∞, −∞, −∞, −∞, −∞), (144, 145, 146, 146, 163)]] e3

    E(C) = {{e1 = G, e2 = A, e3 = I}}    supp(C, S+) = 13, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (−128, −128, −129, −130, −130)]] e2
            e1 [[(55.9, 73.8, 91.6, 109, 127), (∞, ∞, ∞, ∞, ∞)]] e3
            e2 [[(398, 412, 414, 416, 418), (∞, ∞, ∞, ∞, ∞)]] e3

    E(C) = {{e1 = G, e2 = A, e3 = I}}    supp(C, S+) = 13, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (−374, −376, −378, −380, −382)]] e2
            e1 [[(−∞, −∞, −∞, −∞, −∞), (88.6, 107, 125, 144, 162)]] e3
            e2 [[(398, 412, 414, 416, 418), (∞, ∞, ∞, ∞, ∞)]] e3

    E(C) = {{e1 = G, e2 = A, e3 = Q}}    supp(C, S+) = 9, supp(C, S−) = 0
      T(C): e1 [[(139, 140, 141, 142, 142), (∞, ∞, ∞, ∞, ∞)]] e2
            e1 [[(−∞, −∞, −∞, −∞, −∞), (249, 268, 287, 306, 325)]] e3
            e2 [[(43.5, 61.3, 73.7, 73.8, 74.0), (∞, ∞, ∞, ∞, ∞)]] e3

    E(C) = {{e1 = G, e2 = I, e3 = Q}}    supp(C, S+) = 18, supp(C, S−) = 0
      T(C): e1 [[(104, 117, 134, 151, 163), (∞, ∞, ∞, ∞, ∞)]] e2
            e1 [[(−∞, −∞, −∞, −∞, −∞), (189, 205, 222, 239, 256)]] e3
            e2 [[(−60.4, −60.5, −60.6, −60.7, −60.8), (∞, ∞, ∞, ∞, ∞)]] e3

    E(C) = {{e1 = G, e2 = Q, e3 = Q}}    supp(C, S+) = 14, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (249, 267, 286, 305, 324)]] e2
            e1 [[(60.4, 78.3, 96.3, 114, 132), (∞, ∞, ∞, ∞, ∞)]] e3
            e2 [[(−32.5, −32.5, −32.6, −32.7, −32.7), (−31.8, −31.8, −31.9, −32.0, −32.0)]] e3

    E(C) = {{e1 = A, e2 = A}}    supp(C, S+) = 38, supp(C, S−) = 0
      T(C): e1 [[(−135, −135, −136, −137, −137), (−127, −128, −128, −129, −130)]] e2

    E(C) = {{e1 = A, e2 = I}}    supp(C, S+) = 12, supp(C, S−) = 0
      T(C): e1 [[(431, 451, 470, 490, 510), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = A, e2 = K}}    supp(C, S+) = 30, supp(C, S−) = 0
      T(C): e1 [[(286, 305, 324, 343, 362), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = A, e2 = Q}}    supp(C, S+) = 21, supp(C, S−) = 0
      T(C): e1 [[(372, 391, 411, 430, 449), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = G, e2 = A}}    supp(C, S+) = 16, supp(C, S−) = 0
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (−375, −377, −379, −381, −383)]] e2

    E(C) = {{e1 = G, e2 = G}}    supp(C, S+) = 8, supp(C, S−) = 0
      T(C): e1 [[(−124, −125, −125, −126, −127), (−122, −123, −123, −124, −124)]] e2

    E(C) = {{e1 = G, e2 = K}}    supp(C, S+) = 17, supp(C, S−) = 0
      T(C): e1 [[(26.5, 43.7, 44.6, 44.8, 45.1), (161, 180, 198, 217, 236)]] e2

    E(C) = {{e1 = I, e2 = K}}    supp(C, S+) = 12, supp(C, S−) = 0
      T(C): e1 [[(72.5, 72.7, 72.8, 73.0, 73.1), (137, 144, 162, 180, 198)]] e2

    E(C) = {{e1 = I, e2 = Q}}    supp(C, S+) = 7, supp(C, S−) = 0
      T(C): e1 [[(68.6, 68.8, 68.9, 69.1, 69.2), (69.0, 69.1, 69.3, 69.4, 69.5)]] e2

    E(C) = {{e1 = Q, e2 = K}}    supp(C, S+) = 72, supp(C, S−) = 0
      T(C): e1 [[(−27.3, −27.4, −27.4, −27.5, −27.5), (∞, ∞, ∞, ∞, ∞)]] e2

Table 3: Resulting set of specific chronicles discriminant for S− with respect to S+, rounded to 3 significant digits

    E(C) = {{e1 = G, e2 = I, e3 = Q}}    supp(C, S+) = 0, supp(C, S−) = 19
      T(C): e1 [[(−∞, −∞, −∞, −∞, −∞), (80.3, 80.5, 80.6, 80.7, 80.9)]] e2
            e1 [[(−∞, −∞, −∞, −∞, −∞), (50.2, 50.2, 50.3, 50.4, 50.5)]] e3
            e2 [[(−∞, −∞, −∞, −∞, −∞), (67.4, 67.5, 67.6, 77.0, 93.3)]] e3

    E(C) = {{e1 = A, e2 = Q}}    supp(C, S+) = 0, supp(C, S−) = 13
      T(C): e1 [[(12.1, 12.1, 12.1, 12.1, 12.1), (13.6, 13.6, 13.6, 13.6, 13.7)]] e2

    E(C) = {{e1 = G, e2 = A}}    supp(C, S+) = 0, supp(C, S−) = 16
      T(C): e1 [[(316, 333, 351, 368, 385), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = G, e2 = G}}    supp(C, S+) = 0, supp(C, S−) = 39
      T(C): e1 [[(−121, −122, −122, −123, −123), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = G, e2 = I}}    supp(C, S+) = 0, supp(C, S−) = 31
      T(C): e1 [[(318, 334, 351, 367, 384), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = G, e2 = Q}}    supp(C, S+) = 0, supp(C, S−) = 18
      T(C): e1 [[(393, 411, 429, 447, 465), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = I, e2 = I}}    supp(C, S+) = 0, supp(C, S−) = 37
      T(C): e1 [[(−30.9, −31.0, −31.0, −31.1, −31.1), (∞, ∞, ∞, ∞, ∞)]] e2

    E(C) = {{e1 = Q, e2 = Q}}    supp(C, S+) = 0, supp(C, S−) = 14
      T(C): e1 [[(−30.8, −30.8, −30.9, −31.0, −31.0), (−30.1, −30.2, −30.2, −30.3, −30.3)]] e2

5 Conclusion

The paper has presented a generalization of the method for discriminant chronicles mining proposed in [3]. This generalization has been motivated by the objective to extract classification rules from crystal growth data, which brings two additional problems not pertaining to the data to which the original method had been applied: the events are described with a vector of attributes instead of a single scalar attribute, and the attributes are real-valued instead of integer-valued. The theoretical fundamentals of the method in [3] have been extended to tackle those two problems, and the system for discriminant chronicles mining based on [3] has been adapted to accommodate those extensions, together with some additional implementation improvements such as refactoring. As a proof of concept, the presented generalization has been applied, using the modified system, to simulated data from a real-world application, with events characterizing the heat fluxes for the growth of GaAs crystals by the vertical gradient freeze method and with a vector of 5 attributes recording the temperatures at different heights. Although most of the hyperrectangles in Tables 2 and 3 are not very restrictive, the extracted classification rules nevertheless show that the proposed approach allows one to assess whether the grown crystal will have a desired length based solely on the temperature profiles. Regarding future research, it would be interesting to assess how small changes to the mined hyperrectangle constraints affect the manufacturing process of VGF-GaAs crystals.

Acknowledgement

The research reported in this paper has been supported by the Czech Science Foundation (GAČR) grant 18-18080S.

References

[1] Radek Buša. Implementation of a generalized version of a system for discriminant chronicles mining. Czech Technical University in Prague, Computing and Information Centre, Cham, 2020.
[2] W. W. Cohen. Fast effective rule induction. Pages 115–123, 1995.
[3] Yann Dauxais, Thomas Guyet, David Gross-Amblard, and André Happe. Discriminant chronicles mining. 2017.
[4] R. Dechter, I. Meiri, and J. Pearl. Temporal constraint networks. Artificial Intelligence, 49:61–95, 1991.
[5] Christophe Dousson and Thang Vu Duong. Discovering chronicles with numerical time constraints from alarm logs for monitoring dynamic systems. In IJCAI, pages 620–626, 1999.
[6] N. Dropka and M. Holeňa. Optimization of magnetically driven directional solidification of silicon using artificial neural networks and Gaussian process models. Journal of Crystal Growth, 471:53–61, 2017.
[7] N. Dropka, M. Holeňa, S. Ecklebe, C. Frank-Rotsch, and J. Winkler. Fast forecasting of VGF crystal growth process by dynamic neural networks. Journal of Crystal Growth, 521:9–14, 2019.
[8] Gemma C. Garriga. Summarizing sequential data with closed partial orders. In SDM, 2005.
[9] M. Ghallab, D. Nau, and P. Traverso. Automated Planning and Acting. Cambridge University Press, Cambridge, 2016.
[10] D. J. Hand. Construction and Assessment of Classification Rules. John Wiley and Sons, New York, 1997.
[11] M. Holeňa, P. Pulc, and M. Kopp. Classification Methods for Internet Applications. Springer, 2020.
[12] H. C. Lau, T. Ou, and M. Sim. Robust temporal constraint networks. In International Conference on Tools with Artificial Intelligence, pages 82–88, 2005.