Introduction

Pattern Structures for Treatment Optimization in Subgroups of Patients

Natalia V. Korepanova

korepanova.natalia@gmail.com 0

Sergei O. Kuznetsov

skuznetsov@hse.ru 0 0 National Research University Higher School of Economics , Moscow , Russia

A comparison of di erent treatment strategies does not always result in determining the best one for all patients, one needs to study subgroups of patients with signi cant di erence in e ciency between treatment strategies. To solve this problem an approach to subgroups generation is proposed, where data are described in terms of a pattern structure and pattern concepts stay for patient subgroups and their descriptions. To nd the most promising pattern concepts in terms of the di erence of treatment strategies in e ciency a version of CbO algorithm is proposed. An application to the analysis of data on childhood acute lymphoblastic leukemia is considered.

pattern structure subgroup analysis acute lymphoblastic leukaemia

Introduction

Randomized controlled trial (RCT)[ 1 ] is a common approach in evidence-based medicine to prove the superiority of a disease treatment over another one. There are three main types of hypotheses which can be tested with the help of RCT: superiority, noninferiority or equivalence. The main goal of all intervention, drug, or therapy inventions is to nd a better way of treating patients. So, superiority trials are possibly the most popular ones, and they are the subject of interest of this paper. A superiority trial allows physicians to nd the optimal treatment and improve the curability of the disease. However if comparing treatment strategies are similar enough it is a big success to nd and prove the superiority of any of them for all patients. But the e ect of treatment strategies may depend on patients initial features (physiological characteristics and/or results of diagnostics). For example, if we compare dosages of a toxic drug a small dosage may be more suitable for patients with light disease manifestation because it throws o a disease and reduces the negative consequences of toxicity while for patients with intense disease manifestation a small dosage is not enough to cure them of the disease. The question is how to nd such subgroups of quite similar patients where the e ciency di erence of treatment strategies is signi cant.

Several approaches were proposed in [2{11]. Most of them [5{11] are based on the idea of decision or regression trees which locally optimize some measure at every iteration of the algorithm. This approach is more suitable when we operate on the big datasets or constantly increasing dataset because they allow to nd some subgroups quickly. In the case of treatment optimization the datasets, as a rule, are not very big and do not increase in size rapidly. Moreover, collecting such datasets demands a lot of time and e orts. So, it is more important to carry out more detailed analysis of the data then to make it fast. Also, in RCT on cancer, heart conditions or chronic diseases the outcome of the therapy can be censored, while only few papers like [ 9, 11 ] report on analysis of censored data. In this paper we propose a universal approach to nding subgroups of patients with signi cantly di erent responses to di erent treatment strategies, which is not biased by any local optimization criterion. Within this approach subgroups of patients are generated, which are determined by subsets of patients' features.The approach in based on computing closed patterns [14{17] that satisfy criteria of treatment e ciency. The approach was proposed for the analysis of the database of randomized controlled trial on childhood acute lymphoblastic leukemia (ALL) [ 12, 13 ] which was performed in several hospitals in Russia and Byelorussia. In this dataset each patient under study is assigned one of the studied treatment strategies, he/she is described by a set of initial features that can be nominal or numerical, and some outcome which is used to estimate treatment e ciency.

The rest of the paper is organized as follows. In section 2 we recall basic de nitions of pattern structures and give examples of pattern structures [14{17] relevant to the analyzed data. In section 3 a version of Close-by-One (CbO) algorithm [ 18 ] performing on the attributes is proposed. Section 4 presents stopping criterion added to the version of CbO to generate only subgroups with the di erence in the e ciency of treatment strategies. Section 5 presents an application of the proposed approach to the ALL dataset, and we conclude in section 6. 2 2.1

Pattern Structures Main De nitions

In this section we recall pattern structures and examples of pattern structures used for nominal and numerical features.

Let G be a set (of objects), (D; u) be a meet-semilattice (of all possible object descriptions), and : G ! D be a mapping. Then (G; (D; u); ) is called a pattern structure, provided that the set (G) = f (g) j g 2 Gg generates a complete subsemilattice (D ; u) of (D; u), i.e. every subset X of (G) has an in mum FX in (D; u). Elements of D are called patterns and are naturally ordered by subsumption relation v: c v d , cud = c, where c; d 2 D. Operation u is also called a similarity operation. If (G; (D; u); ) is a pattern structure we de ne the derivation operators which form a Galois connection between the powerset of G and (D; u) as:

A = gF2A (g) d = fg 2 G j d v (g)g for A G for d 2 D (1) The pairs (A; d) satisfying A G, d 2 D, A = d, and A = d are called pattern concepts of (G; (D; u); ), with pattern extent A and pattern intent d. Pattern concepts are ordered with respect to set inclusion on extents. The ordered set of pattern concepts makes a lattice, called pattern concept lattice. Operator ( ) is an algebraical closure operator on patterns, since it is idempotent, extensive, and monotone.

If objects are described by binary attributes from set M , then D = }(M ), the powerset of M , and (g) is prime operator ( )0 in the context (G; M; I): (g) = fm 2 M j gImg, and d1 u d2 = d1 \ d2 where d1; d2 2 D. So, subsumption corresponds to set inclusion: d1 v d2 , d1 u d2 = d1 , d1 \ d2 = d1 , d1 d2.

So, if all patients' initial features are binary we can use them as binary attributes directly. However we also aim at dealing with nominal initial features. Consider patients are described by k initial features f 1; :::; kg, all of them are nominal (binary is a particular case) and their values are coded as natural numbers. So, if i takes li values we assume that the range of i is f1; :::; lig. For each i we construct li binary attributes f i1; :::; ili g such that ij : G ! f0; 1g and ij (g) : g ! i(g) = j where g 2 G; j = 1; :::; li. As a result we get Pi=1;:::;n li binary attributes to which pattern structures can be applied as it is shown above. 2.2

Pattern Structures on Intervals

To operate with numerical features interval pattern structures [15{17] can be applied. Let us consider each patient is described by n numerical and no nominal initial features. In our notation G corresponds to the set of patients. So, let f'1; '2; :::; 'ng be a set of functions represented patients' initial features such that 'i : G ! R for i = 1; :::; n. For each feature 'i we construct a corresponding interval attribute i : G ! [R; R] such that if 'i(g) = x for g 2 G, then i(g) = [x; x], where x 2 R. These attributes are used for pattern structures construction.

Each object is described by a n-dimensional tuple of intervals. Let a and b be tuples of n intervals, so a = h[vi; wi]ii=1;:::;n and b = h[xi; yi]ii=1;:::;n, where vi; wi; xi; yi 2 R 8i = 1; :::; n. In this case the similarity operation u is de ned by the meet of tuple components:

a u b = h[vi; wi]ii=1;:::;n u h[xi; yi]ii=1;:::;n = h[vi; wi] u [xi; yi]ii=1;:::;n; where [vi; wi] u [xi; yi] = [min(vi; xi); max(wi; yi)].

Hence, subsumption on tuples of interval is de ned as: a v b , [vi; wi] v [xi; yi]i=1;:::;n , [vi; wi] u [xi; yi] = [vi; wi]i=1;:::;n , , [min(vi; xi); max(wi; yi)] = [vi; wi]i=1;:::;n , [vi; wi] [xi; yi]i=1;:::;n: For example, h[2; 6]; [4; 5]i v h[3; 4]; [5; 5]i as [2; 6] v [3; 4] and [4; 5] v [5; 5]. (2) (3) 2.3

Pattern Structures on Mixed Tuples

In the previous sections we consider separately the cases of nominal and numerical initial features. However, the situation when patients have both nominal and numerical initial features seems more natural. As it is described above we associate the set of binary attributes with each nominal feature and an interval attribute - with each numerical one. So, d 2 D is a tuple, where components are intervals and binary attributes. Let us have k binary and n interval attributes. Assume d = h ; i where is the tuple of intervals of length n, and is the subset of binary attributes. If d1; d2 2 D, d1 = h 1; 1i, and d2 = h 2; 2i similarity operator can be set as d1 u d2 = h 1 u 2; 1 u 2i where similarity operators for the tuples of intervals and the sets of binary attributes are de ned above. The subsumption is also de ned by subsumption on the tuples of intervals and the sets of binary attributes: Pattern concepts can be computed by Close by One (CbO) algorithm. It produces a tree structure on pattern concepts where edges represent a subset of lattice edges. For the following analysis we do not need the whole lattice but the tree-structure provided by CbO is helpful for implementation of stopping criterion. Considering the top of the lattice is the pair (G; G ), and the bottom is (;; ; ) CbO starts from the bottom and proceeds \object-wise". However, when the number of objects is larger than the number of attributes processing top-down allows one to reduce computation time. Moreover, for the given problem we need to generate pattern concepts with as large extents as possible to detect the di erence in treatment e ciency. So, it is more reasonable to start generation process from the top of the lattice. As we operate on descriptions consisting from interval and binary attributes classical CbO must be adapted to such descriptions. For this purpose a version of CbO is proposed below. The idea is to order elements of descriptions, and start to reduce descriptions by changing its elements in this order.

Let objects be described by n interval and k binary attributes. So, we can rewrite description as d = hv1; w1; :::; vn; wn; b1; :::; bki, where vi and wi are the left and right bounds of the i-th interval attribute for i = 1; :::; n, and bj indicates whether description d contains the j-th binary attribute for j = 1; :::; k. So, if bj is 0 (g) does not contain the j-th binary attribute for all g 2 d , and if bj is 1, then (g) may or may not contain the j-th binary attribute for all g 2 d . In other words, 1 u 0 = 0 or 0 v 1. The de nition of similarity operator remains the same: we take minimum of left bounds, maximum of right bounds, and set intersection, which in the given notation can be written as element-wise conjunction of indicator vectors: d1 u d2 = hhmin(v1;i; v2;i); max(w1;i; w2;i)ii=1;:::;n; hb1;j ^ b2;j ij=1;:::;ki : (5)

The introduced version of CbO starts from the most general description and speci es it by reducing intervals and adding binary features. For interval reduction it is necessary to choose some step value si for each interval attribute (i = 1; :::n;). If we aim at some sort of scaling we can set these values by ourselves, or if scaling is unwanted si is set to the smallest di erence between values of the initial feature corresponding to the i-th interval attribute.

Further we denote dall = gF2G (g), min(d1; d2) denotes the minimum position of unequal elements of tuples d1 and d2 in element-wise comparison. Let suc(d) denote the set of all children of the node corresponding to the description d. Let prev(d) return the parent of d, address(d) return the address of d, and nexti(d) store the position of the description tuple d which must be changed at the next algorithm returning to d. Let denote integer division, % denote residue of division, and [ ] be an operator of taking the element of the tuple. Function AddConcept set required links between tree nodes when a new node is added. Function OneIteration changes the description dcurr given as argument in position nexti(dcurr) and takes closure of the changed description by ( ) . If min of the closure and dcurr is not less than nexti(dcurr) then the function returns the closure, otherwise it returns dcurr. 1. 2. 3. 1. 2. 3. 4. 5. 6. 7. 8. 9. 10. 11. 12. 13. 14. 15. 16. 17. 18.

def AddConcept(parent; child) suc(parent) address(child) prev(child) := parent nexti(child) := nexti(parent) def OneIteration(dcurr) dnew := dcurr i := nexti(dcurr) if i 2n then q = i 2 r = i%2 dnew[i] := dnew[i] sq(2r 1) dadd := dnew if dnew[2q] dnew[2q + 1] and min(dcurr; dadd)

AddConcept(dcurr; dadd) return dadd else return dcurr else dnew[i] := d[i] ^ 1 dadd := dnew if dnew[i] 6= d[i] and min(d; dadd)

AddConcept(dcurr; dadd) return dadd else return dcurr i then 0. d := dall; nexti(d) := 1; prev(d) := ;; suc(d) := ; 1. until d = dall and nexti(d) > 2n + k do 2. until nexti(d) > 2n + k do 3. dadd:=OneIteration(d) 4. if d 6= dadd then i then 5. 6. 7. 8.

d := dadd output(d; d ) else nexti(d) := nexti(d) + 1 d := prev(d)

Lines 7 and 14 of function OneIteration has complexity O((2n + k)jGj), and all lines 3{8 of the main part of the algorithm are performed at most in this time. The loop starting at line 2 is repeated 2njGj + k times at worst (as in the worst case each boundary of all interval attributes can take jGj values), while the loop starting at line 1 is repeated jLj times exactly, where jLj is the number of pattern concepts. So, the algorithm has complexity O((2n+k)jGj(2njGj+k)jLj). The complexity is higher than that of CbO, O((2n + k)jGj2jLj), but in practice our algorithm may become faster when n and k are small, and the number of numerical values in data is less than jGj. 4

Stopping Criterion

As it is mentioned above it may not be required to generate all pattern concepts, and the version of CbO may stop when subgroups with di erence in treatment e ciency and maximal possible extent are generated. To estimate the di erence in e ciency for some description d we de ne a di erence measure which depends on the sets of outcomes of patients who match d and have received the same treatment, and if the value of this measure satis es some criterion of signi cance the proposed version of CbO stops to generate speci cation of the description which is currently in work.

Let p be the number of comparing treatment strategies, and d is the description currently processed by the algorithm. Assume Qi = foutcome(g) j g 2 d ; treatment(g) = ig for all i = 1; :::; p, where outcome(g) is the outcome of g, and treatment(g) is the treatment assigned to g. So, jQij denotes the number of patients received treatment i. We de ne di erence measure which takes the sets of outcomes corresponding to each treatment strategy and returns the estimation of di erence, and set threshold ".The criterion looks like if (Q1; :::; Qp) > " the algorithm does not generate the children of the currently processed node and returns to its parent node.

Except the stopping criterion itself several additional restrictions are necessary. So, if a subgroup does not contain patients per each of p treatment strategies we are not able to compute and need to return to the parent subgroup without generating children nodes, since the antimonotonicity of operator ( ) ensures this property for all descendants of the current node. Also, it may be important to result in descriptions with approximately equal number of patients per treatment strategy in corresponding subgroups. Therefore, additional parameter 2 [0; 1] is provided to control the ratio of each treatment strategy in a subgroup (see line 4 in function Restrictions). If is set to one this restriction is deactivated, if it is set to zero the numbers of patients per each treatment strategy must be equal. Let outc(d) be hQiii=1;:::;n, outc(d)[i] be Qi, and j j return the power of the set. Function Restrictions checks ful llment of these restrictions for a particular description. Let isempty(d) be F alse if the subgroups corresponding to description d do not contain patients from every treatment strategy, and T rue otherwise. Let also notBalanced(d) be F alse if the subgroup corresponding to d is not balanced (do not satisfy restriction on proportion of treatment strategies in the subgroup), and T rue otherwise.

def Restrictions(d) 0. isempty(d) := F alse; notBalanced(d) := F alse 1. for j from 1 to p do 2. isempty(d) := isempty(d) _ (joutc(d)[j]j = 0) 43.. snhoatBreal:=ancjoeudtj(cd(dd)j)[:j=]j notBalanced(d) _ ( 1 p

The algorithm outputs the set of maximal size subgroups (i.e. maximal extents) with signi cant di erence in treatment e ciency. Since we do not construct the whole pattern lattice the resulting set may contain subgroups which subsumed under the other subgroups from this set. Smaller subgroups should be excluded from the output by post-processing. 5 5.1

Dataset Application to ALL Dataset

The dataset consists of more than 2000 patients from 1 to 18 years old with newly diagnosed ALL. All of them were included into the standard risk group (SRG) or into the intermediate risk group (ImRG) of randomized clinical trial MB-ALL2008 [ 19 ]. The protocol of this trial contains three stages of treatment for SRG and ImRG: induction (36 days), consolidation (25 weeks), and maintenance (2-3 years). In this paper we only focus on the induction stage. Induction therapy aims at bringing a patient into remission and is very toxic. At this stage of treatment patients from SRG and ImRG are randomized into 3 and 2 treatment strategies correspondingly. Let us code them as T1, T2, and T3 in SRG and T4 and T5 in ImRG. SRG and ImRG parts of the dataset may be considered as two independent datasets because SRG and ImRG therapies di er considerably.

Each patient from the dataset is described by the set of initial features, treatment strategy which he or she was assigned at randomization, and outcome features. From all initial features 8 were chosen for the analysis: sex (male or female), age (in years from the birth date to the start of the therapy), initial white blood count (per nl) (WBC), immunophenotype (B- or T-ALL), central nervous system (CNS) status (normal, cytosis is less than 5 per mcl and blast cells, neuroleukemia), liver enlargement (in cm), spleen enlargement (in cm), mediastinum status (normal or pathological). As for outcome features of the dataset each outcome feature consists of two parts: the result of the therapy and time from the beginning of the therapy to the date when the result was xed (in years). In this dataset we have two such features. One of them is xing the time before patient's death.This feature has three possible states: alive, lost to follow-up or death. When a patient is alive or lost to follow-up we say that censoring happened because we cannot measure exactly the time before death. The other outcome feature represents the time before a negative event . This feature can possess the following states: alive in remission, lost to follow-up, death in remission, secondary tumor, relapse or metastases, nonresponse to the therapy or disease progression, or death in induction. Alive in remission and lost to follow-up events correspond to censoring. Two types of outcomes are used for di erent variants of treatment e ciency estimation which are presented below.

Finally, we exclude patients with missed values from the further analysis and result in 1221 SRG patients: 387, 366, and 368 patients received T1, T2, and T3 respectively. 929 patients in ImRG: 467 and 462 patients received T4 and T5 respectively. 5.2

Data Preprocessing

As it was shown above the initial features of the patients should be transformed into the tuple of interval and binary attributes. Age, WBC, liver enlargement, and spleen enlargement are numerical features, so for each of them an interval attribute is created. Moreover, we scale them a little to obtain subgroup descriptions which make sense for physicians. For example, it is not correct enough in medical terms that one treatment is more e ective for patients, let's say, up to 5.4 years old. Similar limitations should also be applied to other 3 numerical features. To answer these limitations we propose the following way of interval attributes construction: 1. Let one set the steps of interval reduction in CbO to sage = sliver = sspleen = 1 and sW BC = 10. 2. For each name 2 fage; liver; spleen; W BCg and for every g 2 D, where D is the set of all patients, if name(g) = x then the value of the corresponding interval attribute is set to [sname bx=snamec; sname dx=snamee].

All nominal features (sex, immunophenotype, CNS status, and mediastinum status) are converted to the set of binary attributes in the way presented in section 2.1. For instance, we construct two binary attributes corresponding to sex: one indicating males and another indicating females. 5.3

Methodology of Generating Hypotheses

The algorithm of subgroup descriptions generation is proposed in Section 4. It requires to set the di erence measure. For the data of the childhood ALL we choose log-rank statistics [ 20 ]. As it is a statistical method of detecting the di erence between two (or several) survival curves [20{22] the threshold is naturally chosen to satisfy 95% level of con dence of log-rank test. So, if p-value of twosided log-rank test is less than 5%, we output current description as a potential subgroup description and do not generate its children, otherwise we continue to generate children descriptions in accordance with the introduced algorithm. Finally, from the set of potential subgroup descriptions we delete descriptions subsumed under other potential subgroup descriptions.

As in SRG three treatment strategies (T1; T2; T3) are compared, the algorithm selected those subgroups where the di erence between any pair of the treatment strategies is signi cant. Assume one of descriptions we get is d. For the subgroup described by this description we should compare every pair of treatment strategies: T1 and T2, T1 and T3, and T2 and T3. For each pair where log-rank test detects the di erence with con dence 95% and power 80% we make a hypothesis. For instance, if pair Ti and Tj satis es these requirements then a hypothesis says that treatment strategies Ti and Tj a ect patients described by d di erently. At the same time if the survival curve for Ti, for instance, is located above the survival curve for Tj we can even say that Ti is better for patients described by d than Tj . For patients from ImRG only two treatment strategies are compared. Therefore it is enough to estimate power, and if it is more than 80% we make a hypothesis in the same way. 5.4

Summary of Results

The proposed algorithm was run to compare separately overall survival (OS)[ 23 ], event-free survival (EFS)[ 24 ], and relapse-free survival (RFS)[ 25 ] in each risk group with set to 13 and 15 for SRG and ImRG respectively. We also add restrictions on the size of the subgroups: not less than 20 and 200 patients per each treatment strategy for SRG and ImRG, respectively (the choice is explained by the greater number of patients per treatment and possible descriptions for ImRG). So, as a result three sets of subgroup descriptions were obtained for each risk group (SRG and ImRG), one per each type of survival.

The results of experiments for SRG are presented in Table 1. OS, EFS, and RFS stand for three types of survival described above, CbO stands for the proposed approach, and IT stands for Interaction Tree from [ 9 ]. To construct an interaction tree the same restriction on the size of subgroups (i.e. leaves) was set: not less than 20 patients per each treatment strategy. Performing IT with pruning results in no subgroups, therefore we compared to unpruned trees. To estimate subgroups in each of them paired logrank test p-values for every pair of compared treatment strategies are estimated. We have also performed bootstrap sampling on 1000 samples, and for each subgroup we estimate p-value median and 0.95 unpivotal con dence interval for the di erence between long-term survival estimations for every pair of compared treatment strategies. We count the number of subgroups where the result of comparison of even one pair of compared treatment strategies in this subgroups satis es restrictions at the heading. So, p corresponds to p-value of paired logrank test, pm is a median estimated by bootstrap, dl and dr are the left and the right boundaries of 0.95 bootstrap con dence interval for the di erence in long-term survival. For ImRG we obtained 2559 and 153 subgroups based on OS and EFS respectively and no subgroups based on RFS since the superiority of T5 over T4 in RFS holds for the whole set of ImRG patients. Moreover, all obtained subgroups based on OS and EFS con rm that T5 is better than T4. For this reason we did not carry out an experiments on ImRG by applying Interaction Trees. Given all hypotheses for ImRG propose the superiority of T5 over T4 it is more interesting to look at the hypotheses for SRG. Short-term estimation of the treatment strategy e ciencies carried out by physicians shows that T1 is signi cantly worse than T2 and T3 for all patients from SRG while long-term estimations show no signi cant di erence. However, by applying the proposed algorithm we found several subgroups where T1 is better than T2 and T3 in long-term. For instance, the description was obtained on the basis of di erence in OS: 4 age , 3 liver enlargement 7, and normal mediastinum status. Testing T1 vs T2 and T1 vs T3 we got p-values 1.4% and 1.9% and power estimations 86% and 94%. There are approximately 60 patients per treatment strategy in the subgroup. OS curves for the patients which can be described by even one of these descriptions is presented in Fig. 1. Con dence intervals of p-values obtained from 1000 sample bootstrap: [0.3%, 1.6%] and [0.7%, 1.7%]. So, the advantage of strategy T1 over T2 and T3 for patients matching the description seems con dent and independent from the certain data. In this paper we have introduced an approach to solving the problem of determining relevant subgroups of patients for therapy optimization. The approach is based on representing data by numerical pattern structures and applying the version of CbO algorithm. The algorithm computes the pattern lattice top-down (starting with the most general descriptions) and its stopping criterion allows one to generate subgroups with signi cant di erences in the e ciency of treatment strategies containing the maximal possible number of patients to satisfy statistical power restrictions. This approach allows one to avoid binarization or using similarity measures on patients, which can result in artifacts. The approach is also not biased by local optimization heuristics used for constructing decision trees and random forests. The situations when various subgroups are not disjoint will be the subject of further study.

Acknowledgments

We would like to thank Prof. Alexander I. Karachunskiy and his colleagues from the Federal Research Institute of Pediatric Hematology, Oncology and Immunology, Moscow, Russia for helpful discussions and for providing us with data on MB-ALL-2008 protocol.

1. Levin , K.A. : Study Design VII. Randomised Controlled Trials . Evident-Based Dentistry 8 , 22 - 23 , Nature Publishing Group ( 2007 )

2. Zeileis , A. , Hothorn , T. , Hornik , K. Model-Based Recursive Partitioning . J. Comput. Graph. Stat . 17 ( 2 ), 492 - 514 ( 2008 )

3. Foster , J. , Taylor , J., Ruberg , S. Subgroup Identi cation from Randomized Clinical Trial Data . Stat. Med . 30 ( 24 ), 2867 - 2880 ( 2011 )

4. Zhao , Y. , Zeng , D. , Rush

A.J.

, Kosorok , M.R.

Estimating

Individualized Treatment Rules Using Outcome Weighted Learning . J. Am. Stat. Assoc . 107 ( 449 ), 1106 - 1118 ( 2012 )

5. Korepanova , N. , Kuznetsov , S.O. , Karachunskiy , A.I. : Matchings and Decision Trees for Determining Optimal Therapy . In: Ignatov D.I . et al. Analysis of Images, Social Networks and Texts, Proc. 3rd International Conference (AIST2014) , Communications in Computer and Information Science , Vol. 436 , pp. 101 - 110 . Springer ( 2014 )

6. Dusseldorp , E. , Conversano , C. , Van Os , B.J. Combining an Additive and TreeBased Regression Model Simultaneously: STIMA . J. Comput. Graph. Stat . 19 ( 3 ), 514 - 530 ( 2010 )

7. Dusseldorp , E., Van Mechelen , I. Qualitative Interaction Trees: a Tool to Identify Qualitative Treatment-Subgroup Interactions . Stat. Med . 33 , 219 { 237 ( 2014 )

8. Lipkovich , I. , Dmitrienko , A. , Denne , J. , Enas , G. Subgroup Identi cation Based on Di erential E ect Search a Recursive Partitioning Method for Establishing Response to Treatment in Patient Subpopulations . Stat. Med . 30 ( 21 ), 2601 - 2621 ( 2011 )

9. Su , X. , Zhou , T. , Yan , X. , Fan , J. , Yang , S. Interaction Trees with Censored Survival Data . Int. J. Biostat . 4 ( 1 ), 2 ( 2008 )

10. Su , X. , Tsai , C.L. , Wang , H. , Nickerson , D.M. , Li , B. Subgroup Analysis via Recursive Partitioning . J. Mach. Learn. Res . 10 , 141 - 158 ( 2009 )

11. Loh , W.-Y., He , X. , Man , M. A Regression Tree Approach to Identifying Subgroups with Di erential Treatment E ects . Stst. Med . 34 , 1818 { 1833 ( 2015 )

12. Ching-Hon , P. , Robison , L.L. , Look , A.T. Acute Lymphoblastic Leukaemia. The Lancet . 9617 ( 371 ), 1030 { 1043 ,

The

Lancet ( 2008 )

13. Karachunskiy , A. , Roumiantseva , J. , Lagoiko , S. (eds.) E cacy and Toxicity of Dexamethasone vs Methylprednisolone Long-Term Results in More than 1000 Patients from the Russian Randomized Multicentric trial ALL-MB 2002 , Letter to the Editor . Leukemia. 29 , 1955 {1958, Nature Publishing Group ( 2015 )

14. Ganter , B. , Kuznetsov , S.O. Pattern

Structures and Their

Projections . In: Proc. Stumme , G. , Delugach , H. (eds.) 9th International Conference on Conceptual Structures (ICCS 2001). Lecture Notes in Arti cial Intelligence , vol. 2120 , pp. 129 { 142 , Springer ( 2001 )

15. Kuznetsov , S.O. Pattern Structures for Analyzing Complex Data . In: Sakai H. et al. (eds.) Proc. 12th International Conference on Rough Sets, Fuzzy Sets, Data Mining and Granular Computing (RSFDGrC 2009 ). Lecture Notes in Arti cial Intelligence , vol. 5908 , pp. 33 { 44 , Springer ( 2009 )

16. Kaytoue , M. , Kuznetsov , S.O. , Napoli , A. , Duplessis , S. Mining Gene Expression Data with Pattern Structures in Formal Concept Analysis . Information Sciences , 10 ( 181 ), 1989 {2001, Elsevier, New York ( 2011 )

17. Kuznetsov , S.O. Scalable Knowledge Discovery in Complex Data with Pattern Structures . In: Maji, P. , Ghosh , A. , et al. (eds.) Proc. 5th International Conference Pattern Recognition and Machine Intelligence (PReMI'2013). Lecture Notes in Computer Science , vol. 8251 , pp. 30 { 41 , Springer ( 2013 )

18. Kuznetsov , S.O. A Fast Algorithm for Computing All Intersections of Objects from an Arbitrary Semilattice . Nauchno-Tekhnicheskaya Informatsiya Seriya 2 - Informatsionnye Protsessy i Sistemy , 1 , 17{ 20 ( 1993 ).

19. A service of the U .S. National Institutes of Health, https://clinicaltrials.gov/ct2/show/NCT01953770

20. Kleinbaum , D.G. , Klein , M. : Kaplan-Meier Survival Curves and the Log-Rank Test . In: Survival Analysis , pp. 55 { 96 , Springer New York ( 2012 )

21. Kaplan , E.L. , Meier , P. Nonparametric Estimation from Incomplete Observations . Journal of the American Statistical Association , 282 ( 53 ), 457 { 481 ( 1958 )

22. May , W.L. Kaplan-Meier Survival Analysis . In: Encyclopedia of Cancer , pp. 1590 - 1593 , Springer Berlin Heidelberg ( 2009 )

23. National Cancer Institute, http://www.cancer.gov/publications/dictionaries/cancerterms?cdrid= 655245

24. National Cancer Institute, http://www.cancer.gov/publications/dictionaries/cancerterms?cdrid= 655269

25. National Cancer Institute, http://www.cancer.gov/publications/dictionaries/cancerterms?cdrid= 655254