                    Lazy Classification with
                  Interval Pattern Structures:
                 Application to Credit Scoring

           Alexey Masyutin, Yury Kashnitsky, and Sergei Kuznetsov


             National Research University Higher School of Economics
  Scientific-Educational Laboratory for Intelligent Systems and Structural Analysis
                                  Moscow, Russia
      alexey.masyutin@gmail.com,ykashnitsky@hse.ru,skuznetsov@hse.ru



      Abstract. Pattern structures allow one to approach the knowledge extraction problem in the case of arbitrary object descriptions. They provide a way to apply Formal Concept Analysis (FCA) techniques to non-binary contexts. However, in order to produce classification rules, a concept lattice should be built, and for non-binary contexts this procedure may take much time and resources. In order to tackle this problem, we introduce a modification of the lazy associative classification algorithm and apply it to credit scoring. The resulting classification quality is compared to existing methods adopted in bank systems.




1 Introduction
   Banks and credit institutions face a classification problem each time they consider a loan application. In the most general case, a bank aims to have a tool to discriminate between solvent and potentially delinquent borrowers, i.e. a tool to predict whether the applicant is going to meet his or her obligations or not. Before the 1950s such decisions were expert-driven and involved no explicit statistical modeling: whether to grant a loan was decided upon an interview and after retrieving information about the spouse and close relatives [4]. From the 1960s, banks started to adopt statistical scoring systems trained on datasets of applicants, consisting of their socio-demographic factors and loan application features. As far as mathematical models are concerned, these were typically logistic regressions run on a selected set of attributes. A considerable amount of research has been done in the field of alternative machine learning techniques seeking to improve the results of the widespread scorecards [7,8,9,10,11].
   All the mentioned methods can be divided into two groups: the first one provides results that are difficult to interpret, the so-called black-box models; the second group provides interpretable results and a clear model structure. A key feature of risk management practice is that, regardless of its accuracy, the model must not be a black box. That is why methods such as neural networks and SVM classifiers did not earn much trust within the banking community [4]. The dividing hyperplane in an artificial high-dimensional space (dependent on the chosen kernel) cannot be easily interpreted in order to state the rejection reason for the client. As far as neural networks are concerned, they also do not provide the user with a set of reasons why a particular loan application has been approved or rejected. In other words, these algorithms do not provide the decision maker with knowledge. The predicted class is generated, but no knowledge is retrieved from the data.
    On the contrary, alternative methods such as association rules and decision trees provide the user with easily interpretable rules which can be applied to the loan application. FCA-based algorithms also belong to the second group since they use concepts in order to classify objects. The intent of a concept can be interpreted as a set of rules that is supported by the extent of the concept. However, for a non-binary context the computation of the concepts and their relations can be very time-consuming. In credit scoring we deal with a numerical context, since categorical variables can be transformed into sets of dummy variables. Lazy classification [16] seems appropriate in this case, since it provides the decision maker with a set of rules for the loan application and can be easily parallelized. In this paper, we modify the lazy classification framework and test it on credit scoring data of a top-10 Russian bank.
    The paper is structured as follows: Section 2 provides basic definitions. Section 3 argues why the original setting can be inconsistent in the case of a large numerical context and describes the proposed modification and its parameters. Section 4 describes voting schemes that can be used to classify test objects. Section 5 describes the data at hand and some experiments with the parameters of the algorithm. Finally, Section 6 concludes the paper.



2 Main Definitions
    First, we recall some standard definitions related to Formal Concept Analysis, see e.g. [1,2].
    Let G be a set (of objects), let (D, ⊓) be a meet-semilattice (of all possible object descriptions) and let δ : G → D be a mapping. Then (G, D, δ), where D = (D, ⊓), is called a pattern structure [1], provided that the set δ(G) := {δ(g) | g ∈ G} generates a complete subsemilattice (Dδ, ⊓) of (D, ⊓), i.e., every subset X of δ(G) has an infimum ⊓X in (D, ⊓). Elements of D are called patterns and are naturally ordered by the subsumption relation ⊑: given c, d ∈ D, one has c ⊑ d ⇔ c ⊓ d = c. Operation ⊓ is also called a similarity operation. A pattern structure (G, D, δ) gives rise to the following derivation operators (·)□:

        A□ = ⊓_{g ∈ A} δ(g)              for A ⊆ G,
        d□ = {g ∈ G | d ⊑ δ(g)}          for d ∈ D.

    These operators form a Galois connection between the powerset of G and (D, ⊓). The pairs (A, d) satisfying A ⊆ G, d ∈ D, A□ = d, and A = d□ are called pattern concepts of (G, D, δ), with pattern extent A and pattern intent d. The operator (·)□□ is an algebraic closure operator on patterns, since it is idempotent, extensive, and monotone [1].
    The concept-based learning model for the standard object-attribute representation (i.e., formal contexts) is naturally extended to pattern structures. Suppose we have a set of positive examples G+ and a set of negative examples G− w.r.t. a target attribute, with G+ ∩ G− = ∅; objects from Gτ = G \ (G+ ∪ G−) are called undetermined examples. A pattern c ∈ D is an α-weak positive premise (classifier) iff:

        ||c□ ∩ G−|| / ||G−|| ≤ α   and   ∃A ⊆ G+ : c ⊑ A□

A pattern h ∈ D is an α-weak positive hypothesis iff:

        ||h□ ∩ G−|| / ||G−|| ≤ α   and   ∃A ⊆ G+ : h = A□

   In credit scoring we work with pattern structures on intervals, since a typical object-attribute data table is not binary but has many-valued attributes. Instead of binarizing (scaling) the data, one can directly work with many-valued attributes by applying interval pattern structures. For two intervals [a1, b1] and [a2, b2], with a1, b1, a2, b2 ∈ R, the meet operation is defined as [15]:

        [a1, b1] ⊓ [a2, b2] = [min(a1, a2), max(b1, b2)].

   The original setting for lazy classification with pattern structures can be found in [3].
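
   To make these operations concrete, below is a minimal Python sketch of the interval meet, the subsumption test and the α-weak positive premise check for numerical data. The helper names (interval_meet, subsumes, to_pattern, is_alpha_weak_positive_premise) are ours and purely illustrative, not code from the paper; an object description δ(g) is modeled as a vector of degenerate intervals [x, x].

from typing import List, Tuple

Interval = Tuple[float, float]   # [a, b] with a <= b
Pattern = List[Interval]         # one interval per numerical attribute


def interval_meet(c: Pattern, d: Pattern) -> Pattern:
    """Meet (similarity) of two interval vectors: componentwise convex hull."""
    return [(min(a1, a2), max(b1, b2)) for (a1, b1), (a2, b2) in zip(c, d)]


def subsumes(c: Pattern, d: Pattern) -> bool:
    """c ⊑ d (i.e. c ⊓ d = c): every interval of c contains the corresponding interval of d."""
    return all(a1 <= a2 and b2 <= b1 for (a1, b1), (a2, b2) in zip(c, d))


def to_pattern(row: List[float]) -> Pattern:
    """δ(g) for a numerical object: a vector of degenerate intervals [x, x]."""
    return [(x, x) for x in row]


def is_alpha_weak_positive_premise(c: Pattern, g_pos, g_neg, alpha: float) -> bool:
    """c covers at least one positive object and its image contains
    at most an alpha share of the negative context."""
    covers_positive = any(subsumes(c, to_pattern(g)) for g in g_pos)
    neg_hits = sum(subsumes(c, to_pattern(g)) for g in g_neg)
    return covers_positive and neg_hits <= alpha * len(g_neg)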



3 Modification of the lazy classification algorithm
   In credit scoring the object-attribute context is typically numerical. Factors can have arbitrary distributions and take a wide range of values. At the same time, categorical variables and dummies can be present. With a relatively large number of attributes (over 30-40), this produces a high-dimensional space of continuous variables. In that case the result of the meet operator tends to be very specific, i.e. for almost every g ∈ G only g and gn have the description δ(gn) ⊓ δ(g). This happens because numerical variables, ratios especially, can have unique values for every object. As a result, for a test object gn the number of positive and negative premises is close to the number of observations in the corresponding contexts. In other words, overly specific descriptions are usually not falsified (i.e. there are no objects of the opposite class with such a description) and almost always form either positive or negative premises. Therefore, the idea of a voting scheme for lazy classification in the case of a high-dimensional numerical context may turn out to be obscure.
   Thus, it seems reasonable to seek concepts with a larger extent and a not too specific intent. At the same time, we would like to preserve the advantages of lazy classification, e.g. no need to compute a full concept lattice, easy parallelization, etc. The way to increase the extent of the generated concepts is to consider the intersection of the test object with more than one element from the positive (negative) context. What is a suitable number of objects to take for intersection? In our modification we treat this as a parameter, the subsample size, and perform a grid search. The parameter is expressed as a percentage of the observations in the context. As the subsample size grows, the resulting intersection δ(g1) ⊓ ... ⊓ δ(gk) ⊓ δ(g) becomes more generic and is more frequently falsified by objects from the opposite context.
   Strictly speaking, in order to replicate the lazy classification approach, one should consider all possible combinations of the chosen number of objects from the positive (negative) context. Apparently, this is not feasible for large datasets. For example, with 10 000 objects in the positive context, a subsample size of only two objects already produces almost 50 million combinations for intersection with the test object. Therefore, we randomly take the chosen number of objects from the positive (negative) context as candidates for intersection with the test object. The number of times (number of iterations) we randomly pick a subsample from the context is also tuned through grid search. Intuitively, the higher the value of this parameter, the more premises are mined from the data. However, the obvious penalty for increasing it is the time and resources required for computing intersections. As mentioned before, the greater the subsample size, the more likely it is that (δ(g1) ⊓ ... ⊓ δ(gk) ⊓ δ(g))□ contains an object of the opposite class. In order to control this issue, we add a third parameter, the alpha-threshold. If the share of objects from the negative (positive) context that falsify a candidate positive (negative) premise δ(g1) ⊓ ... ⊓ δ(gk) ⊓ δ(g) is greater than the alpha-threshold, the premise is considered falsified; otherwise it is supported and used in the classification of the test object.
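
   A minimal sketch of the resulting premise-mining loop is given below, reusing the hypothetical helpers from the sketch in Section 2 (interval_meet, subsumes, to_pattern); parameter names mirror Algorithm 1 at the end of the paper, while the exact sampling details are our own illustrative choices.

import random
from functools import reduce


def mine_premises(pos_data, neg_data, test_row,
                  sub_smpl=0.01, num_iter=1000, alpha=0.003, seed=0):
    """Intersect the test object with random subsamples of each context and keep
    the intersections falsified by at most an alpha share of the opposite context."""
    rng = random.Random(seed)
    test_pattern = to_pattern(test_row)
    pos_premises, neg_premises = [], []
    for _ in range(num_iter):
        # candidate positive premise: meet of a random positive subsample and the test object
        k = max(1, int(sub_smpl * len(pos_data)))
        descr = reduce(interval_meet, (to_pattern(g) for g in rng.sample(pos_data, k)), test_pattern)
        if sum(subsumes(descr, to_pattern(g)) for g in neg_data) <= alpha * len(neg_data):
            pos_premises.append(descr)
        # candidate negative premise: symmetric, falsified against the positive context
        k = max(1, int(sub_smpl * len(neg_data)))
        descr = reduce(interval_meet, (to_pattern(g) for g in rng.sample(neg_data, k)), test_pattern)
        if sum(subsumes(descr, to_pattern(g)) for g in pos_data) <= alpha * len(pos_data):
            neg_premises.append(descr)
    return pos_premises, neg_premises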



4 Voting schemes
   The final classification of a test object is based on a voting scheme among premises. In the most general case a voting scheme F is a mapping:

        F(g_test, h1+, ..., hp+, h1−, ..., hn−) → {−1, 1, ∅},

where g_test is the test object with unknown class, hi+ is a positive premise for i = 1, ..., p, hj− is a negative premise for j = 1, ..., n, −1 is the label of the negative class, and 1 is the label of the positive class (i.e. defaulters). In other words, F is an aggregating rule that takes premises as input and gives a classification label as output. Note that we allow for an empty label: if the label is empty, the voting rule is said to abstain from classification. There may be different approaches to building aggregating rules. A voting scheme is built upon a weighting function ω(·), an aggregation operator A(·) and a comparison operator ⊗:

        F(ω(·), A(·), ⊗) = (A_{i=1..p} [ω(hi+)]) ⊗ (A_{j=1..n} [ω(hj−)])
   In order to configure a new weighting scheme it is sufficient to define the operators and the weighting function. In this paper we compare the number of positive premises with the number of negative ones. In this case the rule allows the test object to satisfy both positive and negative premises, which decreases the rejection rate. The weighting function, aggregation operator and comparison operator are defined as follows:

        A(h) = Σ h,
        ω(h) = 1 if δ(g_test) ⊑ h, and 0 otherwise,
        a ⊗ b = sign(b − a) if a ≠ b, and ∅ if a = b.

So the label for a test object g_test is defined by the following mapping:

        F(g_test, h1+, ..., hp+, h1−, ..., hn−) =
            (Σ_{i=1..p} [δ(g_test) ⊑ hi+]) ⊗ (Σ_{j=1..n} [δ(g_test) ⊑ hj−])

   However, one can think of the margin b − a as a measure of discrimination between the two classes and consider a decision boundary based on receiver operating characteristic analysis, for instance. This counting approach is good for decreasing the number of rejections from classification, but it does not account for the support of the premises; naturally, one would give more weight to a premise with a larger image (i.e. higher support). Also, if the numbers of positive and negative premises are equal, the rule abstains from classification.
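
   A hedged sketch of this counting scheme is given below, again using the hypothetical subsumes and to_pattern helpers. A premise is counted when the test object lies in its image (which holds by construction for the premises mined in Section 3), and the margin is oriented as in Algorithm 1, i.e. positive votes minus negative votes.

def classify_by_votes(test_row, pos_premises, neg_premises):
    """Count the positive and negative premises whose image contains the test object.
    Returns (label, margin): 1 = positive class (defaulters), -1 = negative class,
    None = the rule abstains when the counts are equal."""
    test_pattern = to_pattern(test_row)
    pos_votes = sum(subsumes(h, test_pattern) for h in pos_premises)
    neg_votes = sum(subsumes(h, test_pattern) for h in neg_premises)
    margin = pos_votes - neg_votes      # oriented as in Algorithm 1
    if margin == 0:
        return None, margin
    return (1 if margin > 0 else -1), margin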



5 Experiments
   The data we used for the computation represent customers and their metrics assessed at the date of the loan application. The applications had been approved under the bank's credit policy and the clients were granted loans; after that the loans were observed for the fact of delinquency. The dataset is divided into two contexts, positive and negative. The positive context is the set of loans where the target attribute is present. The target attribute in credit scoring is typically defined as more than 90 days of delinquency within the first 12 months after loan origination. So, the positive context is the set of bad borrowers, and the negative context consists of good ones. Each context consists of 1000 objects, so that the voting scheme described in the previous section is applicable. The test dataset consists of 300 objects and is extracted from the same population as the positive and negative contexts. Attributes represent various metrics such as loan amount, term, rate, payment-to-income ratio, age of the borrower, undocumented-to-documented income, credit history metrics, etc. The set of attributes used for the lazy classification trials contained 28 numerical attributes.
   In order to evaluate classification accuracy we calculate the Gini coefficient for every combination of parameters based on 300 predictions on the test set. The Gini coefficient is calculated from the margin between the number of objects within positive premises and negative ones; in fact, the margin is an analog of the score value in credit scorecards. The Gini coefficient was chosen as the performance metric because it is conventionally used to evaluate the quality of classification models in credit scoring [4].
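
   For reference, the Gini coefficient of such a scoring rule equals 2·AUC − 1 when the margin is used as the score. The sketch below uses scikit-learn's roc_auc_score and assumes margins oriented so that larger values indicate defaulters; it is an illustration, not the authors' evaluation code.

from sklearn.metrics import roc_auc_score


def gini_from_margins(y_true, margins):
    """Gini = 2 * AUC - 1, with the voting margin as the score
    (y_true: 1 = defaulted, 0 = good; larger margin = riskier)."""
    return 2 * roc_auc_score(y_true, margins) - 1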
   When the subsample size is low, the intersections of the test object description with members of the positive (negative) context tend to be more specific. That is why a relatively high number of premises is mined and used for the classification. As the subsample size increases, the candidate premises become more generic, and it is likely that a certain number of objects from the opposite context also satisfy the description. If the alpha-threshold is low, the frequency of rejections from classification is therefore high. The dynamics of premise mining is demonstrated in the following graphs:




             Fig. 1. The dynamics of negative α-weak premise mining




   The average number of premises mined for a test object drops, as expected, with the increase in the subsample size, and the drop is quicker for higher alpha-thresholds. This supports the idea that if lazy classification is run in its original setting on a numerical context (i.e. when the subsample consists of only one object), the number of premises generated is close to the number of objects in the context, so the premises can be considered too specific. The descriptive graph above allows one to expect that the proposed parameters of the algorithm can be tuned (grid-searched) so as to tackle the trade-off between a high number of premises used for classification and the size of their support. The average number of positive premises tends to fall slightly faster than the number of negative premises. Below we present the classification accuracy obtained for different combinations of parameters (grid search).
            Fig. 2. The dynamics of positive α-weak premise mining



             Table 1. Gini coefficients for the parameter grid search


                                                     Subsample size
Alpha-threshold Number of iterations 0.1% 0.2% 0.3% 0.4% 0.5% 0.6% 0.7% 0.8% 0.9%
0.0%           100                  40% 44% 39% 18% 1%         0%       0%   0%   0%
               150                  35% 46% 35% 5%        0%   0%       0%   0%   0%
               200                  42% 37% 36% 12% 5%         1%       0%   0%   0%
               500                  39% 44% 44% 25% 6%         1%       0%   0%   0%
               1000                 44% 47% 44% 41% 11% 3%              0%   0%   0%
               2000                 44% 48% 46% 36% 17% 4%              0%   0%   0%
0.1%           100                  33% 37% 40% 40% 44% 43% 34% 32% 34%
               150                  41% 34% 33% 43% 41% 47% 41% 37% 37%
               200                  40% 40% 34% 42% 51% 43% 44% 41% 36%
               500                  37% 42% 47% 49% 51% 49% 43% 41% 34%
               1000                 37% 42% 46% 48% 49% 48% 43% 43% 37%
               2000                 39% 43% 45% 49% 51% 49% 46% 41% 38%
               5000                 43% 40% 44% 49% 46% 50% 48% 38% 36%
0.2%           100                  29% 38% 42% 32% 43% 37% 46% 43% 37%
               150                  27% 42% 41% 41% 36% 47% 48% 45% 41%
               200                  32% 40% 43% 42% 42% 49% 46% 47% 48%
               500                  39% 46% 46% 48% 47% 48% 51% 48% 51%
               1000                 41% 50% 48% 47% 49% 53% 52% 52% 47%
               2000                 38% 48% 50% 48% 47% 53% 52% 53% 50%
0.3%           100                  35% 38% 39% 42% 39% 45% 34% 45% 39%
               150                  27% 43% 44% 42% 42% 39% 37% 40% 46%
               200                  34% 46% 47% 45% 49% 47% 45% 45% 52%
               500                  31% 45% 49% 50% 49% 46% 50% 51% 47%
               1000                 37% 48% 49% 49% 49% 47% 52% 51% 51%
               2000                 38% 46% 48% 51% 51% 50% 50% 52% 52%
               5000                 40% 47% 46% 51% 52% 51% 49% 51% 53%
               10000                40% 44% 43% 46% 46% 48% 50% 52% 54%
               20000                40% 43% 42% 46% 47% 49% 50% 52% 53%
0.4%           100                  28% 39% 44% 48% 43% 50% 53% 42% 49%
               150                  34% 42% 43% 42% 43% 52% 50% 45% 47%
               200                  33% 46% 43% 47% 51% 49% 49% 42% 45%
               500                  37% 50% 50% 49% 49% 49% 51% 47% 48%
               1000                 40% 48% 50% 50% 51% 52% 50% 48% 50%
               2000                 37% 48% 49% 49% 49% 47% 52% 49% 51%
               5000                 39% 42% 42% 43% 45% 47% 49% 52% 49%
   We observe an area with zero Gini coefficients where the alpha-threshold is zero and the subsample size is relatively high. That is due to the fact that almost no premises were mined during the lazy classification run. This is quite intuitive: as the subsample size grows, the intersection of the subsample with a test object results in a generic description, which is very likely to be falsified by at least one object from the opposite context. In this case the rejection from classification takes place for almost all test objects. The first observation is that the more iterations are performed, the higher the Gini is on average:




Fig. 3. Average Gini grouped by the different numbers of iterations (over all other parameter values)




   The more times the subsamples are randomly extracted, the more knowledge (in terms of premises) is generated. By increasing the number of premises used for classification according to the voting scheme, we are likely to capture the structure of the data in more detail. However, the number of iterations is not the only driver of classification accuracy in our case. We find a range with relatively high Gini in the area of a mild alpha-threshold and a relatively high subsample size. This also seems natural, since the support of a good predictive rule (i.e. premise) is expected to be higher than its support in the opposite context. We elaborate further and run an additional grid search in the range of parameters providing high Gini coefficients:
       Table 2. Gini coefficients for the parameter grid search on the specified area


                                                Subsample size
Alpha-threshold Number of iterations 1.0% 1.1% 1.2% 1.3% 1.4% 1.5%
0.3%              500                  51% 49% 48% 43% 41% 38%
                  1000                 52% 51% 48% 45% 43% 39%
                  2000                 54% 53% 49% 47% 46% 38%
                  5000                 55% 52% 50% 47% 46% 40%
                  10000                56% 53% 50% 47% 47% 40%
                  20000                55% 53% 51% 46% 48% 41%




   According to the performed grid search, the highest Gini on the test sample (55%-56%) is reached with the following parameter values: alpha-threshold = 0.3%, number of iterations = 10000, subsample size = 1.0%. The result was compared to three benchmarks that are traditionally used in credit scoring within the banking system: logistic regression, a scorecard and a decision tree.
   It should be clarified what is meant by the scorecard classifier. Mathematically, the scorecard is based on a logistic regression which takes transformed variables as input. The transformation typically used is the weight-of-evidence transformation (WOE-transformation [13]). It is widespread in credit scoring to apply such a transformation to the input variables, since it accounts for non-linear dependencies and provides a certain robustness against potential outliers. The aim of the transformation is to divide each variable into no more than k categories; the thresholds are derived so as to maximize the information value of the variable [13]. Having each variable binned into categories, the log-odds ratio is calculated for each category. Finally, instead of the initial variables, these discrete-valued variables are used as input to the logistic regression.
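
   A minimal sketch of such a WOE-transformation for one numerical variable is given below. Quantile binning is used purely for illustration (the paper derives the thresholds by maximizing information value), and the helper name woe_transform is ours.

import numpy as np
import pandas as pd


def woe_transform(x: pd.Series, y: pd.Series, k: int = 5) -> pd.Series:
    """Bin x into at most k quantile bins and replace every observation by
    ln(share of goods in its bin / share of bads in its bin).
    Assumes y is binary (1 = bad / defaulted) and every bin contains both classes."""
    bins = pd.qcut(x, q=k, duplicates="drop")
    bad = y.groupby(bins, observed=True).sum()
    total = y.groupby(bins, observed=True).count()
    good = total - bad
    woe = np.log((good / good.sum()) / (bad / bad.sum()))
    # map each observation to the WOE of its bin
    return bins.astype(object).map(woe.to_dict()).astype(float)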
   The properties of the decision tree benchmark were as follows: we ran CART with two possible child nodes from each parent node; the criterion for the optimal threshold was the greatest entropy reduction; the number of terminal nodes was not explicitly restricted, however, the minimum size of a terminal node was set to 50. As far as the logistic regression is concerned, variable selection was performed with a stepwise approach [14]. For the scorecard, the variables were initially selected based on their information value after the WOE-transformation. The comparison of the classifiers' performance on the test sample of 300 objects is given in Table 3.
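
   The paper does not state which implementations were used for the benchmarks; purely as an illustration, a comparable CART benchmark could be configured with scikit-learn as sketched below (entropy-based splits, a minimum terminal-node size of 50, and the Gini coefficient computed from the predicted default probability).

from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import roc_auc_score


def cart_benchmark_gini(X_train, y_train, X_test, y_test):
    """Fit a CART benchmark roughly as described above and return its test Gini."""
    tree = DecisionTreeClassifier(criterion="entropy",   # greatest entropy reduction
                                  min_samples_leaf=50)   # minimum terminal node size
    tree.fit(X_train, y_train)
    scores = tree.predict_proba(X_test)[:, 1]            # default probability as score
    return 2 * roc_auc_score(y_test, scores) - 1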
  Table 3. Modified lazy classification algorithm versus models adopted in the bank


                                                              Gini on test sample
Logistic regression                                           47.38%
Scorecard (logistic regression on WOE-transformed variables)  51.89%
CART (minimum terminal node size = 50)                        54.75%
MLCA (subsample size = 1%, alpha-threshold = 0.3%,
      number of iterations = 10000)                           56.30%




6 Conclusion
   When dealing with large numerical datasets, lazy classification may be preferable to classification based on explicitly generated classifiers, since it requires less time and memory [3]. However, the original lazy classification setting meets a certain limitation in a high-dimensional numerical feature space: when intersecting the descriptions of a test object and every single object from the context, one is likely to acquire premises whose image consists only of those two objects. In other words, the premises tend to be very specific for the context and, therefore, the number of positive and negative premises is likely to be close to the number of objects in the contexts. Weighting cannot be considered helpful in this case, since the premises will have very similar, low support.
   In this paper, we modified the original lazy classification setting by making it, in fact, a stochastic procedure with three parameters: subsample size, number of iterations and alpha-threshold. In effect, the modified algorithm mines premises with relatively high support, which are then used for the classification of the test object according to a predefined voting scheme. We applied the introduced procedure to a retail loan classification problem. The data we used was provided during a pilot project with one of the top-10 banks in Russia; the details are not disclosed due to a non-disclosure agreement. The positive and negative contexts both had 1000 objects with 28 numerical attributes. The accuracy of the algorithm was evaluated on a test dataset consisting of 300 objects, with the Gini coefficient as the accuracy metric. We performed a basic grid search by running the modified lazy classification algorithm with different parameter values, and compared its accuracy to the models conventionally adopted in the bank: logistic regression, a scorecard and a decision tree. The proposed algorithm outperforms the logistic regression and the scorecard with a subsample size of around 1%, an alpha-threshold equal to 0.3% and a number of iterations over 5000. The performance of the decision tree is comparable to that of the proposed algorithm; however, the modified lazy classification is slightly better in terms of the Gini coefficient.
   As an area for further research, one can consider and compare the accuracy obtained with other voting schemes. It is expected that by taking premise specificity into account one can improve the overall accuracy of the classification algorithm or, alternatively, reach the same accuracy with fewer iterations, which can save the time required for the calculations.



References
1. Ganter, B., Kuznetsov, S.O.: Pattern structures and their projections. In: Delugach, H., Stumme, G. (eds.) Conceptual Structures: Broadening the Base. Lecture Notes in Computer Science, vol. 2120, pp. 129-142. Springer, Berlin/Heidelberg (2001)
2. Ganter, B., Wille, R.: Formal Concept Analysis: Mathematical Foundations. Springer, Berlin (1999)
3. Kuznetsov, S.O.: Scalable knowledge discovery in complex data with pattern structures. In: Maji, P., Ghosh, A., Murty, M.N., Ghosh, K., Pal, S.K. (eds.) PReMI 2013. Lecture Notes in Computer Science, vol. 8251, pp. 30-39. Springer (2013)
4. Thomas, L., Edelman, D., Crook, J.: Credit Scoring and Its Applications. Monographs on Mathematical Modeling and Computation, pp. 107-117. SIAM, Philadelphia (2002)
5. Biggs, D., Ville, B., Suen, E.: A method of choosing multiway partitions for classification and decision trees. Journal of Applied Statistics 18(1), 49-62 (1991)
6. Siddiqi, N.: Credit Risk Scorecards: Developing and Implementing Intelligent Credit Scoring. Wiley, ISBN 978-0-471-75451-0 (2005)
7. Baesens, B., Van Gestel, T., Viaene, S., Stepanova, M., Suykens, J.: Benchmarking state-of-the-art classification algorithms for credit scoring. Journal of the Operational Research Society 54(6), 627-635 (2003)
8. Ghodselahi, A.: A hybrid support vector machine ensemble model for credit scoring. International Journal of Computer Applications (0975-8887) 17(5) (March 2011)
9. Yu, L., Wang, S., Lai, K.K.: An intelligent agent-based fuzzy group decision making model for financial multicriteria decision support: the case of credit scoring. European Journal of Operational Research 195, 942-959 (2009)
10. Van Gestel, T., Baesens, B., Suykens, J.A., Van den Poel, D., Baestaens, D.-E., Willekens, B.: Bayesian kernel based classification for financial distress detection. European Journal of Operational Research 172, 979-1003 (2006)
11. Ravi Kumar, P., Ravi, V.: Bankruptcy prediction in banks and firms via statistical and intelligent techniques: a review. European Journal of Operational Research 180(1), 1-28 (2007)
12. Kuznetsov, S.O., Samokhin, M.V.: Learning closed sets of labeled graphs for chemical applications. In: Kramer, S., Pfahringer, B. (eds.) ILP 2005. Lecture Notes in Computer Science, vol. 3625, pp. 190-208. Springer (2005)
13. SAS Institute Inc.: Developing Credit Scorecards Using Credit Scoring for SAS Enterprise Miner 12.1. SAS Institute Inc., Cary, NC (2012)
14. Hocking, R.R.: The analysis and selection of variables in linear regression. Biometrics 32 (1976)
15. Kaytoue, M., Kuznetsov, S.O., Napoli, A., Duplessis, S.: Mining gene expression data with pattern structures in formal concept analysis. Information Sciences 181(10), 1989-2001 (2011)
16. Veloso, A., Meira Jr., W.: Demand-Driven Associative Classification. Springer (2011)
Algorithm 1 Lazy Classification by Subsamples in a Numerical Context
Input: {Pos_data, Neg_data} - positive and negative numerical contexts.
N+, N- - numbers of objects in the contexts (it is preferable that the positive and negative contexts are of the same size).
M - number of attributes.
sub.smpl - percentage of a context randomly used for intersection with the test object (parameter).
num.iter - number of iterations (resamplings) during premise mining (parameter).
alpha.threshold - maximum allowable share of the opposite context for which a premise is not falsified (parameter).
t - test object.

Output: margin_t - measure produced by the voting rule.
y_t - class label predicted for the test object.

  for iter from 1 to num.iter do
    S = random.sample(Pos_data, size = sub.smpl * N+)        // mine positive α-weak premises
    descr = δ(g1) ⊓ ... ⊓ δ(gs) ⊓ δ(t),  g1, ..., gs ∈ S
    Neg_image = {x ∈ Neg_data | x ∈ descr□}
    if ||Neg_image|| < alpha.threshold * N- then
      add descr to the set of positive α-weak premises
    end if
    S = random.sample(Neg_data, size = sub.smpl * N-)        // mine negative α-weak premises
    descr = δ(g1) ⊓ ... ⊓ δ(gs) ⊓ δ(t),  g1, ..., gs ∈ S
    Pos_image = {x ∈ Pos_data | x ∈ descr□}
    if ||Pos_image|| < alpha.threshold * N+ then
      add descr to the set of negative α-weak premises
    end if
  end for
  p = |set of positive α-weak premises|
  n = |set of negative α-weak premises|
  choose a voting scheme: A(·), ω(·), ⊗
  pos.power = A_{i=1..p}(ω(hi+))
  neg.power = A_{j=1..n}(ω(hj−))
  margin_t = pos.power − neg.power
  y_t = pos.power ⊗ neg.power