    Towards Explainable Relational Boosting via
               Propositionalization

                 Blaž Škrlj1,2[0000−0002−9916−8756] and Nada Lavrač1,2
                     1 Jožef Stefan Institute, Jamova 39, Ljubljana, SI
         2 Jožef Stefan International Postgraduate School, Jamova 39, Ljubljana, SI
                              {blaz.skrlj,nada.lavrac}@ijs.si



          Abstract. Many problems can be modeled by using relational struc-
          tures, such as graphs or relational databases. Propositionalization refers
          to the process of transforming a relational database into a propositional
          (tabular) representation, suitable for a wide repertoire of machine learn-
          ing algorithms, including contemporary black-box classifiers such as deep
          neural networks or ensemble-based classifiers such as Gradient boosted
          tree ensembles. Even though such black-box classifiers often outperform
          symbolic learners, their results are hard to interpret by humans. This
          paper explores the means to improve black-box classifier interpretabil-
          ity in a relational setting. To this end, we conducted a series of experi-
          ments to evaluate how relational features, constructed using Aleph, RSD
          and Tertius propositionalization algorithms, impact the interpretability
          and performance of black-box classifiers such as Gradient boosted tree
          ensembles. We show how improved interpretability can be achieved by
          combining XGBoost with SHAP, an algorithm that leverages the ideas
          from coalitional game theory to assign importance scores to the obtained
           relational features, offering both state-of-the-art performance and
           insights into the features impacting individual predictions.

          Keywords: Relational learning, boosting, propositionalization


1     Introduction
Relational data mining is widely used in many areas of science, including biology
and sociology. The most common task solved by relational learners is entity
classification, which is the focus of this work. In relational data classification,
two conceptually different approaches are commonly used3 :
 – In the first approach, a learner can learn directly from the given data. Here, a
   relational learner, e.g., Aleph [20], is given a relational database, with class-
   labelled training instances as input to the learner. The learner leverages the
   relational structure of the database to construct a set of interpretable rules,
   which cover the majority of positive examples and can be used as a classifier
   (new instances can be classified using the learned rules).
3
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).

 – The second approach implements a two-step process named propositionaliza-
   tion. In the first step, the relational data is transformed into a propositional
   format, and in the second step, the classification of transformed tabular
   data is solved by effective propositional learning algorithms, including state-
   of-the-art deep neural networks [8] or Gradient boosting machines [3].

    In this work we focus exclusively on the second group of algorithms, i.e.
propositionalization-based learners. Propositionalization algorithms take a rela-
tional database as input and transform it into a propositional table, consisting
of labeled instances described by features that have been automatically derived
from the underlying relational structure. These features correspond to simple
conjuncts of the form fi := p1 ∧ p2 ∧ · · · ∧ pk , where fi denotes the i-th rela-
tional feature (a conjunct of elementary relational features) and pj represents
the j-th elementary relational feature. A valid feature could be, for example,
fi := atomType(a) ∧ charge(b). Each elementary relational feature pj describes
some relational property of the dataset. For example, if instances correspond
to researchers who published papers in conferences, pj will return value true or
false for a given author. Property pj can be a rather complex relational query
involving multiple relations (as long as that query returns either true or false), or
the result of some other aggregation function. For example, the property could
be “does author X have a paper published at the ECML/PKDD conference?” or
“how many papers has author X published at the ECML/PKDD conference?”.
While such feature construction could be done manually by a data analyst, we
are only interested in automated propositionalization methods.4
4
    Note that the simplest relational features may correspond even to single proposi-
    tional features of the form table-attribute-value, which are evaluated true if the
    given attribute has a particular value for a given training instance.
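    To make the notion of a relational feature concrete, the following toy sketch
(in Python; all relation, author, and predicate names are invented for illustration
and are not part of any system discussed below) implements two elementary
properties pj as boolean queries over a miniature database and combines them
into a conjunct fi :

```python
# Toy relations, each a set of tuples (a stand-in for database tables)
authored = {("ann", "paper1"), ("bob", "paper2")}            # author-paper relation
published_at = {("paper1", "ECML/PKDD"), ("paper2", "KDD")}  # paper-venue relation

def p_ecml(author):
    """Elementary property p_j: does the author have an ECML/PKDD paper?"""
    return any((author, paper) in authored
               for paper, venue in published_at if venue == "ECML/PKDD")

def p_published(author):
    """Another elementary property: has the author published at least one paper?"""
    return any(a == author for a, _ in authored)

def f_i(author):
    """Relational feature as a conjunct: f_i := p_ecml ∧ p_published."""
    return p_ecml(author) and p_published(author)

print(f_i("ann"), f_i("bob"))  # True False
```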
    The remainder of this work is structured as follows. First, we discuss some
of the related work focusing on the task of propositionalization. Next, we discuss
the notion of boosting and how it can be applied in combination with propo-
sitionalization. We next describe the experimental setting used to benchmark
a selection of existing propositionalization approaches alongside a number of
widely used propositional learners. Finally, we discuss the results and possible
implications of the proposed approach.


2     Background technologies
In the following section, we discuss the related work and techniques relevant to
this work.

2.1     Propositionalization
To be able to apply standard machine learning algorithms on multi-relational
data, the considered relational database should be transformed into a single
tabular data format, where each row represents a single data instance, and each
column represents a feature (a simple attribute-value, a relational query pj , or
a conjunction of such queries). This transformation into a symbolic vector space
(i.e. a symbolic data table format), referred to as propositionalization, is defined
below.

Definition 1 (Propositionalization). Consider the input of a given data type
and format and heterogeneous background knowledge of various data types and
formats. Propositionalization corresponds to the process of constructing a tabu-
lar (matrix) representation of the data enriched with the background knowledge,
where each row represents a single data instance and each column represents a
feature in a d-dimensional binary vector space Bd .

    Propositionalization thus transforms a complex data structure such as a re-
lational database to a simpler, binary vector space, where each feature reflects
the presence (or absence) of the relational property modelled by the constructed
relational feature for the given training instance. Selected propositionalization
algorithms used in this study are briefly described below.

RSD [23] is a relational subgroup discovery algorithm composed of two main
   steps: the propositionalization step and the (optional) subgroup discovery
   step. The output of the propositionalization step can also be used as input
   to other propositional learners. RSD effectively produces an exhaustive list
   of first-order features that comply with the user-defined mode constraints,
   similar to those of Progol [16] and Aleph [20]. Furthermore, RSD features
   satisfy the connectivity requirement, which imposes that no feature can be
   decomposed into a conjunction of two or more features. Mode declarations
   define the algorithm’s syntactic bias, i.e. the space of possible features.
Tertius (Treeliker) [7] is a top-down rule discovery system, incorporating first-
   order clausal logic. As no particular prediction target is specified beforehand,
   Tertius can be seen as an ILP system that learns rules in an unsupervised
   manner. Its relevance lies in the fact that Tertius encompasses 1BC, i.e.
   relational data is handled through 1BC transformation [6].
Aleph [20] is the most popular ILP algorithm, which is actually an ILP toolkit
   with many modes of functionality: learning of theories, feature construction,
   incremental learning, etc. It includes a feature construction functionality,
   which makes it also act as a propositionalization approach. Aleph uses mode
   declarations to define the syntactic bias. Input relations are Prolog clauses,
   defined either extensionally or intensionally.


2.2   Learning by Boosting

In this section, we discuss the notion of boosting in machine learning, focusing on
the aspects used in this work. We begin with an overview of this methodology.
    Boosting refers to a group of learning algorithms. The key idea of boosting
revolves around the fact that a series of weak learners can, when joined, form a
strong learner [9]. In general, boosting algorithms iteratively construct a strong
classifier, where weak learners’ predictions are used to weight the following mod-
els, emphasizing misclassifications: misclassified input data gain a higher weight
and examples that are classified correctly lose weight. Thus, future weak learn-
ers focus more on the examples that previous weak learners have misclassified.
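To make this reweighting scheme concrete, the following minimal sketch imple-
ments an AdaBoost-style weight update on top of scikit-learn decision stumps;
it is an illustration under these assumptions, not the implementation of any
specific framework discussed here:

```python
import numpy as np
from sklearn.tree import DecisionTreeClassifier

rng = np.random.RandomState(0)
X = rng.randn(100, 2)
y = (X[:, 0] + X[:, 1] > 0).astype(int)          # toy binary problem

weights = np.full(len(y), 1.0 / len(y))          # start with uniform example weights
for _ in range(5):
    stump = DecisionTreeClassifier(max_depth=1)  # a weak learner
    stump.fit(X, y, sample_weight=weights)
    miss = stump.predict(X) != y
    err = weights[miss].sum()
    alpha = 0.5 * np.log((1 - err) / max(err, 1e-12))  # learner's vote weight
    # Misclassified examples gain weight; correctly classified ones lose weight,
    # so the next weak learner focuses on the previous mistakes.
    weights *= np.exp(np.where(miss, alpha, -alpha))
    weights /= weights.sum()
```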
Boosting algorithms are described, for example, in the AnyBoost framework [12].
Furthermore, it was shown theoretically that boosting performs gradient descent
in a function space using a convex cost function. Commonly, boosting-based
ensembles are not directly explainable. We next discuss how SHAP [11], a game
theory-based approach, can be used to partially overcome this issue.

2.3   Explaining black-box models
Recent trends in machine learning attempt to solve hard problems using black-
box, non-interpretable models. Even though such models are not explainable per
se, individual predictions can be approximated using existing, symbolic learners
in order to obtain feature relevances for a given classifier. As the final part of
the learning workflow proposed in this work we exploit the recently introduced
SHAP tool [11] to identify which features, constructed using propositionalization
approaches, were the most relevant for a given classification problem.
     Shapley regression values can be interpreted as feature importances for linear
models in the presence of multicollinearity. The method begins by training the
model on all subsets S ⊆ F , where F is the set of all features. Each feature is
assigned a numeric score denoting the impact of this feature on a given model’s
performance. This impact is computed by training two models; one with a given
feature and one without. Predictions of the two models are compared on a given
input. As the effect of withholding a single feature from the model depends on all
other features, the performance differences are computed for all possible subsets
S ⊆ F \ {fi }, where fi denotes the i-th feature. The Shapley value ρi of the
fi -th feature can thus be defined as:
\[
  \rho_i = \sum_{S \subseteq F \setminus \{f_i\}} \frac{|S|!\,(|F| - |S| - 1)!}{|F|!}\,\bigl(m_{S \cup \{f_i\}}(x_{S \cup \{f_i\}}) - m_S(x_S)\bigr);
\]

where mS∪{fi } (xS∪{fi } ) corresponds to the model’s performance when the fi -th
feature is considered and mS (xS ) when it is not. Here, xS corresponds to the
feature values of the feature set S. SHAP employs efficient sampling schemes
for computing feature relevance; in this work, we use the “tree-explainer” module
of SHAP. We refer the interested reader to [11] for a detailed overview.
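As an illustration of this workflow, below is a minimal, hedged sketch assuming
the shap and xgboost packages; the binary feature matrix is a toy placeholder
standing in for a propositionalized data set, not one of the benchmarks used
later:

```python
import numpy as np
import shap
import xgboost as xgb

rng = np.random.RandomState(0)
X = rng.randint(0, 2, size=(200, 5)).astype(float)  # binary matrix, as produced
y = ((X[:, 0] == 1) & (X[:, 3] == 1)).astype(int)   # toy target: a conjunct

model = xgb.XGBClassifier(n_estimators=50).fit(X, y)

explainer = shap.TreeExplainer(model)               # the "tree-explainer" module
shap_values = explainer.shap_values(X)              # one Shapley value per feature
print(shap_values[0])                               # contributions for one instance
```

The empirical setup is discussed in the following sections.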


3     Propositionalization and semantic data mining
Recall the definition of propositionalization (Definition 1), which involves het-
erogeneous background knowledge of various data types and formats, with the
aim to
construct a tabular (matrix) representation of the data enriched with the back-
ground knowledge, where each column represents a feature in a d-dimensional
binary vector space Bd . Let τ represent a mapping from a relational database
(RDB) to an n × d-dimensional binary vector space, where n represents the
number of instances from the target table for which relational features are
constructed.
Each propositionalization algorithm thus attempts to identify relational features
representing the initial RDB, and can be described as the mapping:

                                     τ : RDB → B^{n×d} .                        (1)

    An example b ∈ B^{5×5}, with columns labelled by the relational features
(conjuncts) they correspond to, can be represented as

             p1 ∧ p2   p3 ∧ p2   p1 ∧ p3   p5 ∧ p2   p4 ∧ p1 ∧ p5
                1         1         1         1           0
                0         1         0         0           1
       b =      0         0         1         0           0
                0         1         0         0           1
                1         0         0         0           0


    Features pj describing the data can either be low-level features describing
the actual experimental data, or semantic features representing higher-level con-
cepts from ontologies or taxonomies encoded as the background knowledge.
edge. The latter refers to the semantic data mining setting. Higher-level features
have higher generalization potential, given that the use of these features ensures
higher coverage of induced rules.
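    To illustrate the mapping τ on toy data, the following sketch (with invented
property and feature names) evaluates a list of conjunct features over instances,
producing a binary matrix analogous to the example above:

```python
import numpy as np

# Conjunct features f_i, each a tuple of elementary properties p_j
features = [("p1", "p2"), ("p2", "p3"), ("p1", "p3")]
# Each instance is described by the set of elementary properties that hold for it
instances = [{"p1", "p2"}, {"p2", "p3"}, {"p1", "p2", "p3"}]

def tau(instances, features):
    """Evaluate every conjunct on every instance -> an n x d binary matrix."""
    return np.array([[int(all(p in inst for p in f)) for f in features]
                     for inst in instances])

print(tau(instances, features))
# [[1 0 0]
#  [0 1 0]
#  [1 1 1]]
```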


4      Proposed approach and experimental setting

In this section we describe the experimental setting used to evaluate the per-
formance of various classifiers trained using some of the mentioned proposition-
alization approaches. The two-step approach described next is summarized in
Figure 1. Here, a relational database is used as input, where only a single en-
tity (table — green circle) is considered for learning, and all others are used to
construct features related to this target table. The algorithm yields a proposi-
tionalized database of dimension n × d, where n corresponds to the number of
examples and d to the number of (selected) relational features. Note that values
of feature vectors are boolean {0, 1}.


4.1     Propositionalization algorithms implementation

We tested the following propositionalization methods, which represent a selection
of well-established approaches developed in the Inductive Logic Programming
(ILP) community: RSD [23], Tertius [7] and Aleph [20], presented in Section 2.1.

Fig. 1. Schematic representation of propositionalization combined with boosting. A
d-dimensional representation of the input relational structure is used as input to a
boosting-based learner.


   As individual propositionalization algorithms are implemented in different
programming languages using different frameworks, executing all of the ap-
proaches in the same learning setting can be very challenging. The RSD and
Aleph algorithms are written in Prolog, while Tertius is in Java. We built on
a recent effort to unify propositionalization approaches under a joint Python
framework called PyRDM [22]5 , which is freely accessible, and was updated as
part of this work.6 A contribution of this work is also a significant extension of
the PyRDM library to include the newest libraries for data pre-processing and
manipulation, including Orange 3 [5], Pandas [13], and Scikit-learn [17].

4.2    The classifiers used
In this subsection, we describe the classifiers that learned from the transformed,
propositionalized databases. The classifiers were tested on all the propositional-
ization methods described in the previous section.
Support vector machines. We use the implementation available as part of
   the libSVM library [2], with the regularization value set to C = 20.
Gradient boosting machines. The GBM implementation used is available in
   Scikit-learn [17]. We use the default settings for learning and varied the
   number of estimators over {10, 20, 50, 100}.
Logistic regression. This classifier is also implemented in Scikit-learn [17]. It
   uses a logistic function to model a binary (dependent) random variable. The
   regularization parameter was set to 1 (default setting).
Extra-randomized trees. Extremely randomized trees attempt to capture
   relevant patterns by constructing large, randomized trees. We use the imple-
   mentation available in Scikit-learn [17]. We set the default number of esti-
   mators to 40 and the number of samples needed for a split to 2, and addi-
   tionally varied the number of estimators over {10, 20, 50, 100}.
5
    https://github.com/xflows/rdm
6
    https://github.com/xflows/rdm




Fig. 2. Carcinogenesis database scheme. Propositionalization algorithms attempt to
traverse the database by foreign keys and capture the patterns of relevance to the task.


Extreme Gradient Boosting [3]. The well-known XGB classifier is a state-
   of-the-art learner commonly used in machine learning competitions. We var-
   ied the number of estimators over {10, 20, 50, 100}; a sketch of the resulting
   classifier grid is given below.
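The classifier grid can be assembled, for example, as in the following hedged
sketch (assuming scikit-learn and xgboost; only the parameters named above are
set explicitly, and scikit-learn's SVC wraps libSVM):

```python
from sklearn.svm import SVC
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import GradientBoostingClassifier, ExtraTreesClassifier
from xgboost import XGBClassifier

estimator_grid = [10, 20, 50, 100]
classifiers = {"SVM": SVC(C=20), "LR": LogisticRegression(C=1.0)}
for n in estimator_grid:
    classifiers[f"GBM-{n}"] = GradientBoostingClassifier(n_estimators=n)
    classifiers[f"XT-{n}"] = ExtraTreesClassifier(n_estimators=n, min_samples_split=2)
    classifiers[f"XGB-{n}"] = XGBClassifier(n_estimators=n)
```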

4.3   Data sets used
In this section, we describe the data sets considered for experimental evaluation.
The data sets were obtained from the CTU repository [15]. The data sets were
selected as they include diverse schemas and come from different domains. An
example scheme for the Carcinogenesis data is shown in Figure 2.

Carcinogenesis [21] task is to predict the carcinogenicity of a diverse set of
   chemical compounds. The data set was obtained by testing different chem-
   icals on rodents, where each trial would take several years and hundreds of
   animals. The data set consists of 329 compounds, of which 182 are carcinogens.
Mutagenesis [4] task addresses the problem of predicting mutagenicity of aro-
   matic and heteroaromatic nitro compounds. Predicting mutagenicity is an
   important task as it is very relevant to the prediction of carcinogenesis. The
   compounds from the data are known to be more structurally heterogeneous
   than in any other ILP data set of chemical structures. The database contains
   230 compounds, of which 138 have positive levels of mutagenicity and are
   labelled as ‘active’. Others have class value ‘inactive’ and are considered to
   be negative examples. We took the data sets from the original paper [4], where
   the data was split into two subsets: a 188-compound data set and a smaller
   data set with 42 compounds.
Trains [14] task is to predict a given train’s direction based on the cargo and
   other properties of the trains. This is one of the canonical data
   sets for relational learning.
Facebook [10] data set consists of 10,137 rows and 265 columns and repre-
   sents Facebook friendships; the task is to predict the gender of users.

4.4   The scoring scheme
We validated individual propositionalization-learner pairs using 10-fold stratified
cross-validation, as implemented in PyRDM. For each fold split, we run first a
given propositionalization algorithm, followed by a learner. This way, folds re-
main consistent; hence the algorithms’ results can be compared. We computed
accuracy and AUC for individual classifiers (all are binary classification prob-
lems). If the propositionalization-learner pair was not able to finish in 3 hours,
we marked such runs as unsuccessful. We present the results as average classifier
ranks, plotted in the same space as vertical lines along a horizontal line corre-
sponding to all possible algorithm combinations. Such representation is sufficient,
as we were mainly interested in whether Extreme Gradient Boosting (XGB) per-
forms well when combined with the considered transformation methods, as this
relationship was not explored before. Further, good performance of XGB would
indicate explanations are sensible, as if the model performed badly, explanations
can be less reliable. We use the SHAP algorithm as follows. For a given data
set, we train the XGB model on all instances. We compute the Shapley matrix,
which contains feature relevance information. Feature importances are finally
visualized for selected (correctly classified) examples.
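    A hedged sketch of this scoring scheme follows; propositionalize is a hypo-
thetical stand-in for the per-fold PyRDM propositionalization call, applied per
training/test split so that folds remain consistent:

```python
import numpy as np
from sklearn.model_selection import StratifiedKFold
from sklearn.metrics import accuracy_score, roc_auc_score

def evaluate(y, propositionalize, clf, n_splits=10):
    """10-fold stratified CV, returning mean accuracy and mean AUC."""
    indices = np.arange(len(y))
    accs, aucs = [], []
    skf = StratifiedKFold(n_splits=n_splits, shuffle=True, random_state=42)
    for train_idx, test_idx in skf.split(indices, y):
        # Relational features are constructed per fold (hypothetical helper)
        X_train, X_test = propositionalize(train_idx, test_idx)
        clf.fit(X_train, y[train_idx])
        accs.append(accuracy_score(y[test_idx], clf.predict(X_test)))
        aucs.append(roc_auc_score(y[test_idx], clf.predict_proba(X_test)[:, 1]))
    return float(np.mean(accs)), float(np.mean(aucs))
```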


5     Results
We present the results obtained using the proposed benchmark. We begin by
showing bar diagrams where differences between classifiers are shown with re-
spect to a given propositionalization algorithm. Next, we present the critical
distance diagrams showing classifier-propositionalization algorithm performance.
Finally, we show how XGB’s predictions can be interpreted using SHAP.

5.1   Results - classification
We observe the best overall classifier performances were obtained when RSD was
used as the propositionalization method. As can be observed in the following
results corresponding to individual classifier performances, the Aleph proposi-
tionalization performed the worst on average. The average rank diagram showing
the performance of individual classifier-propositionalization pairs (or just propo-
sitionalization) is shown in Figures 3, 4 and 5. Note that we omit computation
of critical distances due to reasons stated in [1].
    The classification results further indicate that propositionalization represents
the key aspect of the overall classification performance.
Fig. 3. Propositionalization + learner Accuracy comparison across data sets. Average
ranks are shown.


Fig. 4. Propositionalization + learner AUC comparison across data sets. Average ranks
are shown.

[Average-rank plot: RSD ranks best, followed by treeliker (Tertius), with aleph last.]

Fig. 5. Comparison of propositionalization approaches (classifier accuracy was aver-
aged).



This observation would imply that, no matter how capable the classifier, it will
not learn well if the input space is not representative of the modelled problem.
This is indeed a sensible assumption in the relational setting, as the considered
propositionalization algorithms consider different aspects of the relational
database and do not output equal representations.


5.2   Results - explanation example

In this section, we look at an example where we analysed feature importances
via SHAP. The computed importance-based diagram is shown in Figure 6.




Fig. 6. Feature importances using Shapley values (Mutagenesis). Here, an XGB clas-
sifier was trained on the feature space, constructed with RSD.


    Conjuncts of atom types and charges emerged as important for the classifi-
cation of the mutagenesis (42) data set (Figure 6), indicating that these two
attributes are highly relevant for correctly predicting mutagenicity. Note, how-
ever, that the propositionalization step is not able to perform
discretization, meaning that the explanations are potentially too specific for non-
categorical/ordinal variables. Further, it can be observed that there are at least
three relational features that impact the model’s prediction the most, indicating
that diverse relational features were used by the XGBoost learner.
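    For completeness, a Figure-6-style importance summary can be produced
as in the sketch below, reusing model and X from the sketch in Section 2.3;
feature_names is a hypothetical list of relational feature strings such as
"atomType(a) ∧ charge(b)":

```python
import shap

feature_names = [f"feature_{j}" for j in range(X.shape[1])]   # placeholder names
shap_values = shap.TreeExplainer(model).shap_values(X)
shap.summary_plot(shap_values, X, feature_names=feature_names)
```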


6     Discussion and conclusions

In this section, we discuss the obtained results and present some of the relevant
future prospects regarding boosting in a relational setting.

    Overall, we tested a set of different learners trained on feature matrices ob-
tained by using three different propositionalization algorithms. We show that
boosting-based learners outperform many other classifiers. Further, we believe
the conducted experiments point to a direction for future research where scala-
bility could become an issue (boosting itself is very scalable). One of the results
of this study also shows that the RSD algorithm is one of the top-performing
propositionalization algorithms. Although it performs well, we believe that RSD
and other propositionalization algorithms do not scale well; this issue could be
addressed in future work using the ideas proposed in [18, 19]. To the best of our
knowledge, this is one of the first works that utilize SHAP to ex-
plain relational features. The obtained results indicate that Shapley values are
one possible way of explaining how black-box learners behave in a relational
setting. We have yet to explore different discretization schemes, which could
yield more relevant explanations. We believe the presented results demonstrate
the relevance of such approaches and are, to our knowledge, the first of their
kind.


7     Availability

The code is freely available as part of the PyRDM library7 .
7
    https://github.com/xflows/rdm


References

 1. Benavoli, A., Corani, G., Demšar, J., Zaffalon, M.: Time for a change: A tutorial for
    comparing multiple classifiers through Bayesian analysis. The Journal of Machine
    Learning Research 18(1), 2653–2688 (2017)
 2. Chang, C.C., Lin, C.J.: LIBSVM: A library for support vector machines. ACM trans-
    actions on intelligent systems and technology (TIST) 2(3), 27 (2011)
 3. Chen, T., He, T., Benesty, M., Khotilovich, V., Tang, Y.: XGBoost: Extreme gradi-
    ent boosting. R package version 0.4-2 pp. 1–4 (2015)
 4. Debnath, A.K., Lopez de Compadre, R.L., Debnath, G., Shusterman, A.J., Han-
    sch, C.: Structure-activity relationship of mutagenic aromatic and heteroaromatic
    nitro compounds. correlation with molecular orbital energies and hydrophobicity.
    Journal of Medicinal Chemistry 34(2), 786–797 (1991)
 5. Demšar, J., Curk, T., Erjavec, A., Gorup, Č., Hočevar, T., Milutinovič, M., Možina,
    M., Polajnar, M., Toplak, M., Starič, A., et al.: Orange: data mining toolbox in
    python. The Journal of Machine Learning Research 14(1), 2349–2353 (2013)
 6. Flach, P., Lachiche, N.: 1BC: A first-order Bayesian classifier. In: Proceedings of
    the 9th International Workshop on Inductive Logic Programming (ILP-99). pp.
    92–103. Springer (1999)
 7. Flach, P.A., Lachiche, N.: Confirmation-guided discovery of first-order rules with
    Tertius. Machine Learning 42(1/2), 61–95 (2001)
 8. Goodfellow, I., Bengio, Y., Courville, A.: Deep Learning. MIT Press (2016), http:
    //www.deeplearningbook.org

 9. Kearns, M., Valiant, L.: Cryptographic limitations on learning boolean formulae
    and finite automata. Journal of the ACM (JACM) 41(1), 67–95 (1994)
10. Leskovec, J., Mcauley, J.J.: Learning to discover social circles in ego networks. In:
    Advances in neural information processing systems. pp. 539–547 (2012)
11. Lundberg, S.M., Lee, S.I.: A unified approach to interpreting model predictions.
    In: Guyon, I., Luxburg, U.V., Bengio, S., Wallach, H., Fergus, R., Vishwanathan,
    S., Garnett, R. (eds.) Advances in Neural Information Processing Systems 30,
    pp. 4765–4774. Curran Associates, Inc. (2017), http://papers.nips.cc/paper/
    7062-a-unified-approach-to-interpreting-model-predictions.pdf
12. Mason, L., Baxter, J., Bartlett, P.L., Frean, M.R.: Boosting algorithms as gradient
    descent. In: Advances in neural information processing systems. pp. 512–518 (2000)
13. McKinney, W.: pandas: a foundational python library for data analysis and statis-
    tics. Python for High Performance and Scientific Computing 14 (2011)
14. Michalski, R.S.: A Theory and Methodology of Inductive Learning. In: Michal-
    ski, R.S., Carbonell, J.G., Mitchell, T.M. (eds.) Machine Learning: An artificial
    intelligence approach, pp. 83–129. Palo Alto: Tioga Publishing Company (1983)
15. Motl, J., Schulte, O.: The CTU Prague relational learning repository (2015)
16. Muggleton, S.: Inverse entailment and Progol. New Generation Computing, Special
    issue on Inductive Logic Programming 13(3-4), 245–286 (1995)
17. Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O.,
    Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., et al.: Scikit-learn: Ma-
    chine learning in python. Journal of machine learning research 12(Oct), 2825–2830
    (2011)
18. Perovšek, M., Vavpetič, A., Cestnik, B., Lavrač, N.: A wordification approach
    to relational data mining. In: Fürnkranz, J., Hüllermeier, E., Higuchi, T. (eds.)
    Proceedings of the 16th International Conference on Discovery Science, Singapore,
    October 6–9, 2013. Lecture Notes in Computer Science, vol. 8140, pp. 141–154.
    Springer (2013)
19. Perovšek, M., Vavpetič, A., Kranjc, J., Cestnik, B., Lavrač, N.: Wordification:
    Propositionalization by unfolding relational data into bags of words. Expert Syst.
    Appl. 42(17-18), 6442–6456 (2015)
20. Srinivasan, A.: The Aleph manual. http://web.comlab.ox.ac.uk/oucl/
    research/areas/machlearn/Aleph/ (1999)
21. Srinivasan, A., King, R.D., Muggleton, S., Sternberg, M.J.: Carcinogenesis pre-
    dictions using ILP. In: Lavrač, N., Džeroski, S. (eds.) ILP-97. Proceedings of the
    7th International Workshop on Inductive Logic Programming, Prague, Czech Re-
    public, September 17-20, 1997, Lecture Notes in Computer Science, vol. 1297, pp.
    273–287. Springer (1997)
22. Vavpetič, A.: Semantic Subgroup Discovery: Doctoral Dissertation. Ph.D. thesis,
    A. Vavpetič (2016)
23. Železný, F., Lavrač, N.: Propositionalization-based relational subgroup discovery
    with RSD. Machine Learning 62(1-2), 33–63 (2006)