Applying Machine Learning to Enhance
    Optimization Techniques for OWL Reasoning

                         Razieh Mehri? and Volker Haarslev

                   Concordia University, Montréal, Québec, Canada
                            r mehrid@encs.concordia.ca
                            haarslev@cse.concordia.ca


        Abstract. Various (tableau) optimization techniques have been inte-
        grated into OWL reasoners to speed up reasoning. Many of the tech-
        niques rely on heuristics that have been manually fine tuned for achiev-
        ing a good performance but might fail dramatically when encountering
        ontologies exhibiting unexpected design patterns. A typical example are
        heuristics applied to disjunctions in order to select a disjunct to be added
        to the tableau. Evidences indicate that the order of selecting disjuncts
        can have a significant impact on reasoning speed. Our approach presented
        in this paper applies machine learning to make the selection process more
        effective and removes the need for manual fine tuning. We extended the
        OWL reasoner JFact accordingly to control the disjunct selection pro-
        cess. We demonstrate that one can successfully learn to choose a disjunct
        based on the most effective heuristic method. As a first step we focused
        on propositional SAT testing. Our results show that machine learning
        can speed up JFact by one to two orders of magnitude.

        Keywords: OWL reasoning · Description logic reasoning · Machine
        learning · Tableau optimization techniques


1     Introduction and Related Work
A large amount of research has been devoted to the design of tableau optimiza-
tion techniques for Web Ontology Language (OWL) reasoning. Heuristic guided
optimization techniques try to decrease the size of the search space by making
better decisions while dealing with non-deterministic expansions (disjunctions).
While these methods try to improve the optimization techniques, learning from
these techniques has not yet been considered for helping OWL reasoners to make
the best decision when applying a non-deterministic rule.
    Among the many optimization techniques, semantic branching is directly
affected by the non-determinism caused by disjunctions; nevertheless, seman-
tic branching was developed to reduce the non-deterministic cost of syntactic
branching as a part of traditional tableau algorithms.
    In our enhanced version of semantic branching we apply machine learning
to choose a disjunct that could have a drastic impact on reasoning speed. One
?
    Corresponding author
heuristic technique suggested for this purpose is to use the MOMS heuristic that
looks for a disjunct with the maximum number of occurrences in disjunctions
of minimum size [4]. This technique could be more effective with backjump-
ing which is an optimized backtracking variant [2]. Moreover, a combination of
characteristics of axioms such as size, maximum quantifier depth, frequency of
concepts occurring in axioms can help to make better decisions [16].
    Semantic branching uses the Davis-Putnam-Logemann Loveland (DPLL) al-
gorithm that splits branches by making them disjoint from each other. DPLL
is used to solve the satisfiability problem (SAT) in propositional logic. There-
fore, learning to apply the most proper order for variables while solving a SAT
problem helps with the order of applying variables in semantic branching.
    A SAT solver’s runtime varies based on the given instance and the heuristic
algorithm used for choosing variables in each decision level. In order to solve a
SAT problem efficiently, the approach in [7] applies reinforcement learning to
select the branching heuristic with a maximum long-term reward to give the
most proper order for variables with the least overall cost. In [12], authors use
a different learning technique called multinomial logistic regression to solve the
Quantified Boolean Formulas (QBF) that is a more expressive generalization of
SAT. The solution was only suggested for QBF expressions with binary clauses.
    In our work, we employ multinomial logistic regression too. Our approach,
however, applies machine learning to semantic branching in OWL reasoners and
not to SAT solvers. Although, propositional logic is less expressive than De-
scription Logic (DL) [1], yet we believe that a significant percentage of OWL
ontologies contain concept descriptions resembling propositional logic. Moreover,
we use a variety of available branching heuristics, from traditional to new ones
such as [12], which uses VSIDS along with other heuristics derived from VSIDS .
Also, our method does not restrict the number of clauses or variables in logic
expressions.
    There also exist methods that do not consider heuristics while detecting
the best possible expansion. An optimization technique called learning-based
disjunct was introduced in [14]. This method reduces the expansions of inherently
clash generating disjuncts. It orders the disjuncts based on the characteristics
they share with already expanded clash free disjuncts. Further, some reasoners
dramatically benefit from caching [11]. Caching can be used indirectly for the
(un)satisfiability status of a (sub/super) concepts in disjunctions [3, 6]. Caching
is used with quantifiers and in this case, the status of generated successors can
be cached for look-alike concepts.
    In order to integrate our approach into JFact, for each clause in the reasoner’s
input, the order of variables is changed. The new order is specified based on the
order given by solving an input as a SAT problem using machine learning; in
other words, JFact uses our fully trained model to find the new order. Thus, the
clauses in each branching level of reasoning have a new order that could lead to
an earlier solution. The speed improvement is caused by the reduced occurrence
of backjumping due to the changes in (semantic) branching.
   The remainder of the paper is as follows: Section 2 provides a brief back-
ground on semantic branching and its relation to DPLL as well as backjumping,
whereas Section 3 describes how machine learning is used in our algorithm to-
gether with branching heuristics. Section 4 describes our experiments.


2     Reasoning Optimizations
Testing the satisfiability of a concept in tableau-based systems might be costly
due to non-deterministic expansions. OWL/DL reasoners try to avoid such a cost
by using optimization techniques that can save memory and/or time. Our learn-
ing technique has an (in)direct impact on two of these optimization techniques:
semantic branching and backjumping.

Semantic Branching: Traditional tableau algorithms use a not very efficient al-
gorithm called syntactic branching. Due to the redundant search space, which
syntactic branching might explore, new reasoners apply semantic branching in-
stead [4].

Backjumping: This is a search reduction technique which uses dependency lists
to avoid redundant backtracking in completion graphs. Despite backtracking,
backjumping does not just backtrack from a clash to the most recent expanded
disjunction, but it also considers whether a disjunction is related to a clash by
creating a dependency set for each disjunct and labeling the concepts with sets
containing the non-deterministic choices in each branch [2].


3     Learning to Select Variables
3.1    Branching Heuristics
As already noted, the order of applying variables in semantic branching can
significantly affect the resulting runtime. Based on different branching heuristics,
variables can have different scores. These branching heuristics try to prune the
search tree by targeting specific variables and clauses.
    The equation below determines the score of each variable (Score(Var )) by
considering the score of its literals in each branch, so it could balance between
the branches to increase the variable’s score. The first branch considers the
positive from score(Lit + ) and the second branch considers the negation from
score(Lit− ): Score(Var ) = score(Lit + ) + score(Lit − )
    Choosing the variable with the highest score (Score(Var )) may save time
by leading us to an earlier solution. Moreover, by knowing the scores of both
literals or branches (score(Lit + ) and score(Lit − )), one can also choose between
the branches. In our implementation, we are currently only concerned about
changing the order of selecting variables in the reasoning process.
    Below are the well-known branching heuristics used in our implementation.

First literal The first literal occurring in the boolean formula is returned.
MOMS The literal with the maximum number of occurrences in the clauses
  with minimum size is considered.
                MOMS (l ) = occurrences of l in minimum size clauses.
MOMSF It is the alternative of MOMS . If f (x) is the number of occurrences
  of the variable x in the clauses of minimum size, we return the variable max-
  imizing (f (Lit + ) + f (Lit − )) ∗ 2 k + (f (Lit + ) ∗ f (Lit − )) where k is a defined
  constant. [5] indicated some factors which lead to choosing the best priority
  function.
MAXO The literal with the maximum number of occurrences in the boolean
  formula has the greatest score.
                MAXO(l) = number of occurrences of l in the formula.
JW The Jeroslaw-Wang chooses the literal that maximizes the following equa-
  tion where nj is the number of literalsPin the clause Cj :
                                    JW (l) = j,l∈Cj 2−nj
JW2 The two sided Jeroslaw-Wang is the same as JW , but to choose a variable
  it considers its score by adding the score of its positive and negative literals
  together.
DLCS It counts the number of clauses in which the literal and its negation
  occur in and assigns them respectively with CP and CN :
                           DLCS (Var ) = CP (Var ) + CN (Var )
  The variable with maximum DLSC is selected. If CP ≥ CN , the positive
  form of variable (l) is chosen and if CP < CN , the negative form (¬l) is
  chosen. DLSC is faster than JW .
POSIT it is similar to DLCS , but the search is only done on clauses with
  minimum size.
DLIS It is the same as DLCS , but it chooses the maximum value among all CP
  and CN values; then, it picks the variable with the maximum value. DLIS
  performs faster than DLCS and on some benchmarks it performs twice as
  fast as JW .


3.2   Learning Model

A learning method is applied to assign a heuristic method to each logic expression
based on its features. In this study, we use logistic regression.
    In comparison with linear regression, which predicts a continuous dependent
variable for each independent variable (instance), logistic regression predicts a
categorical dependent variable (class) for each independent variable.
    The model built for logistic regression considers the dependent variable as a
conditional probability given as P (Y = 1|X = x) that indicates the probability
of choosing a specific class (identified as 1) for an instance x. If the resulting
probability is more than 0.5, then the instance belongs to class 1; otherwise, it
belongs to class 0 [13].
The represented model for linear regression is a linear equation that relates
independent variables (x with n features x1 , x2 , ..., xn ) to dependent variables
(h(x)):
                                              Pn
                             h(x) = wT x =      i=0 wi xi

Note that x0 = 1 and w stands for parameters or weights.
Using the same model for logistic regression leads to:

                                P (Y = 1|x) = wT x

which is not promising, since the probability should be bounded between 0 and
1. Therefore, the modified equation represented for logistic regression is:
                                                   T
                                                ew x
                             P (Y = 1|x) =
                                              1 + ew T x


Multinomial Logistic Regression If the number of classes to predict is more
than two (several branching heuristics), then the logistic regression technique is
called multinominal logistic regression and it is defined by:
                                                   wT x
                                                 e Hj
                           P (Y = Hj |x) = P             T x
                                                        wH
                                                 H0 e
                                                           0

       P
where H P (H|x) = 1, H = H1 , H2 , ..., Hj , .., Hk . Notice that for all k classes,
the probability of an instance labeled as each of those classes will be obtained;
finally, among the obtained probabilities the maximum one defines the class that
the instance belongs to.
     To obtain the best value for parameters (w), the Newton-Raphson method
is used to maximize the approximation of the predictor function. To obtain the
best (w), Newton-Raphson uses Hessian and Gradient that are the second and
first derivative of the error function (J(w)), respectively. In our experiment, the
following formula is repeated 11 times to obtain the best w (w0 is initiated with
a vector of 0s):
                                                 J 0 (wi )
                                  wi+1 = wi − 00 i
                                                 J (w )
     A set of benchmarks is used as training data for this method. The training
data from the benchmarks contains instances (x). To obtain a class or heuristic
(yi ) of each instance (xi ), all heuristics are applied to the instances and for each
instance the heuristic that runs fastest will be considered as the class of that
instance. Afterwards, the training data pairs (xi , yi ) are ready to be used for the
Newton-Raphson method. While computing wHj , the training instances with
class Hj will be identified as class 1 and others are 0.
     The error function (J(w)) in logistic regression is defined as:
                         Pm
              J(w) = −      i=1 yi log(h(xi )) + (1 − yi )log(1 − h(xi ))
                                            Pm
To avoid overfitting, the regularized term λ i=1 wi2 is added to the cost function
(Hessian and Gradient formulas will also be changed accordingly) where λ is a
regularization parameter.
         Table 1. Training data chosen from SATLIB (number of samples)

    Library     JNH AIM PARITY Flat CBS RTI BMS UF3SAT Others Overall
Training samples 47 24    9     79 623 24 80      367   147    1400
   Satisfiable   14 16    9     79 623 24 80      279   107    1231
  Unsatisfiable  33  8    0      0   0   0   0     88    40     169

                     Table 2. Overall training data attributes

                    Number of variables     Number of clauses
                   Min  Max Mean          Min   Max Mean
                    1   558      103       1    1801 434


4     Empirical Evaluation
For our implementation and experiments, we use JFact1 , which is a Java port
of the FaCT++2 reasoner [15]. JFact is an open source Java-based tool which
makes it easily modifiable and portable to all platforms.

4.1   Training Data
The training data used in our experiment are files in DIMACS CNF format.
The DIMACS format was introduced for encoding the input of SAT solvers,
in which each literal is a number and each clause is a line containing literals
followed by zero. Our training data are from different benchmarks available on
SATLIB.3 Those benchmarks are: JNH, AIM, PARITY, Flat, CBS, RTI, BMS,
and Uniform Random. The available benchmarks in the website are limited to
the specific range of variable and clause numbers; for example, there are only
few CNF files with less than 50 variables and all of the files have 20 variables. To
avoid this restriction, we added about 147 additional CNF files. These additional
files contain small CNF inputs with between 1 to 50 variables. The final training
data contains 1400 files including both satisfiable and unsatisfiable CNFs. Tables
1 and 2 show some characteristics of the training data from SATLIB.
     In this experiment, we only deal with propositional logic input. Therefore,
the training data contains DIMACS CNF format files to be solved with a SAT
solver. We used a standard SAT solver based on the DPLL algorithm.
     To verify whether providing more training data can give us more accurate
results, we use a learning curve to show how increasing the number of training
data can increase/decrease the accuracy of our classifier. As shown in Fig. 1, we
are confident that adding more training data will not change the accuracy of the
classifier. The ten features, which have been used for our experiments, are

 – Number of variables (var )
1
  https://sourceforge.net/projects/jfact/
2
  https://code.google.com/archive/p/factplusplus/
3
  http://www.cs.ubc.ca/~hoos/SATLIB/benchm.html
                      Fig. 1. Cross-validation learning curve

Table 3. Number and percentage of training samples and the improved runtime for
each heuristic

Heuristics First Literal DLSC DLIS MOMS MOMSF POSIT ZM JW JW2
  Count         92         159   129    90    292    170   111 137      220
Percentage    6.57%      10.71% 9.21% 6.42% 20.85% 12.14% 7.92% 9.78% 15.71%


 – Number of clauses (cls)
 – var
    cls
 – ( var
     cls )
           2
     var 3
 – ( cls )
 – Fraction of binary clauses
 – Fraction of ternary clauses
 – Fraction of horn clauses (clauses with exactly one positive literal)
 – Number of positive literals
 – Number of negative literals

    These features are chosen based on the characteristics that play a main role
when using heuristics for solving SAT problems. Selecting too many features
may lead to overfitting depending on the number of training samples. Regular-
ization helps to prevent overfitting if there are too many —and possibly irrelevant
—features [10]. Therefore, in this study, we use regularization to allow as many
features as possible before overfitting.
    The nine branching heuristics used for our experiments are: FirstLiteral ,
MOMS , MOMF , JW , JW2 , POSIT , ZM , DLSC , and DLIS . Table 3 shows
for each heuristic on how many training samples (and for what percentage) it
resulted in the shortest runtime compared to the other heuristics.
    Although more heuristic methods have been developed recently, none of them
is the most effective solution for all benchmarks. For example, MOMS may only
be good for logic expressions with many binary or unary clauses [8], but when
considering all possible input and including features such as the number of unary
and binary clauses, MOMS may lead to very effective results. Therefore, in our
study the branching heuristics were chosen in such a way to consider CNF files
with different shapes. Most of our sample tests, which are introduced later,
choose MOMSF based on their features and the application of the learning
algorithm, which determines the probability of the heuristics.
    Based on the algorithm, for each training data, the selected branching heuris-
tic is used until the solver process is finished. The same applies for the sample
tests.
    The sample cases chosen as input are OWL files in conjunctive normal form.
Depending on the complexity of the input (it does not matter if it is already in
conjunctive normal form), JFact builds a directed acyclic graph (DAG) structure
of that input [9]. The built DAG could be explicit or implicit based on the
structure of the input. Changing the order in the DAG structure can be done
while building the DAG.
    Our samples are OWL files created from CNF DIMACS format files in which
a concept, which is targeted for checking its satisfiability, is equal to the CNF.
Since the target concept contains the whole CNF, JFact builds an explicit struc-
ture from the OWL file; therefore, targeting specific points of the DAG structure
can be done easily since all levels of the DAG structure are directly involved in
building the concept. More consideration is needed when the DAG is not explicit
since one needs to figure out which levels of the DAG are directly involved in
building a concept, i.e., the process of checking satisfiability of that concept. As
an example, let us consider the following CNF DIMACS format data.

p cnf 4 3
2 3 -4 1 0
-4 0
2 3 4 0

   Our OWL generator builds an OWL file from the CNF DIMACS file and
adds a fresh concept named D that is equivalent to the given CNF. Therefore,
we get the following description logic axiom.


    D ≡ (A t B t C t ¬E) u
         ¬E u
         (B t C t E)

   The DAG structure built by JFact for the above axiom is

1 *TOP*
2 primconcept(urn:jfact#temp) [= 1
3 concept(ontology:#D) = 10
4 primconcept(ontology:#A) [= 1
5 primconcept(ontology:#B) [= 1
                        Table 4. Samples (runtimes in milliseconds)

                          No. of    No. of
    Runtime Runtime backjumps backjumps                                          Sat/
    (without (with      (without    (with   Runtime Backjump No. of No. of Unsat
No. learning) learning) learning) learning) speedup reduction variables clauses status

 1   22,651    10,569    1,892,308   834,000     2.14     2.26        60   250    Sat
 2   25,247    3,699     2,252,016   267,111     6.82     8.43        65   257    Sat
 3   597,447 534,735 56,537,985 50,330,597       1.11     1.12        60   160   Unsat
 4   52,989    31,587    4,380,675 2,609,718     1.67     1.67        75   325   Unsat
 5   20,229    10,222    1,771,462   789,925     1.97     2.24        59   261   Unsat
 6   16,770    5,511     1,019,601   350,294     3.04     2.91        90   354   Unsat
 7     7,268   1,800      709,942    158,874     4.03     4.46        45   160    Sat
 8   106,941   4,393     7,699,075   271,423    24.34     28.36       75   305    Sat
 9   18,779     800      1,212,269    32,067    23.47     37.80       75   325    Sat
10   19,493    9,511     1,593,995   740,132     2.04     2.15        59   243   Unsat
11     6,866   1,641      677,743    136,166     4.18     4.97        45   160    Sat
12     5,744    753       551,661     49,356     7.62     11.17       34   155    Sat
13 110,952     15,244    6,627,625   850,760     7.28      7.8        75   343   Unsat


 6 primconcept(ontology:#C) [= 1
 7 primconcept(ontology:#E) [= 1
 8 and 7 -4 -5 -6
 9 and -5 -6 -7
 10 and -7 -8 -9
     Here, D is a newly added defined concept, which stands for a non-primitive
 concept, and it is linked to node 10. Other concepts called A, B, C, E are defined
 as primconcept (stands for primitive concept). They are used to build D and are
 linked to the concept > (Top) as node 1.

 4.2     Results
 Our experiment contains 40 sample tests with the number of clauses between
 150 and 400 and the number of variables between 35 and 90. Half of the samples
 are satisfiable and half are unsatisfiable. The experiments were performed on
 a MacBook Pro with 2.2 GHz Intel Core i7, RAM 16 GB 1600 MHz DDR3
 using a correspondingly modified version of JFact. We carried out a 7-fold cross
 validation on 1400 training sets with a learning accuracy of 83%.
     The performance of 13 out of 40 of our sample tests are shown in Table 4.
 These samples with their characteristics give an adequate representation (more
similar in their characteristics compared to other samples) of all 40 samples. For
each sample the table lists the original runtime (no learning), the improved run-
time (with learning), the number of original backjumps (no learning), the number
of reduced backjumps (with learning), the runtime speedup (defined as original
runtime divided by improved runtime), the backjump reduction (defined as the
number original backjumps divided by the number of reduced backjumps), and
also the number of clauses, variables, and the SAT status. Table 4 also demon-
strates that there is a direct relationship between the speed improvement and
the reduction in the number of backjumps. For example, for the sample number
8, the number of backjumps went from 7,699,075 down to 271,423 (reduction of
28) while the runtime improved from 106,941 milliseconds to 4,393 milliseconds
(speedup of 24).
    The overall performance of the algorithm is given in Table 5. It lists the
maximum, minimum and mean runtime improvement in milliseconds and the
corresponding speedup factors. This shows that applying learning improves the
speed of satisfiability checking by one to two orders of magnitude.
    Moreover, the percentage of improvement based on (un)satisfiability suggests
that the learning technique is more successful for satisfiable cases. The rate of
success for unsatisfiable and satisfiable input are 53.01% and 78.25%, respec-
tively.
    Since we want to achieve a significant runtime improvement (e.g., at least
10%), we mostly consider sample tests that perform in a sufficient amount of
time (e.g., at least several seconds) such that the improvement is obvious. For
example, the samples with below 50 variables and 100 clauses mostly perform
in less than one second; therefore, improving their speed is not our primary
concern.
    On the other hand, the sample number 3 from Table 4 (with 60 variables
and 160 clauses) has a initial runtime of 597,447 milliseconds and the runtime
is improved by 62,712 milliseconds after learning. Therefore, the percentage of
improvement is 10.49%, which is still considered as good. Moreover, for the
CNF files with more than 100 variables and 400 clauses, our experiments require
several minutes of runtime. For example, if a sample test is executed in 5 minutes
and the improvement is only 10 seconds, then the improvement percentage is
only 3 percent which we do not consider as significant. That is also one of the
reasons, that in our 40 successful sample tests we only included very few with
long running times since the learning improvement is not always very significant.
    Our implementation does not improve the runtime for every input. For one
out of every five input files (20%) the runtime is mostly unchanged or increased
by less than 10%. For example, in one sample test, the runtime without learning
is 502,131 milliseconds. After learning, the runtime has increased by 2.43%,
which we consider as a tolerable performance loss.
    To reduce the number of cases without a significant speedup, we believe it
is a good idea to discover specific CNF patterns where a learning improvement
is not expected. For example, in one of the identified patterns for every clause
there also exist clauses with the exact same variables of that clause but every
                          Table 5. Speed improvement

                       Improvement      Max Min Mean
                       in milliseconds 102,548 1,892 18,500
                       Speedup Factor 311.2 1.11 14.3


other possible combinations of their literals. An example of such pattern is the
following description logic axiom.


    D ≡ (A t E t F ) u (A t ¬E t F ) u (A t E t ¬F ) u (A t ¬E t ¬F ) u
        (¬A t E t F ) u (¬A t ¬E t F ) u (¬A t E t ¬F ) u (¬A t ¬E t ¬F ) u
        (B t C) u (B t ¬C) u (¬B t C) u (¬B t ¬C)

Note that the features such as the number of variables and clauses are close to
the successful samples.
Finding these patterns can also help us to categorize OWL input to enhance the
reliability of our future implementation, which will also be extended for more
expressive OWL input.


5     Conclusion and Future Work

In this paper, we focused on improving one of the well known optimization
techniques for OWL reasoners called semantic branching. Even though semantic
branching is guaranteed to avoid redundant search space occurring in traditional
syntactic branching, machine learning techniques also help to further decrease re-
dundant search space exploration. The task is possible by learning new orderings
for applying variables in each branching level. Our algorithm has demonstrated
a significant improvement in our sample tests.
    As part of our future work, we started to expand our approach beyond propo-
sitional description logic and integrate the proper treatment of universal, exis-
tential quantifiers, and qualified number restrictions in our machine learning ap-
proach. For instance, we started working on strategies to apply machine learning
in a similar way while also considering interactions between OWL axioms.
    Similar strategies could be applied to simplify OWL expressions and prepare
them for machine learning. Later, for training purposes, OWL expressions could
be treated as propositional logic or Quantified Boolean Formulas (QBF) so they
can take advantage of available training sets to be solved by QBF or SAT solvers.
Moreover, as already mentioned in the previous section, categorizing OWL input
based on its shape and pattern could be also helpful for further exploration.
Finally, we plan to consider other learning techniques such as Naive Bayes and
k-Nearest Neighbor based on the features and their correlation.
References
 1. Baader, F., Calvanese, D., McGuinness, D., Nardi, D., Patel-Schneider, P.F. (eds.):
    The Description Logic Handbook: Theory, Implementation, and Applications.
    Cambridge University Press, 2nd edn. (2007)
 2. Baker, A.B.: Intelligent backtracking on constraint satisfaction problems: experi-
    mental and theoretical results. Ph.D. thesis, University of Oregon (1995)
 3. Donini, F.M., Massacci, F.: Exptime tableaux for ALC. Artif. Intell. 124(1), 87–138
    (Nov 2000)
 4. Freeman, J.W.: Hard random 3-SAT problems and the Davis-Putnam procedure.
    Artificial intelligence 81(1-2), 183–198 (1996)
 5. Freeman, J.W.: Improvements to propositional satisfiability search algorithms.
    Ph.D. thesis, University of Pennsylvania (1995)
 6. Haarslev, V., Möller, R.: High performance reasoning with very large knowledge
    bases: A practical case study. In: IJCAI 2001. pp. 161–168 (2001)
 7. Lagoudakis, M.G., Littman, M.L.: Learning to select branching rules in the DPLL
    procedure for satisfiability. Electronic Notes in Discrete Mathematics 9, 344–359
    (2001)
 8. Maandag, P., Barendregt, H., Silva, A.: Solving 3-SAT. Bachelor’s thesis, Radboud
    University Nijmegen (2012)
 9. Nebot, V., Berlanga, R.: Efficient retrieval of ontology fragments us-
    ing an interval labeling scheme. Information Sciences 179(24), 4151–4173,
    doi:10.1016/j.ins.2009.08.012 (2009)
10. Ng, A.Y.: Feature selection, L 1 vs. L 2 regularization, and rotational invariance.
    In: Proceedings of the twenty-first international conference on Machine learning.
    p. 78. ACM (2004)
11. Patel-Schneider, P.F.: DLP system description. In: In Proc. Description Logics
    (DL)-98. pp. 78–79 (1998)
12. Samulowitz, H., Memisevic, R.: Learning to solve QBF. In: In Proceedings of the
    22nd National Conference on Artificial Intelligence (AAAI’07). vol. 7, pp. 255–260
    (2007)
13. Shalizi, C.: Advanced data analysis from an elementary point of view. Cambridge
    University Press (2013)
14. Sirin, E., Grau, B.C., Parsia, B.: From wine to water: Optimizing description logic
    reasoning for nominals. In: In: Proc. of the 10th Int. Conf. on Principles of Knowl-
    edge Representation and Reasoning (KR 2006). pp. 90–99 (2006)
15. Tsarkov, D., Horrocks, I.: FaCT++ description logic reasoner: System description.
    In: International Joint Conference on Automated Reasoning. vol. 4130 of LNCS,
    pp. 292–297. Springer (2006)
16. Tsarkov, D., Horrocks, I., Patel-Schneider, P.F.: Optimizing terminological rea-
    soning for expressive description logics. Journal of Automated Reasoning 39(3),
    277–316 (2007)