DortmundAI at LeQua 2022: Regularized SLD

Martin Senz, Mirko Bunse
TU Dortmund University, Artificial Intelligence Group, D-44227 Dortmund, Germany

Abstract

The LeQua 2022 competition was conducted with the purpose of evaluating different quantification methods on text data. In the following, we present the solution of our team "DortmundAI", which ranked first in the multi-class quantification task T1B. This solution is based on a modification of the well-known Saerens-Latinne-Decaestecker (SLD) method: SLD, which is based on expectation maximization, is extended by a regularization technique. Additional experiments with the test data, which we carried out after the competition closed, reveal that our excellent ranking stems primarily from an extensive hyperparameter tuning of the classifier.

Keywords: Quantification, Supervised prevalence estimation

1. Introduction

Quantification is a supervised learning task that consists of training a predictor for the class prevalences in a sample of unlabelled data items [1]. This task has received increased attention in recent years because, in many applications, the class distribution of a batch of data is relevant, rather than predictions for the individual instances of the data. The LeQua 2022 competition [2] was initiated with the intention of evaluating the performance of methods that address quantification. Its focus is the quantification of text data, where the data consisted of customer reviews collected from Amazon. Two key learning tasks were formulated:

• (A) Binary quantification of reviews into positive (more than 3 stars) and negative (less than 3 stars) ratings
• (B) Multi-class quantification of reviews according to 28 product categories

These tasks were further divided according to the data representation: the organizers provided vectorized data (1), as well as the raw text data (2), for a total of four tasks: T1A, T1B, T2A, and T2B.
For more information about the competition and the evaluation protocol, see [2]. Our contributed solution focuses on the performance of the quantifier, rather than on the representation of the data. Therefore, we relied on the vectorized representation and addressed only the tasks T1A and T1B. Our solution ranked first in T1B and fifth in T1A. In Sec. 2, we describe the quantification method that we used. We complement our presentation in Sec. 3 with additional experiments that we conducted on the test set after the competition was closed.

CLEF 2022: Conference and Labs of the Evaluation Forum, September 5–8, 2022, Bologna, Italy
martin.senz@tu-dortmund.de (M. Senz); mirko.bunse@cs.tu-dortmund.de (M. Bunse)

2. Method

Our solution is based on the well-known quantification method SLD [3], which we extend with a regularization technique. The idea of regularization is taken from another quantification method [4] that is specific to experimental physics. Originally, our extension was proposed for ordinal quantification in particular [5].

We use the following notation. By x ∈ σ we denote a data item from an unlabelled data set σ = {x_i ∈ 𝒳 : 1 ≤ i ≤ m}. By y ∈ 𝒴 we denote a class from a set of classes 𝒴 = {y_1, ..., y_n}. Furthermore, h : 𝒳 → ℝ^n represents a soft classifier that returns posterior probabilities [h(x)]_i ≡ P(y_i | x). By p̂_σ(y) we denote the prevalence of class y, as estimated by a quantification method that receives σ as an input. The goal of quantification is to return a p̂_σ(y) that is close to the true prevalence P(y | σ).

2.1.
Saerens-Latinne-Decaestecker (SLD)

The SLD method [3] follows an expectation maximization approach, which (i) leverages Bayes' theorem in the E-step and (ii) updates the prevalence estimates in the M-step. Both steps can be combined in a single update rule

$$\hat{p}^{(k)}_{\sigma}(y_i) \;=\; \frac{1}{|\sigma|} \sum_{x \in \sigma} \frac{\frac{\hat{p}^{(k-1)}_{\sigma}(y_i)}{\hat{p}^{(0)}_{\sigma}(y_i)} \cdot [h(x)]_i}{\sum_{j=1}^{n} \frac{\hat{p}^{(k-1)}_{\sigma}(y_j)}{\hat{p}^{(0)}_{\sigma}(y_j)} \cdot [h(x)]_j} \qquad (1)$$

This update rule is applied until the estimates converge. The initial estimates p̂^(0)_σ(y_i) are given by the class prevalence values of the training set.

2.2. Regularization in SLD

We employ the regularization technique of Iterative Bayesian Unfolding (IBU) [4]. This physics-specific quantification method revolves around an expectation maximization with Bayes' theorem and thus has a common foundation with SLD. In IBU, each intermediate estimate p̂^(k) is regularized in the following way. First, a low-order polynomial is fitted to p̂^(k). Second, a linear interpolation between p̂^(k) and this polynomial is used as the prior of the next iteration. Due to the smoothness of low-order polynomials, this replacement of p̂^(k) reduces the differences between neighbouring prevalence estimates. Hence, the estimates are regularized towards smooth solutions. The interpolation factor between p̂^(k) and the polynomial, as well as the order of the polynomial, are hyperparameters of IBU through which the strength of the regularization is controlled.

The IBU regularization is particularly suitable for ordinal quantification tasks [5], where the classes follow a total order y_i < y_{i+1}. Without an order, the idea of "neighbouring classes" is not well-defined. However, we hypothesized that the IBU regularization might also benefit non-ordinal multi-class quantification tasks, like T1B in LeQua 2022. This hypothesis is based on the idea that smoothing can suppress over- and under-estimations of class prevalences, even if the classes are not totally ordered.
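For reference, the unregularized SLD update of Eq. 1 can be sketched in a few lines of NumPy. The function and argument names below are ours, and the posteriors [h(x)]_i are assumed to be pre-computed as an (m × n) matrix:

```python
import numpy as np

def sld(posteriors, p_train, max_iter=100, tol=1e-6):
    """Expectation maximization update of Eq. 1 (Saerens et al., 2002).

    posteriors: (m, n) array of classifier outputs [h(x)]_i for each x in sigma.
    p_train:    (n,) array of training-set class prevalences, the initial estimate.
    """
    p_prev = p_train.astype(float).copy()
    for _ in range(max_iter):
        # E-step: re-weight the posteriors by the ratio of the current
        # prevalence estimate to the training prevalences (Bayes' theorem)
        weighted = posteriors * (p_prev / p_train)
        weighted /= weighted.sum(axis=1, keepdims=True)
        # M-step: average the re-weighted posteriors over the sample
        p_next = weighted.mean(axis=0)
        if np.abs(p_next - p_prev).sum() < tol:
            return p_next
        p_prev = p_next
    return p_prev
```

Each iteration thus only requires one pass over the pre-computed posteriors; the classifier itself is never re-evaluated.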
One of our motivations to participate in LeQua 2022 was to test this hypothesis.

We call our quantification method o-SLD, as it was originally proposed for ordinal quantification [5]. Our method has two hyperparameters, the order of the polynomial and the interpolation factor. The pseudo code is displayed in Alg. 1.

Algorithm 1: o-SLD [5], our regularized version of SLD.

input: a soft multi-class classifier h : 𝒳 → ℝ^n, the prevalences p̂^(0)_σ(y_i) of the training set, a data sample σ = {x_i ∈ 𝒳 : 1 ≤ i ≤ m}, a polynomial order o ∈ ℕ, and an interpolation factor 0 ≤ λ ≤ 1
output: a class prevalence estimate p̂_σ

 1: k ← 0
 2: repeat
 3:   k ← k + 1
 4:   if k > 1 then
 5:     fit a polynomial f_o : ℝ → ℝ to p̂^(k−1)_σ
 6:     p̂^(k−1)_σ(y_i) ← (1 − λ) · p̂^(k−1)_σ(y_i) + λ · f_o(y_i)   (regularization)
 7:   end if
 8:   update p̂^(k)_σ according to Eq. 1   (standard SLD step)
 9: until some distance between p̂^(k)_σ and p̂^(k−1)_σ is small
10: return p̂^(k)_σ

3. Evaluation

The primary objective of this evaluation is to measure the performance of the o-SLD method. For this purpose, a detailed model selection over the relevant hyperparameters was performed. This process involved the optimization of a variety of model configurations on the training data and the estimation of their performance on the given validation data. To this end, we ran an exhaustive full grid search, in which the respective hyperparameter search spaces were iteratively adjusted. Overall, the following hyperparameters were identified as being relevant:

• the degree o ∈ ℕ of the polynomial which replaces p̂^(k)
• the impact λ ∈ [0, 1] of the linear interpolation between the polynomial and p̂^(k)
• the inverse regularization strength C of the Logistic Regression classifier

Initially, Logistic Regression and Support Vector Machines (SVM) were found to be suitable classifier candidates. Since no improvement of the results was observable with SVM, the focus was then put on Logistic Regression classifiers.
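Alg. 1 can be implemented, for instance, with NumPy's polynomial fitting. The following sketch is ours, not the competition code; in particular, it adds a clipping and re-normalization step after the smoothing, which Alg. 1 does not spell out but which keeps the prior a valid probability vector:

```python
import numpy as np

def o_sld(posteriors, p_train, order=1, lam=0.1, max_iter=100, tol=1e-6):
    """Sketch of Alg. 1: SLD with an IBU-style polynomial regularization.

    order: degree o of the polynomial fitted to the intermediate estimate.
    lam:   interpolation factor lambda in [0, 1]; lam=0 recovers plain SLD.
    """
    n = len(p_train)
    xs = np.arange(n)  # class indices standing in for y_1, ..., y_n
    p_prev = p_train.astype(float).copy()
    for k in range(max_iter):
        if k > 0:
            # lines 5-6 of Alg. 1: fit a degree-o polynomial to the current
            # estimate and interpolate between the estimate and the polynomial
            coeffs = np.polyfit(xs, p_prev, deg=order)
            smoothed = (1 - lam) * p_prev + lam * np.polyval(coeffs, xs)
            p_prev = np.clip(smoothed, 0, None)
            p_prev /= p_prev.sum()  # our addition: re-normalize after smoothing
        # line 8 of Alg. 1: the standard SLD step of Eq. 1
        weighted = posteriors * (p_prev / p_train)
        weighted /= weighted.sum(axis=1, keepdims=True)
        p_next = weighted.mean(axis=0)
        if np.abs(p_next - p_prev).sum() < tol:  # line 9: convergence check
            return p_next
        p_prev = p_next
    return p_prev
```

Note that the polynomial is fitted over the class indices, which is only meaningful when the class order carries information; for non-ordinal tasks like T1B, the smoothing acts on an arbitrary ordering of the 28 categories.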
Based on the validation results, it became obvious that the choice of C has a high impact on the obtained results. Accordingly, careful tuning of C was essential to obtain satisfactory results. Optimizing over the hyperparameters o and λ, a configuration with C = 0.006 was found for task T1A, as well as C = 0.01 for task T1B. As can be seen from Fig. 1, there is yet another value, C = 0.01, for task T1A, which yields a smaller test error than the value selected on the validation data. Hence, the C parameter has a major impact on the performance of o-SLD.

Figure 1: Influence of the hyperparameter C on the results of o-SLD on the validation and test data, in terms of RAE, for (a) task T1A and (b) task T1B. The vertical line marks the minimum validation error found.

Table 1: Impact of the o-SLD regularization hyperparameters, in terms of RAE, for task T1B and C = 0.01.

            o=0       o=1       o=2       o=3       o=4       o=5       o=6
λ = 0.1     1.27155   1.25751   1.23959   1.22277   1.21605   1.21044   1.19767
λ = 0.01    0.94607   0.944458  0.944458  0.94142   0.938579  0.938455  0.937126
λ = 0.001   0.913082  0.913042  0.912714  0.912534  0.912491  0.912623  0.912485

Table 2: The final configurations and results of o-SLD and SLD, based on the model selection performed. The test scores and the SLD results were generated after the completion of the challenge.

Task  Model                                 Validation RAE  Test RAE  Test AE
T1A   o-SLD (C = 0.006, o = 1, λ = 0.1)     0.122869        0.1140    0.0271
      SLD (C = 0.006)                       0.122869        0.1140    0.0271
T1B   o-SLD (C = 0.01, o = 6, λ = 0.001)    0.912485        0.8799    0.0117
      SLD (C = 0.01)                        0.910511        0.8780    0.0118

During the model selection for task T1B, it also became apparent that model configurations with a small interpolation factor λ are preferred; see Tab. 1 for an example. Since o-SLD approaches the standard SLD method as λ decreases, this indicates that the additional smoothing regularization is not an improvement in T1B.
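The RAE values reported in Tab. 1 and Tab. 2 refer to the relative absolute error between the estimated and the true prevalence vectors. A minimal sketch of this metric, with the additive smoothing that is common in quantification evaluation (the exact smoothing constant in LeQua depends on the sample size, so we keep it as an explicit argument here):

```python
import numpy as np

def rae(p_true, p_est, eps):
    """Relative absolute error between true and estimated prevalences.

    eps implements additive smoothing to avoid division by zero;
    the constant used by LeQua depends on the sample size.
    """
    n = len(p_true)
    p_true = (p_true + eps) / (1 + eps * n)  # smooth both vectors so that
    p_est = (p_est + eps) / (1 + eps * n)    # they remain on the simplex
    return np.mean(np.abs(p_est - p_true) / p_true)
```

Averaging this error over many test samples, as the competition protocol does, yields the scores in the tables above.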
The comparison of the final o-SLD results with the standard SLD in Tab. 2 points in the same direction. During the competition, we omitted an evaluation of the standard SLD because the hyperparameter grid of o-SLD also included small regularization impacts, with which the two methods are nearly equivalent.

Synopsis

In the LeQua 2022 competition, the presented o-SLD achieved first place in task T1B. As our experiments in Tab. 2 show, this excellent ranking was not achieved due to the regularization provided by o-SLD, but due to an extensive model selection that focused on optimizing the regularization parameter C. Although o-SLD could not achieve a lower error in this specific competition, the method has the capability to be useful in other quantification tasks, like ordinal quantification.

References

[1] G. Forman, Counting positives accurately despite inaccurate classification, in: European Conference on Machine Learning, 2005, pp. 564–575.
[2] A. Esuli, A. Moreo, F. Sebastiani, G. Sperduti, A detailed overview of LeQua 2022: Learning to quantify, in: Working Notes of the Conference and Labs of the Evaluation Forum, 2022. To appear.
[3] M. Saerens, P. Latinne, C. Decaestecker, Adjusting the outputs of a classifier to new a priori probabilities: A simple procedure, Neural Computation 14 (2002) 21–41.
[4] G. D'Agostini, A multidimensional unfolding method based on Bayes' theorem, Nuclear Instruments and Methods in Physics Research Section A: Accelerators, Spectrometers, Detectors and Associated Equipment 362 (1995) 487–498.
[5] M. Bunse, A. Moreo, F. Sebastiani, M. Senz, Ordinal quantification through regularization, in: Joint European Conference on Machine Learning and Knowledge Discovery in Databases, 2022. To appear.