Qualifier Recommendation for Wikidata⋆

Andrei Mihai Ducu¹, Michael Cochez¹
¹ Vrije Universiteit Amsterdam, The Netherlands

Abstract
Wikidata, a collaborative knowledge base for structured data, empowers both human and machine users to contribute and access information. Its main role is to support Wikimedia projects by acting as the central storage database for the Wikimedia movement. To optimize the manual process of adding new facts, Wikidata utilizes the association-rule-based PropertySuggester tool. However, a recent paper introduced the SchemaTree, a novel approach that surpasses the state-of-the-art PropertySuggester in all performance metrics. The new recommender employs a trie-based method and frequentist inference to efficiently learn and represent property set probabilities within RDF graphs. In this paper, we adapt that recommendation approach to recommend qualifiers. Specifically, we want to find out whether the recommendation can be done using co-occurrence information of the qualifiers alone, or whether type information about the item and the value of statements improves performance. We found that the qualifier recommender that uses both co-occurring qualifiers and type information leads to the best performance.

Keywords
Wikidata, Qualifiers, Recommender

1. Introduction

In today's era of big data and rapidly changing information, effective systems are needed to organize and structure the abundance of information available online. A great number of databases require constant editing and frequent updates to provide reliable, accurate, and readily available information. An example of such an openly available resource is Wikidata [1]. The Wikidata project is part of the Wikimedia movement. It is widely accessible and used, not only by other Wikimedia projects, but also by external applications and organizations. In a one-year period alone (Apr. 2022 - Apr. 2023), Wikidata reached around 5 billion page views.
Most of these are automated actions that propagate information to and from other sources. However, the platform saw a monthly average user base of around 3 million unique devices, proving its direct usefulness to the public as well. More importantly for the theme of this paper, it had an average of around 43 thousand editors, both human and automated. Another metric of interest is the number of edited pages, which averages 10 million per month¹. This intensive usage suggests that a better assistive editing system could make the work of many editors easier, more efficient, and less error-prone.

⋆ This work is based on the BSc thesis of Andrei Mihai Ducu, under the supervision of Michael Cochez.
Wikidata'23: Wikidata workshop at ISWC 2023
Email: a.ducu@student.vu.nl (A. M. Ducu); m.cochez@vu.nl (M. Cochez)
Homepage: https://www.cochez.nl (M. Cochez)
ORCID: 0000-0001-5726-4638 (M. Cochez)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
¹ Wikimedia Statistics: https://stats.wikimedia.org/#/wikidata.org

Andrei Mihai Ducu et al. CEUR Workshop Proceedings 1–11

Contributors with diverse backgrounds and varying levels of expertise may encounter difficulties when editing records in a complex knowledge base such as Wikidata. Erroneous updates could potentially result in data inconsistencies and incompleteness. Therefore, assisting users in the editing process is of utmost importance to preserve data quality and accuracy, while greatly reducing the workload. On the user-interface side, two systems are active on Wikidata to improve quality. First, there are constraints on properties, which check whether they are applied to items of the right type, and whether the values are within the expected range.
Second, there are recommender systems, which can act as a guide for end users when adding properties to items and qualifiers to properties. This paper focuses on the latter aspect, improving the recommendations for adding qualifiers to properties, such that manual editors can update qualifier information for statements about items on Wikidata. The recommender system used for this purpose was adapted from previous work on the SchemaTree property recommender [2]. SchemaTree introduces a novel approach, rooted in trie structures, to compute probability distributions of property sets within RDF graphs. For this project, the recommender suggests qualifiers instead of properties, based on a) co-occurring qualifiers and b) type information of the item (subject) and of the corresponding property value (object).

To determine the best configuration of the new qualifier recommender, four configurations were tested, each corresponding to a different level of information provided to the system. The first configuration recommends qualifiers using only co-occurring qualifier information. The second and third use either item (subject) or value (object) type information. Lastly, full contextual information was used for the recommendation, namely co-occurring qualifiers and both item and value type information.

Two research questions were formulated around the four configurations: a main question and a secondary one that stems from it:

• Does including type information improve the performance of the qualifier recommender system?
• What kind of type information is more informative, the type of the item or the type of the value?

These questions are investigated by evaluating each configuration, with two different evaluation methods, against a held-out test set consisting of 20% of all data extracted from Wikidata.
The performance of the configurations is compared to a baseline model that makes recommendations solely based on absolute qualifier occurrence frequency, without using any other contextual information. What we find is that adding type information is nearly always beneficial. The code is available on GitHub².

² Qualifier Recommender: https://github.com/Duculet/QualifierRecommender/tree/eval_handlers

2. Background and Related Work

2.1. SchemaTree Recommender

The SchemaTree Recommender³ proposes a novel approach to recommending new properties within the Wikidata project. This system is an alternative to the currently used PropertySuggester⁴. The newly introduced recommender uses the maximum likelihood of properties to suggest additional ones. The recommender leverages a compact trie-based data structure called the SchemaTree, which integrates the representation of property and type co-occurrences. It specializes in the efficient lookup of such patterns, being constructed as an adaptation of a frequent-pattern tree. This data structure enables efficient probability calculations and efficient retrieval of property co-occurrences.

Next, the SchemaTree structure is introduced and described; this information is adapted from [2]. The SchemaTree is a data structure that facilitates property recommendations based on maximum-likelihood estimation. Recommendations are generated for a given item, denoted 𝐸, and its set of properties, denoted 𝑆 = {𝑠₁, …, 𝑠ₙ}, where 𝑆 ⊆ 𝐴 (a subset of the available properties 𝐴 in Wikidata). The goal of recommending maximum-likelihood properties is to identify the most likely property 𝑎̂ ∈ 𝐴 ∖ 𝑆, i.e., a property the item does not have already. The property 𝑎̂ has to be found such that the following holds:

    𝑎̂ = argmax_{𝑎 ∈ (𝐴∖𝑆)} 𝑃(𝑎 | {𝑠₁, …, 𝑠ₙ}) = argmax_{𝑎 ∈ (𝐴∖𝑆)} 𝑃({𝑎, 𝑠₁, …, 𝑠ₙ}) / 𝑃({𝑠₁, …, 𝑠ₙ})    (1)

where 𝑃({𝑡₁, …, 𝑡ₘ}) denotes the probability that a selected entity has at least the properties 𝑡₁, …, 𝑡ₘ. In line with this, the recommended properties are the ones that exhibit the highest frequency of co-occurrence with the properties already possessed by the given entity [2].

By adopting a frequentist probability interpretation, the joint probabilities are estimated based on relative frequencies of occurrence. The absolute frequency of a set of properties, i.e., the number of items that have (at least) this set of properties, is written supp(·). By reformulating Equation 1, the estimation of the most probable property recommendation can be expressed as follows:

    𝑎̂ ≃ argmax_{𝑎 ∈ (𝐴∖𝑆)} supp({𝑎, 𝑠₁, …, 𝑠ₙ}) / supp({𝑠₁, …, 𝑠ₙ})    (2)

The SchemaTree structure aims to optimize the computation time for estimating this probability over all data already contained in Wikidata.

Besides finding the properties with the highest probability, the SchemaTree also uses back-off strategies in case the recommendations are not good enough. The authors found that the best back-off strategy was to rerun the system with the least popular property removed from the property set whenever no properties can be recommended, which happens when all have a zero probability. This is repeated up to four times, until a recommendation is found. In this work, we use the same setup. In future work, it should be investigated whether there is a better back-off strategy specifically for qualifiers.

³ SchemaTree Recommender: https://github.com/lgleim/SchemaTreeRecommender
⁴ PropertySuggester: http://gerrit.wikimedia.org/r/admin/projects/mediawiki/extensions/PropertySuggester

2.2. Other Works

Other recommender systems have been proposed throughout the years. One such system that was recently put forth is WikidataRec [3].
The system employs a hybrid approach that combines content-based and collaborative filtering techniques to rank items for editors. This approach considers both the features of the items themselves and the previous interactions between items and editors. To achieve this, a neural network called a "neural mixture of representations" is developed. It is specifically designed to learn optimal weights for combining item-based and editor-based representations, taking the interactions between items and editors into account. By leveraging these interactions, the system aims to optimize the ranking of items and improve the overall recommendation quality for editors. Based on their experimental data, the system performs well in situations where the data fed into the model is dense. However, collaborative filtering was found to be less useful in the case of sparse editing data, which makes up most of the available data [3].

Another approach to handling Wikidata qualifiers uses reasoning. This entails defining inference rules, specifically on ontological properties. The paper proposes handling qualifiers using inference rules, although the system presented does not implement a recommender. Still, it is interesting to see how the authors coped with the massive number of qualifiers and practically implemented a prototype that can express all of Wikidata's ontological properties [4].

3. Qualifier Recommender

This section provides a comprehensive description and analysis of the work conducted, from data extraction to data structures, including the adaptations made to the SchemaTree code to work with qualifiers instead of properties.

3.1. SchemaTree Adaptation

Several parts of the original SchemaTree recommender code were adapted so that it can recommend qualifiers instead of properties.
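To make the maximum-likelihood estimation of eq. (2), which the adapted recommender inherits, concrete, the following is a minimal, illustrative Python sketch of frequentist recommendation from support counts. It is not the actual implementation: the real SchemaTree stores supports in a trie and is written in Go, whereas this toy version enumerates subsets explicitly, which is only feasible for tiny transactions. The class and method names, and the example transactions, are our own.

```python
from collections import Counter
from itertools import combinations

class SupportModel:
    """Toy frequentist recommender in the spirit of eq. (2).

    Supports are counted by enumerating every subset of every
    transaction; the SchemaTree does this far more compactly via a trie.
    """

    def __init__(self, transactions):
        self.supp = Counter()
        self.items = set()
        for t in transactions:
            t = frozenset(t)
            self.items |= t
            # supp(X) = number of transactions containing at least X
            for r in range(1, len(t) + 1):
                for sub in combinations(sorted(t), r):
                    self.supp[frozenset(sub)] += 1

    def recommend(self, have):
        """Rank candidates a by supp(have ∪ {a}) / supp(have)."""
        have = frozenset(have)
        denom = self.supp[have]
        if denom == 0:
            return []
        scored = [(self.supp[have | {a}] / denom, a)
                  for a in self.items - have]
        return sorted(scored, reverse=True)

model = SupportModel([
    {"P582", "P580"},          # start/end time together
    {"P582", "P580", "P512"},  # ... plus academic degree
    {"P582", "P512"},
])
print(model.recommend({"P582", "P580"}))  # [(0.5, 'P512')]
```

Given the qualifiers P582 and P580, the only remaining candidate P512 appears in one of the two transactions containing both, hence the estimated probability 0.5.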
Also, some new additions were made in order to extract qualifier information from the Wikidata dump file and to evaluate the new recommender system. Changes and additions were made to the following components:

Extractor: added the functionality to extract qualifier information from the Wikidata JSON dump and save it in one TSV file for each property type.
Splitter: used for the evaluation to split the TSV files into a train and a test set.
Datatypes: updated so it can detect and count the types that occur alongside the qualifiers (item/value types).
RecommenderServer: added a handler for qualifier recommendation requests. This handler receives the property of the statement for which recommendations are sought, as well as, optionally, the types of the item and value of the statement.

These updates and additions are detailed in the following subsections.

3.2. Configurations

Four main configurations were explored, as previously mentioned:

• FF - No type information included
• TF - Value (object) type information included
• FT - Item (subject) type information included
• TT - Both value (object) and item (subject) type information included

Each of the experiments that follow was conducted four times, once for each configuration. The next subsections only detail the TT configuration; the same pipeline was applied to the other configurations, leaving out the respective types of information. Besides these configurations, we also have a baseline configuration, which makes use of neither the property nor item and value type information.

3.3. Data Extraction

Data used to generate the models for the recommender system was obtained in the form of a BZIP2-compressed Wikidata JSON dump⁵, consisting of all items and their representative features in the knowledge base.
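As a rough illustration of how such a dump can be processed, the sketch below streams entities from the compressed file and pulls out the qualifiers of each statement. It assumes the documented Wikidata JSON dump layout (a JSON array with one entity per line, entities carrying a "claims" map whose statements may carry a "qualifiers" map); the function names are our own, not those of the adapted Go extractor.

```python
import bz2
import json

def iter_entities(path):
    """Stream entities from a Wikidata JSON dump (a JSON array with
    one entity per line) without loading the file into memory."""
    with bz2.open(path, "rt", encoding="utf-8") as f:
        for line in f:
            line = line.strip().rstrip(",")
            if not line or line in ("[", "]"):
                continue  # skip the array brackets around the entities
            yield json.loads(line)

def qualifier_transactions(entity):
    """Yield (property, sorted qualifier ids) for every statement
    that carries qualifiers; statements without qualifiers are skipped."""
    for prop, statements in entity.get("claims", {}).items():
        for st in statements:
            quals = sorted(st.get("qualifiers", {}).keys())
            if quals:
                yield prop, quals
```

In the actual pipeline these per-statement qualifier lists are combined with the type information collected in the first pass before being written to the per-property TSV files.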
For each unique property in the dataset, a TSV file was generated incorporating information about the occurrences of that specific property throughout Wikidata. For example, in the file for P50, each entry (row) contains information about one occurrence of the property being used on Wikidata. We store the used qualifiers, as well as the item (subject) and value (object) types for this occurrence. We call the collected information for one occurrence a transaction, in accordance with the frequent-item-set literature. The extraction workflow is further detailed in fig. 1.

Figure 1: Flowchart of the algorithm constructed for data extraction from the dump. [Flowchart omitted. In outline: a first pass reads the items in the JSON dump and stores their type information in a dictionary; a second pass reads each statement and, if the statement contains qualifiers, saves them to a list, appends the type information from the dictionary, and stores the list as a transaction in a TSV file.]

Referring to the example structure of the item "Douglas Adams" (Q42), an example of a TSV file construction and its format for the property educated at (P69) can be seen in fig. 2.

⁵ For our experiments we used the dump of 27 March 2023: https://dumps.wikimedia.org/wikidatawiki/entities/

educated at (P69):
  St John's College (types Q1055028 and Q19844914)
    end time (P582): 1974
    academic major (P812): English literature
    academic degree (P512): Bachelor of Arts
    start time (P580): 1971
  Brentwood School (types Q2418495 and Q269770)
    end time (P582): 1970
    start time (P580): 1959

P69.tsv:
  P582  P812  P512  P580  s/Q5  o/Q1055028  o/Q19844914
  P582  P580  s/Q5  o/Q2418495  o/Q269770

Figure 2: Example TSV file construction based on the Douglas Adams item (Q42), for the property educated at (P69).
The item has two statements for this property, hence two transactions are generated. Each transaction includes the qualifiers on the statement (P[0-9]+), the types of the item, here Q5 for Human (s/Q[0-9]+), and the types of the value, e.g., the types of St John's College (o/Q[0-9]+).

To adapt the SchemaTree to this data, we treat all parts of a transaction uniformly, in the same way the SchemaTree dealt with properties. To do this, we rewrite the types by prepending the role in which they occurred (as item/subject or as value/object). An example can be found in the lower part of fig. 2: the item types are prefixed with "s/", while the value types are prefixed with "o/". The qualifiers are saved simply by their property identifier.

3.4. Data Preparation

Once all available data was extracted from the dump, it required further processing to allow for a valid evaluation. Therefore, we separate the extracted data into a train and a test set. A random (80% - 20%) split was applied to the transactions of each TSV file. For a real deployment, one would of course use all available data to create the SchemaTree model.

3.5. Model Generation

The next step is generating SchemaTrees that serve as input models for the final recommender system. We create one SchemaTree for each TSV file constructed before, i.e., one per property. These models are just SchemaTree structures.

Another setup was explored, in which a single large model was created based on a concatenation of all TSV files into one. This had, however, two negative effects. First, the recommender became slower, because a larger tree needs to be considered. Second, the quality of the recommendations went down; we suspect this is caused by information about other properties causing mistakes, especially in combination with the back-off strategies.
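The per-property preparation of sections 3.3 and 3.4, i.e. the role-prefix encoding of a transaction and the random 80/20 split, can be sketched as follows. This is an illustrative Python rendering, not the actual Go code; the function names and the fixed seed are our own.

```python
import random

def encode_transaction(qualifiers, subject_types, object_types):
    """Encode one statement occurrence: qualifiers stay bare, item
    (subject) types get an 's/' prefix, value (object) types an 'o/' prefix."""
    return (list(qualifiers)
            + ["s/" + t for t in subject_types]
            + ["o/" + t for t in object_types])

def split_transactions(transactions, test_fraction=0.2, seed=42):
    """Random (80% - 20%) train/test split, applied per property TSV file."""
    rng = random.Random(seed)
    shuffled = list(transactions)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * (1 - test_fraction))
    return shuffled[:cut], shuffled[cut:]

# The second row of P69.tsv from fig. 2 (Brentwood School statement):
row = encode_transaction(["P582", "P580"], ["Q5"], ["Q2418495", "Q269770"])
print("\t".join(row))
```

Each encoded list corresponds to one TSV row; the split is applied file by file so that every property model has both training and test transactions.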
What we notice is that having property-specific models gives the recommender the opportunity to learn the context of qualifier occurrences better and more efficiently. Thus, the choice for many models was made.

3.6. Recommender Server

Finally, a new method to serve the results was created, with small adaptations to the data structures of the request and response. Examples can be found in fig. 3 below. When the recommender server is started, one model per property type is pre-loaded into memory.

Despite the back-off strategies, the recommendations will sometimes still be empty. To solve this, and to make the evaluation more sound, we make sure that the recommender always ranks all possible qualifiers. That is, first come the results of the SchemaTree recommender, then the recommendations where all qualifier information is stripped from the request, and finally the order as provided by a purpose-built SchemaTree that does not even use the property type.

Request:
{
  "property": "P69",
  "Qualifiers": ["P582", "P580"],
  "subjTypes": ["Q5"],
  "objTypes": ["Q2418495", "Q269770"]
}

Response (truncated):
{"recommendations": [
  {"qualifier": "P512", "probability": 0.0252},
  {"qualifier": "P812", "probability": 0.0101},
  {"qualifier": "P1326", "probability": 0.0050},
  {"qualifier": "P1534", "probability": 0.0050},
  ...
]}

Figure 3: Example of a request (top) and a (truncated) response (bottom). The request asks for more qualifiers for the educated at (P69) property with value Brentwood School from the example in fig. 2. Besides the qualifiers, the types of the item and the value are also provided. The response lists qualifiers with their corresponding probabilities; here the answers are academic degree (P512), academic major (P812), latest date (P1326), and end cause (P1534).

4. Evaluation

Two experimental methods were employed to evaluate the recommender system. Each of them provides insight into how informative specific types of data are to the recommender when making suggestions.
An additional method was added to act as the baseline when evaluating the system. Key metrics about the individual models were computed, and additional, aggregated ones are also presented. This section describes the evaluation protocol.

4.1. Evaluation Methods

To generate recommendation tasks to solve, we first use the leave-one-out evaluation method, as was used in [2]. This means that one qualifier of the transaction is left out, to be recommended back by the system. The system thus receives the property type, the co-occurring qualifiers, and, depending on the configuration, type information.

A second way to generate recommendation tasks for evaluation is what we call leave-all-out. Here, all qualifiers are stripped from the test transactions and the recommender is expected to recommend these back, relying solely on the property type and potentially type information.

4.2. Obtaining Results

For each of the configurations, the two evaluation methods (leave-one-out and leave-all-out) were performed to generate evaluation results. Additionally, results were generated for the baseline recommender, which does not use any contextual information to make predictions and is hence independent of the method. For this process, only qualifier models with more than 100 transactions in the test set were used. We noticed that those with fewer transactions led to very spurious results, most likely because not enough information was available for the model. This left approximately 1060 models for the evaluation.

The recommendation results are evaluated using ranking metrics, such as rank, hits@1, hits@5, and hits@10. We also record the left-out qualifier and the number of co-occurring qualifiers and types for further analysis. This further analysis is done by either grouping the results by a specific set size, or by computing more general statistics for entire experiments.
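The two ways of generating evaluation tasks can be sketched as follows. This is an illustrative Python rendering under the assumption that a test transaction is represented as a dict of qualifiers and (prefixed) types; the actual evaluation code is part of the adapted Go code base.

```python
def leave_one_out(transaction):
    """One evaluation task per qualifier: hide it and keep the
    remaining qualifiers (plus type information) as context."""
    quals = transaction["qualifiers"]
    for i, hidden in enumerate(quals):
        yield {"hidden": [hidden],
               "context": quals[:i] + quals[i + 1:],
               "types": transaction["types"]}

def leave_all_out(transaction):
    """A single task: hide every qualifier; the recommender must rely
    on the property type and the type information alone."""
    yield {"hidden": list(transaction["qualifiers"]),
           "context": [],
           "types": transaction["types"]}

t = {"qualifiers": ["P582", "P580"],
     "types": ["s/Q5", "o/Q2418495", "o/Q269770"]}
for task in leave_one_out(t):
    print(task["hidden"], "given", task["context"])
```

A transaction with n qualifiers thus yields n leave-one-out tasks but only one leave-all-out task, which is why the averaging scheme described next treats the transaction, not the task, as the unit.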
An important aspect that was considered is the way averages are computed. With the leave-one-out method, each transaction generally gives rise to more than one evaluation. Therefore, the metrics were first micro-averaged by transaction and then further processed. For instance, when computing the average rank of a model, the first step was micro-averaging all evaluation ranks by transaction, and then macro-averaging those values to obtain the final average.

Also, when exploring the evaluation results, some transactions appeared to have no qualifier prediction rank, denoted by the value 5843 (preset for this purpose). The percentage of missing recommendations was recorded, but the affected transactions were eliminated from the final evaluation set: they make up a very small fraction and would otherwise severely skew the results. The reason for encountering such results stems from the train-test split, as some of the qualifiers in the test set never appear in the train set used for modeling. This would never occur in a real production setting, where the models would be trained on the full Wikidata dump.

After obtaining the results and inspecting them more closely, two outlier models were identified, namely the ones for P1855 and P5192. The results for these two models were unexpectedly poor. We decided not to include them in the final evaluation because they are very generic properties, namely "Wikidata property example" and "Wikidata property example for lexemes".

4.3. Results

For each of the evaluation methods, a table and two plots were generated. The first plot shows the general metrics per configuration, whereas the second groups those metrics by SetSize and aggregates the results. The table contains the evaluation results per configuration.

Leave-one-out: As can be seen in table 1, the percentage of missing recommendations is very low.
The best performing configuration is TT, which includes all type information (both item and value types). The second-best performer is the FT configuration.

Leave-all-out: As can be seen in table 2, we obtain the same ranking, with TT as the best performing configuration and FT as second best. More visualizations of the results can be found in the appendix in fig. 4 and fig. 5.

Table 1: leave-one-out aggregated results.

Recommender            Mean    Median  StdDev  Top1     Top5     Top10    Missing
Baseline               1.8320  1.4962  1.1371  70.6825  96.2079  98.6246  0.0031
No Type Info (FF)      5.1816  3.1143  7.0066  52.6749  77.4949  87.8853  0.0030
Value Type Info (TF)   1.4301  1.0972  0.7393  87.6284  97.8587  99.1271  0.0032
Item Type Info (FT)    1.3535  1.0812  0.6670  89.7055  98.2256  99.3074  0.0031
Full Type Info (TT)    1.3132  1.0666  0.5918  91.1623  98.4774  99.3390  0.0031

Table 2: leave-all-out aggregated results.

Recommender            Mean    Median  StdDev  Top1     Top5     Top10    Missing
Baseline               1.8320  1.4962  1.1371  70.6825  96.2079  98.6246  0.0031
No Type Info (FF)      1.8298  1.4887  1.1380  70.7192  96.2196  98.6489  0.0030
Value Type Info (TF)   1.6807  1.4261  0.8752  73.6173  97.1556  98.9881  0.0032
Item Type Info (FT)    1.5738  1.3534  0.7794  75.8690  97.8243  99.2349  0.0031
Full Type Info (TT)    1.5293  1.3369  0.6823  77.0860  98.1101  99.2896  0.0031

5. Conclusions and Future Work

In this paper, a qualifier recommender system was built around the previously created trie-based SchemaTree property recommender. Adaptations to the data extraction technique were made to allow it to extract qualifier information from the Wikidata dump. Further modifications revolved around the request-response structure of the recommendation server.

The results of the four configurations evaluated in the paper were close to expectations. We found that both the item and value type information, as well as the co-occurring qualifiers, are important information when making recommendations.
We further found that models based on item (subject) types outperformed those based on value (object) types on average.

The current implementation of the qualifier recommender is limited by several factors. A first improvement would be to restrict the type of qualifiers suggested using Wikidata constraints. Moreover, when extracting the data for building the SchemaTree, more type information could be obtained by traversing the subclass of (P279) type hierarchy and collecting additional types, rather than only the leaf type as is currently done. Another aspect is that we currently use the back-off strategy that was shown to be best for properties; a large-scale evaluation could find that a different back-off works better for qualifiers.

Currently, the recommender only gets information about the type of the item, the property, the type of the value of the claim, and the other qualifiers on the claim. However, other claims on the same item might also have useful predictive power. For example, if a Human (Q5) has an employer (P108), then the educated at (P69) property very likely has the qualifier end time (P582). Incorporating this information is left as future work.

One further idea is that we could make a single model for all properties by including the property as part of the input set of the recommender. This might give some benefits when two similar properties have similar qualifier information, especially when one of the properties is rarely used. This might also further reduce the overall memory usage at the cost of a small performance hit.

In terms of evaluation, one more method that would more accurately determine whether the recommender performs well in a real setting could be integrated. Such an evaluation technique would consist of generating a whole list of qualifiers from scratch, making use only of the type information in the statements.
Leave-all-out evaluation comes closest to this method. Finally, the best way to evaluate this is an actual A/B test of the recommender, in which the system currently used for Wikidata is compared with the proposed system in a practical evaluation.

Acknowledgments

The SchemaTree is rather efficient, but requires a large in-memory index to achieve this speed. Besides, we experiment with many different configurations and perform a lot of queries. Therefore, Snellius (the Dutch national supercomputer) was used to run these experiments. Michael Cochez was partially funded by the Graph-Massivizer project, funded by the Horizon Europe programme of the European Union (grant 101093202).

References

[1] D. Vrandečić, Wikidata: A new platform for collaborative data collection, in: Proceedings of the 21st International Conference on World Wide Web, 2012, pp. 1063–1064.
[2] L. C. Gleim, R. Schimassek, D. Hüser, M. Peters, C. Krämer, M. Cochez, S. Decker, SchemaTree: Maximum-likelihood property recommendation for Wikidata, in: European Semantic Web Conference, Springer, 2020, pp. 179–195.
[3] K. AlGhamdi, M. Shi, E. Simperl, Learning to recommend items to Wikidata editors, in: The Semantic Web – ISWC 2021: 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24–28, 2021, Proceedings, Springer, 2021, pp. 163–181.
[4] S. Aljalbout, G. Falquet, D. Buchs, Handling Wikidata qualifiers in reasoning, arXiv preprint arXiv:2304.03375 (2023).

A. Result visualizations

Figure 4: Visualization of leave-one-out results.

Figure 5: Visualization of leave-all-out results.