<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Qualifier Recommendation for Wikidata</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrei Mihai Ducu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Michael Cochez</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Vrije Universiteit Amsterdam</institution>
          ,
          <country country="NL">The Netherlands</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Wikidata, a collaborative knowledge base for structured data, empowers both human and machine users to contribute and access information. Its main role is in supporting Wikimedia projects by acting as the central storage database for the Wikimedia movement. To optimize the manual process of adding new facts, Wikidata utilizes the association rule-based PropertySuggester tool. However, a recent paper introduced the SchemaTree, a novel approach that surpasses the state-of-the-art PropertySuggester in all performance metrics. The new recommender employs a trie-based method and frequentist inference to efficiently learn and represent property set probabilities within RDF graphs. In this paper, we adapt that recommendation approach to recommend qualifiers. Specifically, we want to find out whether the recommendation can be done using co-occurrence information of the qualifiers alone, or whether type information of the item and the value of statements improves performance. We found that the qualifier recommender that uses both co-occurring qualifiers and type information leads to the best performance.</p>
      </abstract>
      <kwd-group>
        <kwd>Wikidata</kwd>
        <kwd>Qualifiers</kwd>
        <kwd>Recommender</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Contributors with diverse backgrounds and varying levels of expertise may encounter
difficulties when editing records in a complex knowledge base such as Wikidata. Erroneous updates
could potentially result in data inconsistencies and incompleteness. Therefore, assisting the
users in the editing process is of utmost importance to preserve data quality and accuracy,
while greatly reducing the workload. On the user interface side, there are two systems active on
Wikidata to improve the quality. First, there are constraints on properties, which check whether
they are applied on items of the right type, and whether the values are within the expected
range. Second, there are recommender systems, which can act as a guide for the end users when
adding properties to items and qualifiers to properties.</p>
      <p>
        This paper focuses on the latter aspect, improving the recommendations for adding qualifiers
on properties, such that manual editors can update qualifier information for statements about
items on Wikidata. The recommender system used for this purpose was adapted from the
previous work on the SchemaTree property recommender [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. SchemaTree introduces a novel
approach, rooted in trie structures, to compute probability distributions of property sets within
RDF graphs. For this project, the recommender suggests qualifiers instead of properties, based
on a) co-occurring qualifiers and b) type information of the item (subject), as well as the
corresponding property value (object).
      </p>
      <p>In order to determine the best configuration of the new qualifier recommender, four
configurations were tested, each corresponding to a different level of information provided to
the system. The first configuration recommends qualifiers using only co-occurring qualifier
information. The second and third, use either item (subject) or value (object) type information.
Lastly, full contextual information was used for the recommendation, namely co-occurring
qualifiers, and both item and value type information.</p>
      <p>Two research questions were formulated around the four configurations investigated: a main
question, and a secondary one that stems from it:
• Does including type information improve the performance of the qualifier recommender
system?
• What kind of type information is more informative, the type of the item or the type of
the value?
These questions are investigated by evaluating each configuration, with two different evaluation
methods, against a held-out test set consisting of 20% of all data extracted from Wikidata. The
performance of the configurations is compared to a baseline model that makes recommendations
solely on absolute qualifier occurrence frequency, without using any other contextual
information. What we find is that adding type information is nearly always beneficial. The code is
available on GitHub (https://github.com/Duculet/QualifierRecommender/tree/eval_handlers).</p>
      <p>
        The SchemaTree Recommender (https://github.com/lgleim/SchemaTreeRecommender) proposes a novel approach to new property recommendations
within the Wikidata project. This system comes as an alternative to the currently used
PropertySuggester (http://gerrit.wikimedia.org/r/admin/projects/mediawiki/extensions/PropertySuggester). The newly introduced recommender makes use of the maximum likelihood
of properties to suggest additional ones. The recommender leverages a compact trie-based
data structure called the SchemaTree, which integrates the representation of property and type
co-occurrences. It specializes in the efficient lookup of such patterns, being constructed as an
adaptation of a frequent pattern tree. This data structure enables efficient
probability calculations and efficient property pair retrieval. Next, the SchemaTree structure is
introduced and described. This information was adapted from [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>
        The SchemaTree is created as a data structure to facilitate property recommendations based
on maximum-likelihood estimation. These recommendations are generated for a given item,
denoted as $s$, and its set of properties, denoted as $P_s = \{p_1, \ldots, p_n\}$, where $P_s \subseteq P$ (a subset of the
available properties $P$ in Wikidata). The goal of recommending maximum-likelihood properties
is to identify the most likely property $\hat{p} \in P \setminus P_s$, meaning a property the item does not have
already. The property $\hat{p}$ has to be found such that the following holds:
$$\hat{p} = \operatorname*{argmax}_{p \in (P \setminus P_s)} P(p \mid \{p_1, \ldots, p_n\}) = \operatorname*{argmax}_{p \in (P \setminus P_s)} \frac{P(\{p, p_1, \ldots, p_n\})}{P(\{p_1, \ldots, p_n\})} \quad (1)$$
where $P(\{p_1, \ldots, p_n\})$ denotes the probability that a selected entity has at least the properties
$p_1, \ldots, p_n$. In line with this, the recommended properties are the ones that exhibit the highest
frequency of co-occurrence with the properties already possessed by the given entity [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>By adopting a frequentist probability interpretation, the joint probabilities are estimated
based on the relative frequency of occurrence. The absolute frequency of a set of properties, i.e.,
the number of items that have (at least) this set of properties, is represented as $\operatorname{supp}(P_s)$. By
reformulating Equation 1, the estimation of the most probable property recommendation can
be expressed as follows:
$$\hat{p} \simeq \operatorname*{argmax}_{p \in (P \setminus P_s)} \frac{\operatorname{supp}(\{p, p_1, \ldots, p_n\})}{\operatorname{supp}(\{p_1, \ldots, p_n\})} \quad (2)$$</p>
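      <p>As a small worked example with assumed counts: if $\operatorname{supp}(\{p_1\}) = 200$, i.e. 200 transactions contain the property $p_1$, and 80 of those also contain a candidate $p$, so that $\operatorname{supp}(\{p, p_1\}) = 80$, then the estimated probability for $p$ is $80/200 = 0.4$, and $p$ would be ranked above any candidate with a lower co-occurrence ratio.</p>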
    </sec>
    <sec id="sec-2">
      <title>2. Background and Related Work</title>
      <sec id="sec-2-1">
        <title>2.1. SchemaTree Recommender</title>
        <p>The SchemaTree structure aims to optimize the computation time for estimating this
probability in the context of all data already contained within Wikidata.</p>
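        <p>To make this concrete, the following is a minimal sketch in Go (the language of the SchemaTree code base) of such a frequency-ordered trie: transactions are inserted with their elements sorted by global frequency, so that frequent patterns share prefixes and their supports can be read off the nodes. This is a simplification for illustration only; the actual SchemaTree adds backlinks and further optimizations.</p>
        <preformat>package main

import (
	"fmt"
	"sort"
)

// node is one trie node: children maps property IDs to child nodes, and
// support counts how many transactions pass through this node.
type node struct {
	children map[string]*node
	support  int
}

func newNode() *node {
	n := new(node)
	n.children = make(map[string]*node)
	return n
}

// schemaTree is a toy stand-in for the real data structure.
type schemaTree struct {
	root *node
	freq map[string]int // global frequencies, fixing the insertion order
}

// insert adds one transaction, sorted by descending global frequency so that
// frequently co-occurring elements collapse into shared trie paths.
func (t schemaTree) insert(parts []string) {
	sorted := append([]string(nil), parts...)
	sort.Slice(sorted, func(i, j int) bool {
		if t.freq[sorted[i]] != t.freq[sorted[j]] {
			return t.freq[sorted[i]] > t.freq[sorted[j]]
		}
		return sorted[j] > sorted[i] // deterministic tie-break
	})
	cur := t.root
	for _, p := range sorted {
		next, ok := cur.children[p]
		if !ok {
			next = newNode()
			cur.children[p] = next
		}
		next.support++
		cur = next
	}
}

func main() {
	t := schemaTree{root: newNode(), freq: map[string]int{"P580": 3, "P582": 2, "P512": 1}}
	t.insert([]string{"P582", "P580"})
	t.insert([]string{"P580", "P512"})
	t.insert([]string{"P580"})
	// All three transactions share the P580 prefix, so its support is 3.
	fmt.Println(t.root.children["P580"].support)
}</preformat>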
        <p>Besides finding the properties with the highest probability, the SchemaTree also uses backoff
strategies in case the recommendations are not good enough. The authors found that the best backoff
strategy was to rerun the system with the least popular property removed from the property set,
in case there are no recommended properties, which happens when all have a zero probability.
This is repeated up to four times, until a recommendation is found. In this work, we use the
same setup. In future work, it should be investigated whether there is a better backoff strategy
specifically for qualifiers.</p>
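        <p>A minimal sketch of this backoff loop; recommend and popularity are assumed stand-in callbacks rather than the actual SchemaTree API:</p>
        <preformat>package main

import "fmt"

// recommendWithBackoff sketches the backoff described above: when the
// recommender returns nothing, the least popular qualifier is dropped from
// the input set and the system is rerun, up to four times.
func recommendWithBackoff(quals []string, recommend func([]string) []string, popularity func(string) int) []string {
	for attempts := 5; attempts > 0; attempts-- {
		recs := recommend(quals)
		if len(recs) > 0 {
			return recs
		}
		if len(quals) == 0 {
			break
		}
		// Find and drop the least popular qualifier, then retry.
		least := 0
		for i, q := range quals {
			if popularity(quals[least]) > popularity(q) {
				least = i
			}
		}
		quals = append(append([]string(nil), quals[:least]...), quals[least+1:]...)
	}
	return nil
}

func main() {
	// Toy setup: recommendations appear only once the rare P9999 is dropped.
	recommend := func(qs []string) []string {
		for _, q := range qs {
			if q == "P9999" {
				return nil
			}
		}
		return []string{"P582", "P580"}
	}
	popularity := func(q string) int { return map[string]int{"P580": 100, "P9999": 1}[q] }
	fmt.Println(recommendWithBackoff([]string{"P580", "P9999"}, recommend, popularity))
	// Output: [P582 P580]
}</preformat>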
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Other Works</title>
        <p>
          Other recommender systems have been proposed throughout the years. One such system
that was recently put forth is WikidataRec [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. The system employs a hybrid approach
that combines content-based and collaborative filtering techniques to rank items for editors.
This hybrid approach considers both the features of the items themselves and the previous
interactions between items and editors. To achieve this, a neural network called "neural mixture
of representations" is developed. This neural network is specifically designed to learn optimal
weights for combining item-based representations and editor-based representations, taking
into account the interactions between items and editors. By leveraging these interactions,
the system aims to optimize the ranking of items and improve the overall recommendation
quality for editors. Based on their experimental data, the system proved to perform well in
situations where the data fed into the model was dense. However, collaborative filtering was
found to be less useful in the case of sparse editing data, which makes up most of the available
data [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ]. Another approach taken to handle Wikidata qualifiers was by using reasoning. This
entails defining inference rules, specifically on ontological properties. The paper proposes
handling of qualifiers using inference rules, although the system presented does not implement
a recommender system. However, it is interesting to see how they overcame the massive
number of qualifiers and practically implemented a prototype that can express all of Wikidata’s
ontological properties [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Qualifier Recommender</title>
      <p>This section provides a comprehensive description and analysis of the work conducted, from
data extraction to data structure, including adaptations made to the SchemaTree code to work
with qualifiers instead of properties.</p>
      <sec id="sec-3-1">
        <title>3.1. SchemaTree Adaptation</title>
        <p>Several parts of the original SchemaTree recommender code were adapted such that it could be
used to enable the recommendation of qualifiers instead of properties. Also, some new additions
were made in order to extract qualifier information from the Wikidata dump file and to evaluate
the new recommender system. Changes and additions were made to the following components:
• Extractor: added the functionality to extract qualifier information from the Wikidata JSON
dump and save it in one TSV file for each property type.
• Splitter: used for the evaluation to split the TSV files into a train and a test set.
• Datatypes: updated so it can detect and count the types that occur alongside the qualifiers
(item / value types).
• RecommenderServer: added a handler for qualifier recommendation requests. This handler
receives the property of the statement for which recommendations are sought, as well as
potentially the types of the item and value of the statement.</p>
        <p>These updates and additions will be detailed in the following sub-sections.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Configurations</title>
        <p>Four main configurations were explored, as previously mentioned. They are the following:
• FF - No type information included
• TF - Value (object) type information included
• FT - Item (subject) type information included
• TT - Both value (object) and item (subject) type information included
Each of the experiments that follow was conducted four times, once for each configuration.
The next subsections will only detail the TT configuration. The same pipeline was applied to
the other configurations, leaving out the respective types of information.</p>
        <p>Besides these configurations, we also have a baseline configuration, which makes no use of
property, nor item and value type information.</p>
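        <p>For illustration, the four configurations can be encoded as two boolean flags; judging from the list above, the first letter of a configuration name refers to value (object) types and the second to item (subject) types. The Go encoding below, including the field names, is ours:</p>
        <preformat>package main

import "fmt"

// config encodes one of the four configurations as two flags.
type config struct {
	useValueTypes bool // first letter:  T if value (object) types are used
	useItemTypes  bool // second letter: T if item (subject) types are used
}

var configurations = map[string]config{
	"FF": {false, false}, // co-occurring qualifiers only
	"TF": {true, false},  // plus value (object) types
	"FT": {false, true},  // plus item (subject) types
	"TT": {true, true},   // plus both kinds of type information
}

func main() {
	fmt.Printf("%+v\n", configurations["TT"]) // {useValueTypes:true useItemTypes:true}
}</preformat>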
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Data Extraction</title>
        <p>Data used to generate the models for the recommender system was obtained in the form of
a BZIP2-compressed Wikidata JSON dump (for our experiments we used the dump of 27 March 2023:
https://dumps.wikimedia.org/wikidatawiki/entities/) consisting of all items and their representative
features in the knowledge base. For each unique property in the dataset, a TSV file was generated
incorporating information about the occurrences of that specific property throughout Wikidata.
For example, in the file for P50, each entry (row) contains information about one occurrence of
the property being used on Wikidata. We store the used qualifiers, as well as the item (subject)
and value (object) types for this occurrence. We call the collected information for one occurrence
a transaction, in accordance with the frequent item set literature. The extraction workflow is
further detailed in fig. 1.</p>
        <p>[Figure 1: Flowchart of the extraction workflow. In a first pass over the Wikidata JSON dump, type information is extracted from each item and stored in a dictionary. In a second pass, for every statement that contains qualifiers, the qualifiers are saved to a list, the corresponding type information is looked up in the dictionary and appended to the list, and the list is stored as a transaction in the TSV file.]</p>
        <p>Referring to the example structure of the item "Douglas Adams" (Q42), an example of a TSV
file construction and its format for the property educated at (P69) can be seen in fig. 2.</p>
        <p>To adapt the SchemaTree to this data, we treat all parts of the transaction uniformly, and
in the same way the SchemaTree dealt with properties. To do this, we rewrite the types by
prepending information about the role (as item/subject or as value/object) in which they occurred. An
example can be found in the lower part of fig. 2: the item types are prefixed with “s/”, while the
value types are prefixed by “o/”. The qualifiers are saved simply by their property identifier.</p>
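        <p>A sketch of this serialization, reusing the qualifiers and types from the request example in section 3.6; the exact column layout of the real extractor is an assumption here:</p>
        <preformat>package main

import (
	"fmt"
	"strings"
)

// transactionLine sketches how one statement occurrence could be serialized
// as a TSV transaction: item (subject) types are prefixed with "s/", value
// (object) types with "o/", and qualifiers stay bare property IDs.
func transactionLine(qualifiers, itemTypes, valueTypes []string) string {
	parts := append([]string(nil), qualifiers...)
	for _, t := range itemTypes {
		parts = append(parts, "s/"+t)
	}
	for _, t := range valueTypes {
		parts = append(parts, "o/"+t)
	}
	return strings.Join(parts, "\t")
}

func main() {
	// One occurrence of educated at (P69): qualifiers end time (P582) and
	// start time (P580), subject type human (Q5).
	fmt.Println(transactionLine(
		[]string{"P582", "P580"},
		[]string{"Q5"},
		[]string{"Q2418495"},
	))
	// Output (tab-separated): P582  P580  s/Q5  o/Q2418495
}</preformat>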
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Data Preparation</title>
        <p>Once all available data was extracted from the dump, it required further processing to allow for
a valid evaluation. Therefore, we separate the extracted data into a train and test set. A random
(80% - 20%) split was applied to the transactions of each TSV file. For a real deployment, one
would of course use all available data to create the SchemaTree model.</p>
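        <p>A sketch of the split, assuming the transactions of one TSV file are handled as a slice of lines; the real splitter in the adapted code base may shuffle and write files differently:</p>
        <preformat>package main

import (
	"fmt"
	"math/rand"
)

// split80_20 randomly splits the transactions of one TSV file into an 80%
// train set and a 20% test set.
func split80_20(transactions []string, rng *rand.Rand) (train, test []string) {
	shuffled := append([]string(nil), transactions...)
	rng.Shuffle(len(shuffled), func(i, j int) { shuffled[i], shuffled[j] = shuffled[j], shuffled[i] })
	cut := len(shuffled) * 80 / 100
	return shuffled[:cut], shuffled[cut:]
}

func main() {
	tx := []string{"t1", "t2", "t3", "t4", "t5", "t6", "t7", "t8", "t9", "t10"}
	train, test := split80_20(tx, rand.New(rand.NewSource(42)))
	fmt.Println(len(train), len(test)) // 8 2
}</preformat>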
      </sec>
      <sec id="sec-3-5">
        <title>3.5. Model Generation</title>
        <p>The next step in the context of the recommender is generating SchemaTrees that work as
input models for the final recommender system. We create one SchemaTree for each TSV file
constructed before, i.e., one for each property. These models are just SchemaTree structures.</p>
        <p>Another setup was explored, where a single large model was created, based on a
concatenation of all TSV files into one. This had, however, two negative effects. First, the recommender
became slower, because a larger tree needs to be considered. Second, the quality of the
recommendations went down; we suspect this is caused by information about other properties
causing mistakes, especially in combination with the backoff strategies. What we notice is that
having property-specific models gives the recommender the opportunity to learn the context of
qualifier occurrences better and more efficiently. Thus, the choice for many models was made.</p>
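        <p>The many-models design can be sketched as a map from property IDs to their trees, one per TSV file; buildTree stands in for the actual SchemaTree construction, and the per-property file naming is assumed for illustration:</p>
        <preformat>package main

import (
	"fmt"
	"path/filepath"
	"strings"
)

// loadModels builds one model per property, keyed by the property ID derived
// from the TSV file name, e.g. "P69.tsv" becomes key "P69".
func loadModels(tsvFiles []string, buildTree func(path string) any) map[string]any {
	models := make(map[string]any)
	for _, path := range tsvFiles {
		prop := strings.TrimSuffix(filepath.Base(path), ".tsv")
		models[prop] = buildTree(path)
	}
	return models
}

func main() {
	models := loadModels([]string{"data/P69.tsv", "data/P50.tsv"},
		func(path string) any { return "tree(" + path + ")" })
	fmt.Println(models["P69"]) // tree(data/P69.tsv)
}</preformat>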
      </sec>
      <sec id="sec-3-6">
        <title>3.6. Recommender Server</title>
        <p>Finally, a new method to serve the results was created, with small adaptations to the data
structures of the request and response. Examples can be found in fig. 3 below. When the
recommender server is started, one model per property type is pre-loaded into memory.</p>
        <p>Despite the backoff strategies, the recommendations will sometimes still be empty. To solve
this, and to make the evaluation more sound, we make sure that the recommender always
ranks all possible qualifiers. That is, first come the results of the SchemaTree recommender, then the
recommendation where all qualifier information is stripped from the request, and finally the
order as provided by a purpose-built SchemaTree that does not even use the property type.</p>
        <preformat>{
  "property": "P69",
  "Qualifiers": ["P582", "P580"],
  "subjTypes": ["Q5"],
  "objTypes": ["Q2418495", "Q269770"]
}</preformat>
        <preformat>{"recommendations": [
  {"qualifier": "P512", "probability": 0.0252},
  {"qualifier": "P812", "probability": 0.0101},
  {"qualifier": "P1326", "probability": 0.0050},
  {"qualifier": "P1534", "probability": 0.0050},
  ...</preformat>
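        <p>The resulting overall ranking can be seen as a deduplicating cascade over the three stages; a minimal sketch, with the three stage rankings passed in as assumed inputs rather than produced by the actual server:</p>
        <preformat>package main

import "fmt"

// rankAll concatenates the three ranking stages described above, keeping the
// first occurrence of each qualifier so every qualifier is ranked exactly once.
func rankAll(full, stripped, global []string) []string {
	seen := make(map[string]bool)
	var out []string
	for _, stage := range [][]string{full, stripped, global} {
		for _, q := range stage {
			if !seen[q] {
				seen[q] = true
				out = append(out, q)
			}
		}
	}
	return out
}

func main() {
	fmt.Println(rankAll(
		[]string{"P512", "P812"},          // SchemaTree ranking for the full request
		[]string{"P512", "P1326"},         // ranking with qualifiers stripped from the request
		[]string{"P582", "P580", "P512"})) // global, property-type-free order
	// Output: [P512 P812 P1326 P582 P580]
}</preformat>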
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Evaluation</title>
      <p>Two experimental methods were employed to evaluate the recommender system. Each of them
provides insight into how informative specific types of data are to the recommender when
making suggestions. An additional method was added to act as the baseline when evaluating the
system. Key metrics about the individual models were computed, and aggregated
ones are also presented. This section describes the evaluation protocol.</p>
      <sec id="sec-4-1">
        <title>4.1. Evaluation Methods</title>
        <p>
          To generate recommendation tasks to solve, we first use the leave-one-out evaluation method,
as was used in [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. This means that one qualifier of the transaction is left out, to be recommended
back by the system. The system thus receives the property type, co-occurring qualifiers, and,
depending on the configuration, type information.
        </p>
        <p>A second way to generate recommendation tasks for evaluation is what we call
leave-all-out. Here, all qualifiers are stripped from the test transactions and the recommender is expected
to recommend these back, relying solely on the property type and potentially type information.</p>
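        <p>Both task generators are simple to state precisely. Below is a sketch of how evaluation tasks could be derived from one test transaction; the task type and function names are ours, for illustration:</p>
        <preformat>package main

import "fmt"

// task pairs the qualifiers handed to the recommender with the qualifiers it
// is expected to recommend back.
type task struct {
	given, expected []string
}

// leaveOneOut generates one task per qualifier, leaving that qualifier out of
// the input set.
func leaveOneOut(qualifiers []string) []task {
	var tasks []task
	for i, q := range qualifiers {
		given := append([]string(nil), qualifiers[:i]...)
		given = append(given, qualifiers[i+1:]...)
		tasks = append(tasks, task{given: given, expected: []string{q}})
	}
	return tasks
}

// leaveAllOut strips all qualifiers and expects all of them back.
func leaveAllOut(qualifiers []string) task {
	return task{given: nil, expected: qualifiers}
}

func main() {
	fmt.Println(leaveOneOut([]string{"P580", "P582"}))
	// [{[P582] [P580]} {[P580] [P582]}]
	fmt.Println(leaveAllOut([]string{"P580", "P582"}))
	// {[] [P580 P582]}
}</preformat>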
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Obtaining Results</title>
        <p>For each of the configurations, the two evaluation methods (leave-one-out and leave-all-out)
were performed to generate evaluation results. Additionally, results were generated for the
baseline recommender, which does not use any contextual information to make predictions,
and is hence independent of the method.</p>
        <p>For this process, only qualifier models with more than 100 transactions in the test set were
used. We noticed that those with fewer transactions led to very spurious results, most likely
since not enough information was available for the model. This left approximately 1060
models to be used for the evaluation. The recommendation results are evaluated using ranking
metrics, such as rank, hits@1, hits@5, and hits@10. We also record the left-out qualifier and
the number of co-occurring qualifiers and types for further analysis.</p>
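        <p>The ranking metrics themselves can be sketched as follows; rankOf and hitsAt are illustrative names, and the sentinel for missing recommendations mirrors the preset value 5843 described below:</p>
        <preformat>package main

import "fmt"

// missing is the sentinel rank used when the left-out qualifier never appears
// in the recommendation list.
const missing = 5843

// rankOf returns the 1-based position of the left-out qualifier in the
// recommendation list, or missing if it does not appear at all.
func rankOf(target string, recommendations []string) int {
	for i, q := range recommendations {
		if q == target {
			return i + 1
		}
	}
	return missing
}

// hitsAt reports whether the rank falls within the top k.
func hitsAt(rank, k int) bool { return k >= rank }

func main() {
	recs := []string{"P512", "P812", "P1326", "P582"}
	r := rankOf("P582", recs)
	fmt.Println(r, hitsAt(r, 1), hitsAt(r, 5), hitsAt(r, 10)) // 4 false true true
}</preformat>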
        <p>This further analysis is done by either grouping the results by a specific set size, or by
computing more general statistics for entire experiments. An important aspect that was considered
was the way averages are computed. In this scenario, using the leave-one-out method, each
transaction generally equates to more than one evaluation. Therefore, the metrics were first
micro-averaged by transaction, then further processed. For instance, when computing the
model average rank, the first step was micro-averaging all evaluation ranks by transaction, then
macro-averaging those values to obtain the final average. Also, when exploring the evaluation
results, some of the transactions appeared to have no qualifier prediction rank, being denoted by
the value 5843 (preset for this purpose). The percentage of missing recommendations was saved,
but the missing transactions were eliminated from the final evaluation set, as they make up a
very small amount and would otherwise severely skew the results. The reason for encountering
such results stems from the train-test split, as some of the qualifiers in the test set never appear in
the train set that was used for modeling. This, however, would never occur in a real production
setting where the models would be trained on the full Wikidata dump.</p>
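        <p>As a worked example of this averaging scheme: if one transaction yields three leave-one-out evaluations with ranks 1, 4, and 10, and a second transaction yields a single evaluation with rank 2, micro-averaging by transaction gives 5 and 2, and macro-averaging those values gives a final average rank of 3.5. Pooling all four ranks directly would instead give 4.25 and let larger transactions dominate.</p>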
        <p>After obtaining the results and inspecting them more closely, two outlier models were identified, namely
the ones for P1855 and P5192. The results for these two models were unexpectedly poor. We
decided not to include them in the final evaluation because they are very generic properties,
namely the properties for a Wikidata property example and a Wikidata property example for
lexemes.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Results</title>
        <p>For each of the evaluation methods, a table and two plots were generated. The first plot regards
the general metrics per configuration, whereas the second groups those metrics by SetSize and
aggregates the results. The table contains the evaluation results, broken down per
configuration.</p>
        <p>leave-one-out As can be seen in table 1, the percentage of missing recommendations is very
low. The best performing configuration is TT, which includes all type information (both item
and value types). The second-best performer is the FT configuration.</p>
        <p>leave-all-out As can be seen in table 2, we obtain the same rankings with the TT as best
performing configuration and FT as second best. More visualizations of the results can be found
in the appendix in fig. 4 and fig. 5.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusions and Future Work</title>
      <p>In this paper, a qualifier recommender system was built around the previously created trie-based
SchemaTree property recommender. Adaptations to the data extraction technique were made
to allow it to extract qualifier information from the Wikidata dump. Further modifications
revolved around the input-output request-response structure of the recommendation server.
The results of the four configurations evaluated in the paper were close to expectations. We
found that both the item and value type information, as well as the co-occurring qualifiers, are
important information when making recommendations. We further found that models based
on item (subject) types outperformed the ones based on value (object) types on average.</p>
      <p>The current implementation of the qualifier recommender is limited by several factors. A first
improvement would be to restrict the type of qualifiers suggested using Wikidata constraints.
Moreover, when extracting the data for building the SchemaTree, more type information could
be obtained by traversing the subclass of (P279) type-hierarchy and collecting additional
types, rather than only the leaf-type as is currently done.</p>
      <p>Another aspect is that we currently use the backoff strategy which was shown to be best for
properties. A large-scale evaluation could find that a different backoff strategy is better for qualifiers.</p>
      <p>Currently, the recommender only gets information about the type of the item, the property,
the type of the value of the claim, and other qualifiers on the claim. However, other claims
on the same item might also have useful predictive power. For example, if a Human (Q5) has
an employer (P108), then the educated at (P69) property very likely has the qualifier end
time (P582). Incorporating this information is left as future work.</p>
      <p>One further idea is that we could make a single model for all properties by including the
property as part of the input set of the recommender. This might give some benefits when
two similar properties would have similar qualifier information, especially when one of the
properties is rarely used. This might also further reduce the overall memory usage at the cost
of a small performance hit.</p>
      <p>In terms of evaluation, one more method that would more accurately determine whether the
recommender performs well in a real setting could be integrated. Such an evaluation technique
would consist of generating a whole list of qualifiers from scratch, only making use of the type
information in the statements. Leave-all-out evaluation comes closest to this method.</p>
      <p>Finally, the best way to evaluate this is to have an actual A/B testing of the recommender,
where the current system used for Wikidata is compared with the proposed system in a practical
evaluation.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>The SchemaTree is rather efficient, but requires a large in-memory index to achieve this speed.
Besides, we experiment with many different configurations and perform a lot of queries.
Therefore, Snellius (the Dutch national supercomputer) was used to run these experiments.</p>
      <p>Michael Cochez was partially funded by the Graph-Massivizer project, funded by the Horizon
Europe programme of the European Union (grant 101093202).</p>
    </sec>
    <sec id="sec-7">
      <title>A. Result visualizations</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D.</given-names>
            <surname>Vrandečić</surname>
          </string-name>
          ,
          <article-title>Wikidata: A new platform for collaborative data collection</article-title>
          ,
          <source>in: Proceedings of the 21st international conference on world wide web</source>
          ,
          <year>2012</year>
          , pp.
          <fpage>1063</fpage>
          -
          <lpage>1064</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Gleim</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Schimassek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Hüser</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Krämer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Cochez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Decker</surname>
          </string-name>
          ,
          <article-title>SchemaTree: Maximum-likelihood property recommendation for Wikidata</article-title>
          ,
          <source>in: European Semantic Web Conference</source>
          , Springer,
          <year>2020</year>
          , pp.
          <fpage>179</fpage>
          -
          <lpage>195</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>K.</given-names>
            <surname>AlGhamdi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Simperl</surname>
          </string-name>
          ,
          <article-title>Learning to recommend items to Wikidata editors</article-title>
          ,
          <source>in: The Semantic Web - ISWC 2021: 20th International Semantic Web Conference, ISWC 2021, Virtual Event, October 24-28, 2021, Proceedings 20</source>
          , Springer,
          <year>2021</year>
          , pp.
          <fpage>163</fpage>
          -
          <lpage>181</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Aljalbout</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Falquet</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Buchs</surname>
          </string-name>
          ,
          <article-title>Handling Wikidata qualifiers in reasoning</article-title>
          ,
          <source>arXiv preprint arXiv:2304.03375</source>
          (
          <year>2023</year>
          ).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>