Do Users Matter? The Contribution of User-Driven Feature Weights to Open Dataset Recommendations

Anusuriya Devaraju, CSIRO Mineral Resources, Kensington, Western Australia 6151, anusuriya.devaraju@csiro.au
Shlomo Berkovsky, CSIRO Data61, Eveleigh, New South Wales 2015, shlomo.berkovsky@csiro.au

RecSys '17 Poster Proceedings, August 27–31, 2017, Como, Italy

ABSTRACT
The vast volumes of open data pose a challenge for users in finding relevant datasets. To address this, we developed a hybrid dataset recommendation model that combines content-based similarity with item-to-item co-occurrence. The features used by the recommender include dataset properties and usage statistics. In this paper, we focus on fine-tuning the weights of these features. We experimentally compare two feature weighting approaches: a uniform one with predefined weights and a user-driven one, where the weights are informed by the opinions of system users. We evaluated the two approaches in a study involving the users of a real-life data portal. The results suggest that user-driven feature weights can improve dataset recommendations, although not at all levels of data relevance, and highlight the importance of incorporating target users in the design of recommender systems.

KEYWORDS
feature weighting, hybrid recommender system, open data

1 INTRODUCTION
The adoption of open data policies by research institutions and government agencies has led to a dramatic increase in the volume of open data. Although open data brings numerous benefits, the proliferation and diversity of data make it difficult for users to find relevant datasets. Current data repositories primarily support keyword and faceted search. These modes benefit users who can precisely express their needs and are familiar with the data repository, but may pose a challenge otherwise. In addition, the search may return a long list of loosely related results, which aggravates the dataset discovery task. All this raises the issue of delivering personalized dataset recommendations to users. Recommender systems have been applied in the past to assist the discovery of scholars, articles, and citations [1]. To the best of our knowledge, recommending open datasets has not been thoroughly investigated yet. Singhal et al. [4] developed a context-based search for research datasets, which deployed similarity-based ranking based on the topic, abstract, and authors of datasets. In our previous work, we developed a hybrid dataset recommendation model that identified relevant datasets by using both content-based and statistical features, including dataset metadata and observable usage patterns [2]. The features were combined in a linear manner into a single dataset-to-dataset similarity score.

In this paper, we focus on the feature weights. We deploy and evaluate two weighting models. The first uses fixed uniform weights, which are defined heuristically by the system designers. The second utilizes weights derived from a survey among the target users of the system. Our evaluation aims to uncover whether user-driven feature weights lead to better recommendations than uniform weights. The results indicate that user-driven weights can improve dataset recommendations, although this observation is mainly valid at certain levels of data relevance. This finding highlights the importance of considering the opinions of target users when designing a dataset recommender system.

2 OPEN DATA RECOMMENDATION MODEL
Given a target dataset d examined by a user, we recommend its n most relevant datasets (d_1, ..., d_n), ranked according to their similarity to d. The similarity of d and d_i is:

\[ \mathrm{overall\_sim}(d, d_i) = \sum_{f=1}^{10} \omega_f \cdot \mathrm{sim}_f(d, d_i), \qquad (1) \]

where ω_f is the weight associated with feature f and sim_f(d, d_i) is the similarity of d and d_i with respect to f. In total, we consider ten features: title, description, keyword, activity, research field, creator, contributor, spatial, search, and download [2]. We deploy content-based similarity and item-to-item co-occurrence [3] to compute the similarity of datasets. For the first eight features, content-based similarity is used to identify similar datasets based on their metadata. For example, we use TF-IDF term weighting with cosine similarity for text-based features like title and description, and Jaccard's coefficient for categorical features like research field and creator. The item-to-item co-occurrence quantifies the similarity of datasets by comparing their statistical co-occurrence, based on their joint appearance in search results and their joint downloads by users. The underlying assumption is that two datasets are related if they are returned in response to similar queries or are downloaded in the same session.
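To make the aggregation in Equation 1 concrete, the Python sketch below combines TF-IDF/cosine similarity for text features with Jaccard's coefficient for a categorical feature. It is a minimal illustration only: the toy records, the three-feature subset, and the weight values (apart from ω_title = 0.123, reported below) are our assumptions, not the deployed ten-feature implementation with its co-occurrence statistics.

```python
# Minimal sketch of the weighted linear combination in Equation 1.
# The feature subset, toy records, and most weight values are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

datasets = [
    {"title": "Soil moisture survey",
     "description": "Field measurements of soil moisture across sites",
     "research_field": {"soil science"}},
    {"title": "Soil carbon survey",
     "description": "Laboratory analysis of soil carbon content",
     "research_field": {"soil science", "geochemistry"}},
]

def text_similarity(field):
    """TF-IDF term weighting with cosine similarity for a text feature."""
    corpus = [d[field] for d in datasets]
    tfidf = TfidfVectorizer().fit_transform(corpus)
    return cosine_similarity(tfidf)  # pairwise similarity matrix

def jaccard(a, b):
    """Jaccard's coefficient for categorical (set-valued) features."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative weights; only omega_title matches a value reported in the paper.
weights = {"title": 0.123, "description": 0.110, "research_field": 0.095}

def overall_sim(i, j):
    sims = {
        "title": text_similarity("title")[i, j],
        "description": text_similarity("description")[i, j],
        "research_field": jaccard(datasets[i]["research_field"],
                                  datasets[j]["research_field"]),
    }
    # Equation 1: weighted linear aggregation of per-feature similarities.
    return sum(weights[f] * s for f, s in sims.items())

print(f"overall_sim(d0, d1) = {overall_sim(0, 1):.3f}")
```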
3 EXPERIMENT AND RESULTS
As shown in Equation 1, the feature-based similarity scores sim_f(d, d_i) are aggregated linearly using the feature weights ω_f. However, how should these weights be set? And will different weighting models affect the quality of the recommendations? We consider two weighting models. The first uses a fixed set of weights defined heuristically by the system designers. For the sake of simplicity, no domain knowledge is applied, and the weights of all ten features are set to ω_f = 0.1. We refer to this as the uniform weighting model. The second weighting model is a user-driven one, as it is informed by the feature importance perceptions of the target system users. We conducted a survey involving 151 users of a real data repository. These users were shown the eight features in the above list and asked to rate their importance on a 5-point Likert scale. The survey revealed that title, description, and keywords were the more important features, while creators and contributors were deemed less important. These importance scores were mapped onto the feature weights, e.g., ω_title = 0.123 and ω_creators = 0.086.
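The exact mapping from survey responses to weights is not spelled out here; one plausible scheme, sketched below under our own assumptions, is to normalize the mean importance ratings so that the weights sum to 1. The mean ratings in the sketch are invented for illustration, and the handling of the two usage-based features (search, download), which were not part of the survey, is left open.

```python
# Hedged sketch: normalize mean Likert importance ratings into feature
# weights summing to 1. The ratings below are invented; the paper's exact
# mapping, and the weighting of the search/download features, are unspecified.
mean_ratings = {
    "title": 4.6, "description": 4.4, "keyword": 4.3, "activity": 3.8,
    "research_field": 3.9, "creator": 3.2, "contributor": 3.1, "spatial": 3.7,
}

total = sum(mean_ratings.values())
weights = {feature: rating / total for feature, rating in mean_ratings.items()}

for feature, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"omega_{feature} = {weight:.3f}")
```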
Experimental setup: We evaluated the uniform and user-driven weighting models in two intra-group user studies conducted approximately two months apart. In both studies, we showed users a target dataset d they were familiar with, along with a list of 5 recommended datasets drawn from fixed ranks i = 1, 3, 20, 80, 100 in the list of datasets most similar to d. We showed the recommended datasets in a random order and asked the users to rate their relevance to d on a 4-point Likert scale, ranging from 'very similar' to 'dissimilar'. We obtained the judgments of 50 users who participated in both studies and jointly rated 82 target datasets. Thus, our results are based on 410 judgments obtained in each study. Note that in both studies every user judged recommendations by referring to the same target dataset d. That said, the 5 recommended datasets might have changed due to the different feature weighting models.

Results: Figures 1-left and 1-right depict the distribution of the users' relevance judgments assigned to the recommendations produced by the uniform and user-driven weighting models, respectively.

Figure 1: Distribution of relevance judgments: (left) uniform feature weights, (right) user-driven feature weights.

The horizontal axis represents the rank i of the recommended dataset and the vertical axis indicates the distribution of the judgments. Since the results of the two studies are similar, we also include the exact judgment distributions below the plots. It can be observed that the user-driven weighting achieves a slight improvement for datasets at rank 1: 81.7% of the datasets were judged 'highly similar' or 'similar', compared to 79.3% for the uniform weighting. The differences are more pronounced at rank 3, where the user-driven weighting was judged 'highly similar' or 'similar' in 62.2% of cases, while the uniform weighting resulted in 47.5%. The judgments obtained at ranks 20, 80, and 100 are predominantly negative, so these datasets cannot be recommended and are excluded from the analysis. We compared the user judgments obtained across the two studies using a paired t-test for means. We observed statistically significant differences at rank 3 (p < 0.001), while at rank 1 the differences were not significant.
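A paired test of this kind could be run as in the minimal sketch below, assuming the 4-point judgments are coded numerically (1 = 'dissimilar' to 4 = 'very similar'); the judgment arrays are synthetic stand-ins, not the study data.

```python
# Sketch of the paired t-test over relevance judgments at one rank,
# assuming judgments are coded 1 ('dissimilar') to 4 ('very similar').
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)
n = 82  # one judgment per target dataset at a given rank, as in the study

uniform = rng.integers(1, 5, size=n)                       # study 1, synthetic
user_driven = np.clip(uniform + rng.integers(0, 2, size=n), 1, 4)  # study 2

t_stat, p_value = ttest_rel(user_driven, uniform)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```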
Discussion: Although the results of the two studies were comparable at most ranks, our findings suggest that the user-driven feature weighting improves the quality of the recommendations at ranks 1 and 3. To acquire a better understanding of this, we plot in Figure 2 the average similarity of the recommended datasets at various ranks.

Figure 2: Average similarity of the top-200 of 1,000 datasets.

This similarity exhibits a long-tail distribution. We believe that the recommended datasets at rank 1 were relevant regardless of the fine-tuned weights, as the strong user support of about 80% suggests. Hence, the improvement there was insignificant. However, at rank 3, the average similarity is about 10% lower than at rank 1, as reflected by the user support dropping to the 50–60% mark. Here, the improvement introduced by the user-driven weighting was found to be strongly significant. We conclude, therefore, that user-driven feature weights are particularly critical in the borderline areas where the relevance of the datasets is unclear. We believe that this finding reflects the importance of the target system users' opinions.

4 CONCLUSION
In this paper, we studied the importance of user-driven feature weights in producing open data recommendations. We compared their performance against a baseline of heuristically set uniform weights. The results showed that user-driven feature weights have a positive effect on user judgments, although this finding may not apply at all ranks. We consider this work to provide an important argument in favor of incorporating target users in the early stages of designing a data recommender system.

REFERENCES
[1] J. Beel, B. Gipp, S. Langer, and C. Breitinger. 2016. Research-paper recommender systems: a literature survey. International Journal on Digital Libraries 17, 4 (2016), 305–338.
[2] A. Devaraju and S. Berkovsky. 2017. A Hybrid Recommendation Approach for Open Research Datasets. In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (under review).
[3] L. Leydesdorff and L. Vaughan. 2006. Co-occurrence Matrices and Their Applications in Information Science: Extending ACA to the Web Environment. J. Am. Soc. Inf. Sci. Technol. 57, 12 (Oct. 2006), 1616–1628.
[4] A. Singhal, R. Kasturi, V. Sivakumar, and J. Srivastava. 2013. Leveraging Web Intelligence for Finding Interesting Research Datasets. In International Conferences on Web Intelligence (WI). 321–328.