Do Users Matter? The Contribution of User-Driven Feature Weights to Open Dataset Recommendations

Anusuriya Devaraju, CSIRO Mineral Resources, Kensington, Western Australia 6151, anusuriya.devaraju@csiro.au
Shlomo Berkovsky, CSIRO Data61, Eveleigh, New South Wales 2015, shlomo.berkovsky@csiro.au

RecSys '17 Poster Proceedings, August 27–31, 2017, Como, Italy

ABSTRACT
The vast volumes of open data pose a challenge for users in finding relevant datasets. To address this, we developed a hybrid dataset recommendation model that combines content-based similarity with item-to-item co-occurrence. The features used by the recommender include dataset properties and usage statistics. In this paper, we focus on fine-tuning the weights of these features. We experimentally compare two feature weighting approaches: a uniform one with predefined weights and a user-driven one, where the weights are informed by the opinions of system users. We evaluated the two approaches in a study involving the users of a real-life data portal. The results suggest that user-driven feature weights can improve dataset recommendations, although not at all levels of data relevance, and highlight the importance of incorporating target users in the design of recommender systems.

KEYWORDS
feature weighting, hybrid recommender system, open data

1 INTRODUCTION
The adoption of open data policies by research institutions and government agencies has led to a dramatic increase in the volume of open data. Although open data brings numerous benefits, the proliferation and diversity of data make it difficult for users to find relevant datasets. Current data repositories primarily support keyword and faceted search. These modes benefit users who can precisely express their needs and are familiar with the data repository, but may pose a challenge otherwise. In addition, the search may return a long list of loosely related results, which aggravates the dataset discovery task. All this raises the issue of delivering personalized dataset recommendations to users. Recommender systems have been applied in the past to assist the discovery of scholars, articles, and citations [1]. To the best of our knowledge, recommending open datasets has not been thoroughly investigated yet. Singhal et al. [4] developed a context-based search for research datasets, which deployed similarity-based ranking based on the topic, abstract, and authors of datasets. In our previous work, we developed a hybrid dataset recommendation model that identified relevant datasets by using both content-based and statistical features, including dataset metadata and observable usage patterns [2]. The features were combined in a linear manner into a single dataset-to-dataset similarity score.

In this paper, we focus on the feature weights. We deploy and evaluate two weighting models. The first uses fixed uniform weights, which are defined heuristically by the system designers. The second utilizes weights derived from a survey among the target users of the system. Our evaluation aims to uncover whether user-driven feature weights lead to better recommendations than uniform weights. The results indicate that user-driven weights can improve dataset recommendations, although this observation is mainly valid at certain levels of data relevance. This finding highlights the importance of considering the opinions of target users when designing a dataset recommender system.

2 OPEN DATA RECOMMENDATION MODEL
Given a target dataset d examined by a user, we recommend its n most relevant datasets (d_1, ..., d_n), ranked according to their similarity to d. The similarity of d and d_i is:

\[ \mathrm{overall\_sim}(d, d_i) = \sum_{f=1}^{10} \omega_f \cdot \mathrm{sim}_f(d, d_i), \qquad (1) \]

where ω_f is the weight associated with feature f and sim_f(d, d_i) is the similarity of d and d_i with respect to f. In total, we consider ten features: title, description, keyword, activity, research field, creator, contributor, spatial, search, and download [2]. We deploy content-based similarity and item-to-item co-occurrence [3] to compute the similarity of datasets. For the first eight features, content-based similarity is used to identify similar datasets based on their metadata. For example, we use TF-IDF term weighting with cosine similarity for text-based features like title and description, and Jaccard's coefficient for categorical features like research field and creator. The item-to-item co-occurrence quantifies the similarity of datasets by comparing their statistical co-occurrence, based on their joint appearance in search results and their joint downloads by users. The underlying assumption is that two datasets are related if they are returned in response to similar queries or are downloaded in the same session.
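To make the aggregation in Equation 1 concrete, the Python sketch below combines TF-IDF/cosine similarity for text features with Jaccard's coefficient for a categorical feature. It is a minimal illustration only: the toy records, the three-feature subset, and the weight values (apart from ω_title = 0.123, reported below) are our assumptions, not the deployed ten-feature implementation with its co-occurrence statistics.

```python
# Minimal sketch of the weighted linear combination in Equation 1.
# The feature subset, toy records, and most weight values are illustrative.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.metrics.pairwise import cosine_similarity

datasets = [
    {"title": "Soil moisture survey",
     "description": "Field measurements of soil moisture across sites",
     "research_field": {"soil science"}},
    {"title": "Soil carbon survey",
     "description": "Laboratory analysis of soil carbon content",
     "research_field": {"soil science", "geochemistry"}},
]

def text_similarity(field):
    """TF-IDF term weighting with cosine similarity for a text feature."""
    corpus = [d[field] for d in datasets]
    tfidf = TfidfVectorizer().fit_transform(corpus)
    return cosine_similarity(tfidf)  # pairwise similarity matrix

def jaccard(a, b):
    """Jaccard's coefficient for categorical (set-valued) features."""
    return len(a & b) / len(a | b) if a | b else 0.0

# Illustrative weights; only omega_title matches a value reported in the paper.
weights = {"title": 0.123, "description": 0.110, "research_field": 0.095}

def overall_sim(i, j):
    sims = {
        "title": text_similarity("title")[i, j],
        "description": text_similarity("description")[i, j],
        "research_field": jaccard(datasets[i]["research_field"],
                                  datasets[j]["research_field"]),
    }
    # Equation 1: weighted linear aggregation of per-feature similarities.
    return sum(weights[f] * s for f, s in sims.items())

print(f"overall_sim(d0, d1) = {overall_sim(0, 1):.3f}")
```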
3 EXPERIMENT AND RESULTS
As shown in Equation 1, the feature-based similarity scores sim_f(d, d_i) are aggregated linearly using the feature weights ω_f. However, how should these weights be set? And will different weighting models affect the quality of the recommendations? We consider two weighting models. The first uses a fixed set of weights defined heuristically by the system designers. For the sake of simplicity, no domain knowledge is applied, and the weights of all ten features are set to ω_f = 0.1. We refer to this as the uniform weighting model. The second weighting model is a user-driven one, as it is informed by the feature importance perceptions of the target system users. We conducted a survey involving 151 users of a real data repository. These users were shown the eight features in the above list and asked to rate their importance on a 5-point Likert scale. The survey revealed that title, description, and keywords were the more important features, while creators and contributors were deemed less important. These importance scores were mapped onto the feature weights, e.g., ω_title = 0.123 and ω_creators = 0.086.
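The exact mapping from survey responses to weights is not spelled out here; one plausible scheme, sketched below under our own assumptions, is to normalize the mean importance ratings so that the weights sum to 1. The mean ratings in the sketch are invented for illustration, and the handling of the two usage-based features (search, download), which were not part of the survey, is left open.

```python
# Hedged sketch: normalize mean Likert importance ratings into feature
# weights summing to 1. The ratings below are invented; the paper's exact
# mapping, and the weighting of the search/download features, are unspecified.
mean_ratings = {
    "title": 4.6, "description": 4.4, "keyword": 4.3, "activity": 3.8,
    "research_field": 3.9, "creator": 3.2, "contributor": 3.1, "spatial": 3.7,
}

total = sum(mean_ratings.values())
weights = {feature: rating / total for feature, rating in mean_ratings.items()}

for feature, weight in sorted(weights.items(), key=lambda kv: -kv[1]):
    print(f"omega_{feature} = {weight:.3f}")
```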
Experimental setup: We evaluated the uniform and user-driven weighting models in two intra-group user studies conducted approximately two months apart. In both studies, we showed users a target dataset d they were familiar with, along with a list of 5 recommended datasets drawn from fixed ranks i = 1, 3, 20, 80, 100 in the list of datasets most similar to d. We showed the recommended datasets in a random order and asked the users to rate their relevance to d on a 4-point Likert scale, ranging from 'very similar' to 'dissimilar'. We obtained the judgments of 50 users who participated in both studies and jointly rated 82 target datasets. Thus, our results are based on 410 judgments obtained in each study. Note that in both studies every user judged recommendations by referring to the same target dataset d. That said, the 5 recommended datasets might have changed due to the different feature weighting models.

Results: Figures 1-left and 1-right depict the distribution of the users' relevance judgments assigned to the recommendations produced by the uniform and user-driven weighting models, respectively.

Figure 1: Distribution of relevance judgments: (left) uniform feature weights, (right) user-driven feature weights.

The horizontal axis represents the rank i of the recommended dataset and the vertical axis indicates the distribution of the judgments. Since the results of the two studies are similar, we also include the exact judgment distributions below the plots. It can be observed that the user-driven weighting achieves a slight improvement for datasets at rank 1: 81.7% of the datasets were judged 'highly similar' or 'similar', compared to 79.3% for the uniform weighting. The differences are more pronounced at rank 3, where the user-driven weighting was judged 'highly similar' or 'similar' in 62.2% of cases, while the uniform weighting resulted in 47.5%. The judgments obtained at ranks 20, 80, and 100 are predominantly negative, so these datasets cannot be recommended and are excluded from the analysis. We compared the user judgments obtained across the two studies using a paired t-test for means. We observed statistically significant differences at rank 3 (p < 0.001), while at rank 1 the differences were not significant.
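A paired test of this kind could be run as in the minimal sketch below, assuming the 4-point judgments are coded numerically (1 = 'dissimilar' to 4 = 'very similar'); the judgment arrays are synthetic stand-ins, not the study data.

```python
# Sketch of the paired t-test over relevance judgments at one rank,
# assuming judgments are coded 1 ('dissimilar') to 4 ('very similar').
import numpy as np
from scipy.stats import ttest_rel

rng = np.random.default_rng(42)
n = 82  # one judgment per target dataset at a given rank, as in the study

uniform = rng.integers(1, 5, size=n)                       # study 1, synthetic
user_driven = np.clip(uniform + rng.integers(0, 2, size=n), 1, 4)  # study 2

t_stat, p_value = ttest_rel(user_driven, uniform)
print(f"t = {t_stat:.2f}, p = {p_value:.4g}")
```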
Discussion: Although the results of the two studies were comparable at most ranks, our findings suggest that the user-driven feature weighting improves the quality of the recommendations at ranks 1 and 3. To acquire a better understanding of this, we plot in Figure 2 the average similarity of the recommended datasets at various ranks.

Figure 2: Average similarity of the top-200 of 1,000 datasets.

This similarity exhibits a long-tail distribution. We believe that the recommended datasets at rank 1 were relevant regardless of the fine-tuned weights, as the strong user support of about 80% suggests. Hence, the improvement there was insignificant. However, at rank 3, the average similarity is about 10% lower than at rank 1, as reflected by the user support dropping to the 50–60% mark. Here, the improvement introduced by the user-driven weighting was found to be strongly significant. We conclude, therefore, that user-driven feature weights are particularly critical in the borderline areas where the relevance of the datasets is unclear. We believe that this finding reflects the importance of the target system users' opinions.

4 CONCLUSION
In this paper, we studied the importance of user-driven feature weights in producing open data recommendations. We compared their performance against a baseline of heuristically set uniform weights. The results showed that user-driven feature weights have a positive effect on user judgments, although this finding may not apply at all ranks. We consider this work to provide an important argument in favor of incorporating target users in the early stages of designing a data recommender system.

REFERENCES
[1] J. Beel, B. Gipp, S. Langer, and C. Breitinger. 2016. Research-paper recommender systems: a literature survey. International Journal on Digital Libraries 17, 4 (2016), 305–338.
[2] A. Devaraju and S. Berkovsky. 2017. A Hybrid Recommendation Approach for Open Research Datasets. In Proceedings of the 26th ACM International Conference on Information and Knowledge Management (under review).
[3] L. Leydesdorff and L. Vaughan. 2006. Co-occurrence Matrices and Their Applications in Information Science: Extending ACA to the Web Environment. J. Am. Soc. Inf. Sci. Technol. 57, 12 (Oct. 2006), 1616–1628.
[4] A. Singhal, R. Kasturi, V. Sivakumar, and J. Srivastava. 2013. Leveraging Web Intelligence for Finding Interesting Research Datasets. In International Conferences on Web Intelligence (WI). 321–328.