=Paper=
{{Paper
|id=Vol-1137/lakdatachallenge2014_submission_2
|storemode=property
|title=RecLAK: Analysis and Recommendation of Interlinking Datasets
|pdfUrl=https://ceur-ws.org/Vol-1137/lakdatachallenge2014_submission_2.pdf
|volume=Vol-1137
|dblpUrl=https://dblp.org/rec/conf/lak/LopesLNC14
}}
==RecLAK: Analysis and Recommendation of Interlinking Datasets==
* Giseli Rabello Lopes, Department of Informatics, PUC-Rio, Rio de Janeiro/RJ, Brazil, grlopes@inf.puc-rio.br
* Luiz André P. Paes Leme, Computer Science Institute, UFF, Niterói/RJ, Brazil, lapaesleme@ic.uff.br
* Bernardo Pereira Nunes, Department of Informatics, PUC-Rio, Rio de Janeiro/RJ, Brazil, bnunes@inf.puc-rio.br
* Marco A. Casanova, Department of Informatics, PUC-Rio, Rio de Janeiro/RJ, Brazil, casanova@inf.puc-rio.br

===ABSTRACT===
This paper presents RecLAK, a Web application developed for the LAK Challenge 2014. RecLAK focuses on the analysis of the LAK dataset metadata and provides recommendations of potential candidate datasets to be interlinked with the LAK dataset. RecLAK follows an approach to generate recommendations based on Bayesian classifiers and on Social Network Analysis measures. Furthermore, RecLAK generates graph visualizations that explore the LAK dataset over other datasets in the Linked Open Data cloud. The results of the experiments contribute to the understanding and improvement of the LAK dataset. They can also help researchers in the fields covered by the LAK dataset, such as learning analytics and educational data mining.

===1. INTRODUCTION===
The effort of publishing Linked Data has been accompanied by the creation of catalogs of Linked Data datasets, such as the DataHub (http://datahub.io/), to make data findable and reusable. However, although these catalogs offer extensive lists of open datasets, most data publishers link their datasets only to popular ones, such as DBpedia (http://dbpedia.org), Freebase (http://www.freebase.com) and Geonames (http://www.geonames.org/). Although the linkage to popular datasets allows the exploration of external resources, it fails to cover more specialized data.

As a practical example of this scenario, we may highlight the LinkedUp project (http://linkedup-project.eu), an initiative that aims at providing educational organizations and institutions with a collection of open data available on the Web. One of the datasets covered by the LinkedUp project is the Learning Analytics and Knowledge (LAK) dataset. The LAK dataset, referred to as lak, provides access to structured fulltext and metadata from key research publications in the field of learning analytics and educational data mining (footnote 6). lak is regularly updated with data, for instance, from the LAK (Learning Analytics and Knowledge) and EDM (Educational Data Mining) conference series. According to the DataHub metadata, lak was not linked to other datasets, except DBpedia. However, an exploratory search in the DataHub revealed related datasets that lak could be linked to, such as other bibliographic datasets.

This scenario is very common. Most of the published datasets are still waiting to be linked and, therefore, they do not fulfill the requirements to be considered 5-star [1] and fail to take advantage of other data. Basically, as argued in [10], the linkage to popular datasets is favoured for two main reasons: the difficulty of finding related open datasets; and the strenuous task of discovering instance mappings between different datasets.

In this sense, lak will be explored as a case study. The recommendation challenge associated with the interlinking of lak in the LOD can be posed by considering two main questions:

* Q1. For a dataset <math>d</math>, published in the LOD, is it interesting for the publisher of <math>d</math> to try to link it to lak?
* Q2. For a dataset <math>d</math>, published in the LOD, is it interesting for the lak administrator to try to link his dataset to <math>d</math>?

In more detail, let <math>t</math> and <math>d_i</math> be two datasets. A link from <math>t</math> to <math>d_i</math> is a triple of the form <math>(s, p, o)</math> such that <math>s</math> is defined in <math>t</math> and <math>o</math> is defined in <math>d_i</math>. We say that <math>t</math> is linked to <math>d_i</math>, or that <math>d_i</math> is linked from <math>t</math>, iff there is at least one link from <math>t</math> to <math>d_i</math>.
We also say that <math>d_i</math> is relevant for <math>t</math> iff there is at least one resource defined in <math>d_i</math> that can be linked from a resource defined in <math>t</math>.

Footnote 6: http://lak.linkededucation.org

Questions Q1 and Q2 are special cases of the dataset interlinking recommendation problem, posed as follows:

: Given a finite set of datasets <math>D</math> and a dataset <math>t</math>, compute a rank score for each dataset <math>d_i \in D</math> such that the rank score of <math>d_i</math> increases with the chances of <math>d_i</math> being relevant for <math>t</math>.

In this paper, we first introduce two rank score functions to address the dataset interlinking recommendation problem. Then, we apply the functions to answer question Q2.

The remainder of this paper is organized as follows. Section 2 presents related work. Section 3 briefly describes the recommendation approaches. Section 4 shows the result analysis of the metadata exploration and the generated recommendations. Finally, Section 5 presents some final remarks.

===2. RELATED WORK===
In this paper, we use an extended version [5] of previous work [3, 4], which introduced the rank score functions based on the Bayesian and the Social Network approaches. The extended version also explores different sets of features related to the metadata of the datasets, such as properties, classes and vocabularies, to compute the rank score functions.

Nikolov et al. [9, 10] propose an approach to identify relevant datasets for interlinking by applying keyword searches and ontology matching techniques. Kuznetsov [2] describes a linking system, which is responsible for discovering relevant datasets for a given dataset and for creating instance-level linkage. When compared with these approaches, the rank score functions applied in this paper use only metadata and are, therefore, much simpler to compute and yet achieve a good performance [5].

Lóscio et al. [6] and Wagner et al. [15] propose techniques to find relevant datasets for user queries. The first approach is based on the information quality criteria of correctness, schema completeness and data completeness, while the second one is based on the overlap of the sets of instances of datasets. Oliveira et al. [13] use application queries and user feedback to discover relevant datasets. These papers aim at recommending datasets with respect to user queries, which is a problem close, but not identical, to the problem discussed in this paper.

Nunes et al. [11, 12] performed several analyses of lak, but their focus was mainly on the dataset content. They also proposed other datasets to be interlinked with lak considering their links with DBpedia. By contrast, this paper focuses on analyzing the metadata to create rankings of candidate datasets to be interlinked with lak using different recommendation techniques.

===3. RECOMMENDATION APPROACHES===
====3.1 Bayesian ranking====
A rank score function, inspired by conditional probabilities, that induces the ranking of the datasets in <math>D</math> (from the largest to the smallest score), can be defined as follows:

<math>score(d_i, t) = \left( \sum_{j=1..n} \log(P(F_j \mid D_i)) \right) + \log(P(D_i)) \qquad (1)</math>

Based on the maximum likelihood estimate of the probabilities [8] in a training set of datasets, the above probabilities can be estimated as follows:

<math>P(F_j \mid D_i) = \frac{count(f_j, d_i)}{\sum_{j=1}^{n} count(f_j, d_i)} \, ; \qquad P(D_i) = \frac{count(d_i)}{\sum_{i=1}^{m} count(d_i)}</math>

where <math>count(f_j, d_i)</math> is the number of datasets in the training set that have feature <math>f_j</math> and that are linked to <math>d_i</math>, and <math>count(d_i)</math> is the number of datasets in the training set that are linked to <math>d_i</math>, disregarding the feature set.

For the score function computation, some auxiliary functions help to avoid computing <math>\log(0)</math> by replacing this value with <math>c</math>, which is a constant small enough to penalize the datasets <math>d_i</math> that do not have datasets with features <math>F_j</math> linked to them or that do not have links from other datasets [5]. Thus, the idea is that, if the set of features of <math>t</math> is very often correlated with datasets that are linked to <math>d_i</math> and <math>t</math> is not already linked to <math>d_i</math>, then it is recommended to try to link <math>t</math> to <math>d_i</math>.

====3.2 Social Network-based ranking====
We propose to analyze the dataset interlinking recommendation problem in much the same way as the link prediction problem in Social Networks [7]. Analogously, the Linked Data network for <math>D</math> is a directed graph such that the nodes are the datasets in <math>D</math> and there is an edge from dataset <math>u</math> to dataset <math>v</math> in <math>D</math> iff there is a link from <math>u</math> to <math>v</math>. To obtain more accurate results, we combine two measures, Preferential Attachment (<math>pa</math>) and Resource Allocation (<math>ra</math>), into a single score [5], defined as follows:

<math>score(t, d_i) = ra(t, d_i) + \frac{pa(t, d_i)}{|D|} \qquad (2)</math>

<math>pa(t, d_i) = |P_{d_i}| \, ; \qquad ra(t, d_i) = \sum_{d_j \in S_t \cap P_{d_i}} \frac{1}{|P_{d_j}|}</math>

where <math>P_{d_i}</math> is the popularity set of a dataset <math>d_i \in D</math>, that is, the set of all datasets in <math>D</math> that have links to <math>d_i</math>, and <math>S_t</math> is the similarity set of a dataset <math>t</math>, that is, the set of all datasets in <math>D</math> that have features in common with <math>t</math>.

The combined score induces the ranking of the datasets in <math>D</math> (from the largest to the smallest score) and gives priority to the <math>ra</math> score; the <math>pa</math> score, normalized by the total number of datasets to be ranked (<math>|D|</math>), will play a role when there is a tie or when the <math>ra</math> value is zero.

===4. RESULT ANALYSIS===
====4.1 Data used in the experiments====
We selected a subset of the datasets indexed by the DataHub, using the Learning Analytics and Knowledge dataset [14] as the target of the recommendation.
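As an illustration of the two rank score functions of Section 3, the following Python sketch computes Equation (1) with maximum likelihood estimates and Equation (2) from popularity and similarity sets. It is a minimal sketch under stated assumptions, not the RecLAK implementation. The toy datasets, the function names (<code>bayesian_score</code>, <code>sn_score</code>, <code>log2_or_c</code>) and the input layout are hypothetical. The sketch further assumes that the denominator of P(F_j | D_i) sums over the features of t, that t itself is excluded from D and from its own similarity set, and that intermediaries with an empty popularity set contribute nothing to ra. The base-2 logarithm and c = -170 are the settings reported with Table 2.

```python
import math

C = -170.0  # replacement for log(0); value reported with Table 2

# Hypothetical toy training set: dataset id -> (features, ids of datasets it links to)
TRAINING = {
    "a": ({"dc:title", "foaf:name"}, {"x"}),
    "b": ({"dc:title"}, {"x", "y"}),
    "c": ({"foaf:name"}, {"y"}),
}

# Hypothetical toy Linked Data network for the Social Network-based score
FEATURES = {"t": {"f1", "f2"}, "a": {"f1"}, "b": {"f2"},
            "c": {"f3"}, "x": {"f9"}, "y": {"f9"}}
LINKS = {"a": {"x"}, "b": {"x", "y"}, "c": {"a", "y"}, "x": {"a"}, "y": {"b"}}


def log2_or_c(p, c=C):
    """Return log2(p), or the penalty constant c when p is zero."""
    return math.log2(p) if p > 0 else c


def bayesian_score(t_features, d_i, training, c=C):
    """Equation (1): sum_j log P(F_j | D_i) + log P(D_i), estimated by counting."""
    linked_to_d_i = [feats for feats, targets in training.values() if d_i in targets]
    # count(f_j, d_i): training datasets that have f_j and are linked to d_i
    counts = {f: sum(f in feats for feats in linked_to_d_i) for f in t_features}
    denom = sum(counts.values())  # assumption: j ranges over the features of t
    # sum_i count(d_i): every (dataset, link target) pair in the training set
    total_links = sum(len(targets) for _, targets in training.values())
    prior = len(linked_to_d_i) / total_links if total_links else 0.0
    score = log2_or_c(prior, c)
    for f in t_features:
        score += log2_or_c(counts[f] / denom if denom else 0.0, c)
    return score


def sn_score(t, d_i, features, links):
    """Equation (2): ra(t, d_i) + pa(t, d_i) / |D|."""
    D = [d for d in features if d != t]  # assumption: t itself is not ranked

    def popularity(d):
        # P_d: datasets in D that have links to d
        return {u for u in D if d in links.get(u, set())}

    # S_t: datasets in D that have features in common with t
    S_t = {u for u in D if features[u] & features[t]}
    P_di = popularity(d_i)
    pa = len(P_di)
    # assumption: intermediaries with an empty popularity set are skipped
    ra = sum(1 / len(popularity(d_j)) for d_j in S_t & P_di if popularity(d_j))
    return ra + pa / len(D)


if __name__ == "__main__":
    lak_like = ["dc:title", "foaf:name"]
    for cand in ("x", "y", "z"):
        print(cand, bayesian_score(lak_like, cand, TRAINING))
    for cand in ("x", "y", "a"):
        print(cand, sn_score("t", cand, FEATURES, LINKS))
```

On this toy data the Bayesian score ranks y above x, and a candidate with no incoming links in the training set receives the penalty c for every term. The Social Network-based score ranks x above y, and a candidate with no intermediary in common with the target falls back to its normalized pa value.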
From the DataHub catalog, we managed to obtain 295 datasets with at least one feature (class, property or vocabulary). Among the datasets with links defined, there are 139 datasets with 697 known links. Figure 1 presents a graph representing the datasets and their known links. In this graph, the size of a dataset node is proportional to the number of datasets linked to it (in-degree).

Figure 1: The datasets and their links.

The number of distinct features between classes and properties was 11,868. The number of relations between datasets and classes or properties was 16,750, where 6,447 were references to classes and 10,303 were references to properties. For the details on how we extracted metadata from the DataHub catalog, see [5].

====4.2 LAK features====
As features of lak, we used a selected set of classes and properties obtained from lak and from the LinkedUp project Web site. We filtered out, from 51 initial features, those that were not related to the content of the dataset and that are used in many datasets, such as owl:sameAs, rdf:Property, rdfs:Resource, among others. The core of the selected set comes from the SWC (Semantic Web Conference) ontology (http://data.semanticweb.org/ns/swc/ontology), which describes academic conferences and establishes a convention on how to use classes and properties from other ontologies, mostly FOAF (Friend of a Friend), for people and organizations, and SWRC (Semantic Web for Research Communities), for papers. It also includes metadata from other ontologies, such as SIOC (Semantically-Interlinked Online Communities) and DC (Dublin Core). The selected lak features totaled 37, of which 31 are shared by other datasets in our set of data. A preview of the RecLAK interface showing the selected lak classes is presented in Figure 2.

Figure 2: Preview of the RecLAK interface showing the selected lak classes.

====4.3 Datasets with LAK features====
The set of datasets (represented by their id in DataHub) that have at least one feature in common with lak consists of 132 datasets, with 376 associations between datasets and lak features. Figure 3 presents a graph representing the datasets and their associated lak features. In this graph, the size of a feature node is proportional to the number of datasets having it.

Figure 3: The datasets and their associated lak features.

Among the lak features, the most popular are from DC: dc:title, shared by 60 datasets, and dc:creator, referenced by 56 datasets, and from FOAF: foaf:name and foaf:homepage with, respectively, 41 and 36 other datasets beyond lak referring to them. The least popular features are metadata directly from the SWC and SWRC ontologies (some of them used by only 1 dataset other than lak).

The datasets with more than 5 features shared with lak are shown in Table 1. The most expressive result is obtained by the rkb-explorer-webconf dataset, which shares 31 features with lak. This was the dataset most correlated with the selected classes and properties of lak. The rkb-explorer-webconf is a semantic repository that publishes RDF linked data and co-reference information from the RKB Explorer initiative. This dataset includes information about authors and publications in several conferences, such as ESWC.

{| class="wikitable"
|+ Table 1: Top 10 datasets sharing features with lak.
! Dataset id !! # shared features
|-
| rkb-explorer-webconf || 31
|-
| linked-open-vocabularies-lov || 8
|-
| krystian-pietruszka || 7
|-
| aksworg || 7
|-
| dcs-sheffield || 6
|-
| southampton-ac-uk-profile || 6
|-
| jamendo-dbtune || 6
|-
| sudocfr || 6
|-
| rkb-explorer-webscience || 6
|-
| msc || 6
|}

====4.4 Dataset Interlinking recommendations====
Using the score functions, briefly described in Section 3, we generated recommendations for lak. A preview of the RecLAK interface presenting the recommendations for lak is presented in Figure 4.

Figure 4: Preview of the RecLAK recommendation interface.

The top 10 recommendations generated by each of the two approaches (Bayesian and Social Network-based rankings) and the respective score values estimated for each recommended dataset are presented in Table 2. The top 10 ranked datasets for each approach are briefly described below.

{| class="wikitable"
|+ Table 2: Top 10 ranked recommendations for lak.
! # !! Bayesian ranking !! score∗ !! # !! SN-based ranking !! score
|-
| 1 || semanticweb-org || -162.025 || 1 || geonames-semantic-web || 13.738
|-
| 2 || w3c-wordnet || -162.236 || 2 || nytimes-linked-open-data || 3.558
|-
| 3 || tags2con-delicious || -163.025 || 3 || gnoss || 3.051
|-
| 4 || dcs-sheffield || -163.025 || 4 || lcsh || 3.017
|-
| 5 || linked-open-camera || -163.025 || 5 || rkb-explorer-acm || 2.430
|-
| 6 || sweto-dblp || -163.025 || 6 || rkb-explorer-wiki || 2.408
|-
| 7 || geonames-semantic-web || -3281.339 || 7 || dnb-gemeinsame-normdatei || 2.020
|-
| 8 || lexvo || -4107.754 || 8 || lexvo || 2.017
|-
| 9 || rkb-explorer-acm || -4114.493 || 9 || rkb-explorer-eprints || 1.632
|-
| 10 || lcsh || -4273.558 || 10 || rkb-explorer-dblp || 1.466
|}

∗ Estimated using log<sub>2</sub>, c = -170 and considering only lak features shared with at least one dataset.

'''Bayesian ranking.''' The topmost-ranked is a generic dataset with concepts from the Semantic Web community. Dataset #2 is a well-known lexical database of English. Datasets in positions #3 to #6 of the Bayesian ranking have tied scores. Dataset #3 is a dataset with concepts from tags generated by human annotators. Dataset #4 describes people, research groups and publications of the members of the Computer Science Department at the University of Sheffield. Dataset #5 is maintained by the chamber of deputies in Italy, which is working to publish quality linked data in several domains, including research. Dataset #6 describes the DBLP digital library, which provides bibliographic information on major computer science journals and proceedings. dblp also indexes the papers published in the LAK and EDM conferences. Dataset #7 is the Geonames dataset, which contains information about geographical locations. Dataset #8 contains information about languages, words, characters, and other human language-related entities to the Linked Data Web and Semantic Web. lexvo has links to WordNet and thesauri. Dataset #9 is a Linked Data version of the Association for Computing Machinery (ACM) digital library. Finally, dataset #10 is a dataset of the Library of Congress Subject Headings (LCSH), which catalogs materials stored by the Library of Congress and other libraries around the United States.

'''Social Network-based ranking.''' Since there is some overlap between the top 10 recommendations of the Social Network-based (SN-based) and Bayesian rankings, we comment only on the top 10 datasets ranked exclusively by the SN-based approach. Dataset #2 publishes the news vocabularies used by The New York Times as Linked Open Data. It covers data and resources about people, locations and organizations. Dataset #3 covers topics related to innovation, technology, business and education. Dataset #6 has links catalogued in the DataHub for other bibliographic datasets such as Citeseer, DBLP, ACM, IEEE and EPrints. Dataset #7 was created with the objective of being capable of networking the wide range of resources and information held by libraries and other cultural institutions in German-speaking countries. This dataset uses established vocabularies, such as FOAF. Dataset #9 describes e-prints and has links catalogued in the DataHub for other bibliographic datasets such as Citeseer, DBLP, ACM and IEEE. Dataset #10 is also a Linked Data version of publications information of the DBLP digital library, similar to sweto-dblp.

'''Discussion.''' Based on the top 10 rankings of both approaches, we identified three main groups of candidate datasets that were recommended to be interlinked with lak:

* generic: semanticweb-org, w3c-wordnet, tags2con-delicious, geonames-semantic-web, lexvo, nytimes-linked-open-data, rkb-explorer-wiki
* bibliographic: dcs-sheffield, linked-open-camera, sweto-dblp, rkb-explorer-acm, lcsh, dnb-gemeinsame-normdatei, rkb-explorer-eprints, rkb-explorer-dblp
* educational area: gnoss.

The top 10 recommendations of the rankings differ in some aspects. Considering the groups identified above, the Bayesian ranking contains a higher number of generic datasets, while the Social Network-based ranking contains a higher number of bibliographic datasets. This probably happens because the Bayesian ranking prioritizes, as recommendations for lak, datasets linked from the largest number of other datasets that have the largest number of lak features. On the other hand, the Social Network-based ranking prioritizes the datasets pointed to by the largest number of other datasets that have smaller popularity and at least one feature of lak.

The results also indicate that the selection of the feature set is very important because it directly influences the generated rankings and can lead to recommendations of datasets that are more generic as well as less generic. In our experiments with lak, we filtered out some generic features (e.g., owl:sameAs), but included DC and FOAF elements. Thus, we expected that both generic and specific datasets from our set of datasets would be recommended. As the metadata used to triplify lak did not use classes and properties specifically related to the application domain, this characteristic was not evidenced in the recommendation results.

===5. CONCLUSIONS===
This paper presented a detailed analysis, based on Bayesian classifiers and on Social Network Analysis techniques, to address the dataset interlinking recommendation problem for lak, using only metadata. Thus, the rank score functions are potentially useful to reduce the cost of dataset interlinking. For more information, including the full set of data used in the experiments, graphical visualizations and detailed results, we refer to the RecLAK Web application, available at http://www.inf.puc-rio.br/~grlopes/RecLAK.

===6. ACKNOWLEDGMENTS===
This work was partly funded by CNPq, under grants 160326/2012-5, 301497/2006-0 and 57128/2009-9, and by FAPERJ, under grants E-26/170028/2008 and E-26/103.070/2011.

===7. REFERENCES===
[1] T. Berners-Lee. Linked Data. In Design Issues. W3C, July 2006.

[2] K. A. Kuznetsov. Scientific data integration system in the linked open data space. Programming and Computer Software, 39(1):43–48, Jan. 2013.

[3] L. A. P. P. Leme, G. R. Lopes, B. P. Nunes, M. A. Casanova, and S. Dietze. Identifying candidate datasets for data interlinking. In ICWE’13, pages 354–366, 2013.

[4] G. R. Lopes, L. A. P. P. Leme, B. P. Nunes, M. A. Casanova, and S. Dietze. Recommending tripleset interlinking through a social network approach. In WISE’13, pages 149–161, 2013.

[5] G. R. Lopes, L. A. P. Paes, B. P. Nunes, M. A. Casanova, and S. Dietze. Comparing recommendation approaches for dataset interlinking. Technical report, Department of Informatics, PUC-Rio, 2013.

[6] B. F. Lóscio, M. Batista, and D. Souza. Using information quality for the identification of relevant web data sources. In IIWAS’12, pages 36–44, New York, NY, USA, 2012. ACM.

[7] L. Lü, C.-H. Jin, and T. Zhou. Similarity index based on local paths for link prediction of complex networks. Physical Review E, 80(4):046122, 2009.

[8] C. D. Manning and H. Schütze. Foundations of Statistical Natural Language Processing. MIT Press, 2002.

[9] A. Nikolov and M. d’Aquin. Identifying Relevant Sources for Data Linking using a Semantic Web Index. In LDOW’11, 2011.

[10] A. Nikolov, M. d’Aquin, and E. Motta. What Should I Link to? Identifying Relevant Sources and Classes for Data Linking. In JIST’12, pages 284–299. Springer Berlin Heidelberg, 2012.

[11] B. P. Nunes, B. Fetahu, and M. A. Casanova. Cite4me: Semantic retrieval and analysis of scientific publications. In LAK (Data Challenge), volume 974 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.

[12] B. P. Nunes, B. Fetahu, S. Dietze, and M. A. Casanova. Cite4me: A semantic search and retrieval web application for scientific publications. In ISWC (Posters & Demos), volume 1035 of CEUR Workshop Proceedings, pages 25–28. CEUR-WS.org, 2013.

[13] H. R. d. Oliveira, A. T. Tavares, and B. F. Lóscio. Feedback-based data set recommendation for building linked data applications. In I-SEMANTICS’12, pages 49–55, 2012.

[14] D. Taibi and S. Dietze. Fostering analytics on learning analytics research: the lak dataset. In LAK (Data Challenge), volume 974 of CEUR Workshop Proceedings. CEUR-WS.org, 2013.

[15] A. Wagner, P. Haase, A. Rettinger, and H. Lamm. Discovering related data sources in data-portals. In SemStats workshop, ISWC’13, 2013.