=Paper= {{Paper |id=None |storemode=property |title=R3 - A Related Resource Recommender |pdfUrl=https://ceur-ws.org/Vol-585/paper4.pdf |volume=Vol-585 }} ==R3 - A Related Resource Recommender== https://ceur-ws.org/Vol-585/paper4.pdf
1st International Workshop on Adaptation, Personalization and REcommendation in the Social-semantic Web (APRESW 2010)




                            R3 - A Related Resource Recommender

                                         Thomas Kurz, Tobias Bürger and Rolf Sint

                                           Salzburg Research Forschungsgesellschaft
                                        Jakob Haringer Str. 5/3, 5020 Salzburg, Austria
                                          firstname.lastname@salzburgresearch.at



                            Abstract. Due to the ever-growing amount of content in the Web of
                            Data, the retrieval of relevant information is challenging. Efficient
                            resource recommendation methods that could ease the exploration of
                            data in the Web of Data are currently lacking. To alleviate this
                            situation, this paper proposes the R3 resource recommendation
                            framework for the retrieval of data in the Linked Open Data (LOD)
                            cloud. It analyses relevant search engines and interlinking frameworks
                            and, based on that analysis, proposes the R3 framework, which is
                            illustrated in both theoretical and practical detail. The framework
                            enables the recommendation of (RDF) resources from the LOD cloud
                            based on textual, structural, or semantic similarity.


                     1    Introduction
                     The goal of the Linking Open Data (LOD) community is to bootstrap the
                     Semantic Web (the “Web of Data”) by publishing and interconnecting
                     datasets using RDF [1]. The outcome of this movement is the so-called
                     LOD cloud, which grew to 13.1 billion triples and 142 million RDF links
                     in the last two years and is still growing [2].
                     As in the traditional, document-centric Web, search and retrieval of
                     information are of utmost importance in the Web of Data. A big challenge
                     for an end user or application operating on the Web of Data is to find
                     relevant data that serves their specific needs. Although Linked Data
                     browsers and search engines are available to explore content in the LOD
                     cloud, means for ordinary users to issue complex queries, or to have
                     content in the cloud recommended based on particular interests, are
                     currently lacking. A user searching for the city of Berlin with a LOD
                     search engine is able to retrieve resources with many properties such as
                     their names, descriptions, latitude, longitude, or population density. If
                     she would now like to retrieve related resources, such as a ranked list of
                     cities ordered by geographical distance and/or population density, or
                     resources with a similar structure (like countries or provinces) ranked by
                     the semantic similarity of their textual descriptions, she will fail with
                     current search engines. The recommendation of related resources could also
                     allow the user to issue a “Query by Example” by defining a kind of fake
                     resource and using it as the query base, which would be a novel form of
                     searching the Web of Data.

                     In order to alleviate this situation, this paper investigates the state of the art




7th Extended Semantic Web Conference (ESWC 2010)                                                           Page 33 of 64




                     in LOD search engines and interlinking frameworks (Section 2) and, based on
                     that, proposes the R3 resource recommendation framework, which is capable of
                     recommending data from the LOD cloud based on the semantic, structural, or
                     textual similarity of given resources. The framework allows querying for
                     related things in the LOD cloud based on a given resource; its requirements,
                     conceptual architecture, and implementation aspects are described in Section 3.
                     Finally, details are given on how to further advance and implement the
                     framework (Section 4).


                     2     Resource Discovery and Interlinking in the LOD Cloud
                     Several applications on the Web allow the user to search or browse the Web
                     of Data. In addition, there are so-called interlinking frameworks, which can
                     be used to check the resources of two or more different datasets pairwise for
                     similarity. Because of the analogies to our approach, these frameworks are
                     also considered in the following discussion.

                     2.1   Browsers and Search Engines
                     Sindice1, as described in [3], is a scalable index of the Semantic Web. It crawls
                     the Web for RDF documents and Microformats and indexes the resulting resource
                     URIs, Inverse Functional Properties (IFPs) and keywords. A human user can
                     access these documents through a simple user interface based on the indexes
                     mentioned above.
                     Sigma2 is a semantic information mashup enabled by Sindice rather than a
                     self-contained semantic search service. Nevertheless, it adds a number of
                     useful features on top of Sindice's functionality. It works as a Web of Data
                     browser in which the user can start from any entity (found by a full-text
                     search) and then browse to the resulting page. The resource index is built
                     from sites which use RDF, RDFa or Microformats.
                     The Open Link Search3 lists entities for which a user-defined text pattern
                     occurs in any literal property value or label. It also supports entity URI
                     lookup. The search can be refined by filtering on type, property value, etc.
                     It is also possible to execute SPARQL queries by using the SPARQL endpoint.
                     Some demo queries are predefined and can easily be altered via text input fields.
                     Falcons4 is described in [5] as a service for searching and browsing entities on
                     the Semantic Web. It is a keyword-based search engine for Semantic Web URIs
                     and provides different query types for object, concept and document search.
                     Falcons also offers faceting over types by dynamically recommending
                     ontologies. The recommendation is based on a combination of the TF-IDF
                     technique and the popularity of ontologies.
                      1 http://sindice.com/
                      2 http://sig.ma/
                      3 http://lod.openlinksw.com/
                      4 http://iws.seu.edu.cn/services/falcons/objectsearch/index.jsp








                     Watson5 offers keyword-based querying to obtain a list of URIs of semantic
                     documents in which the keywords appear as identifiers or in literals of classes,
                     properties, and individuals. Search options make it possible to restrict the
                     search space to particular types of entities (classes, properties or individuals)
                     and to particular elements within the entities (e.g. local name, label, comment).
                     SWSE6 is a search engine for the RDF Web. Like the search engines currently
                     provided for the HTML Web, it looks like an ordinary full-text search. However,
                     the information retrieval capabilities of SWSE are much more powerful because of
                     the inherent semantics of RDF and other Semantic Web languages.
                     Swoogle7 allows a user to search through ontologies, instance data, and terms
                     of the Semantic Web. Furthermore, it supports browsing the Web of Data. This
                     search engine also uses an archive functionality to identify and provide different
                     versions of Semantic Web documents.

                     As described above, each of the considered semantic search services provides a
                     certain set of functionalities. Some of them are shared by two or more services,
                     others are exclusive to one engine. Although some of the services make it
                     possible to search for occurrences of a given resource, none of them makes it
                     possible to find related resources for a resource and its RDF triples, or to
                     define the triples on which the relatedness should be calculated. Moreover, the
                     search engines do not consider the semantic similarity of queries and content,
                     which could increase the quality of the results. There are, however, applications
                     in the area of the Semantic Web which meet some of these requirements in
                     certain ways: the interlinking frameworks.

                     2.2   Interlinking frameworks
                     Interlinking frameworks for Semantic Web data try to detect and link related
                     resources in different datasets. In [8] several frameworks are compared with
                     each other concerning their functionalities, which led us to conclude that the
                     Silk8 approach is closely related to our goals.
                     Silk [7] is a framework for detecting explicit RDF links between data items within
                     different data sources. Using the declarative Silk - Link Specification Language
                     (Silk-LSL), developers can specify which types of RDF links should be discovered
                     between data sources and, based on arbitrary metrics and aggregation functions,
                     which resources should be declared as related. Silk accesses the interlinking
                     candidates via the SPARQL protocol.
                     The use of different metrics and aggregation functions for different types of
                     properties can be adapted to our resource recommender. In addition, we can
                     remodel Silk-LSL in some ways (e.g. alternative metrics) and use it as a query
                     syntax. This language also makes it possible to define the appropriate data
                     sources per query.
                      5 http://kmi-web05.open.ac.uk/WatsonWUI/
                      6 http://swse.deri.org/
                      7 http://swoogle.umbc.edu/
                      8 http://www4.wiwiss.fu-berlin.de/bizer/silk/spec/








                     3     R3 - A Conceptual Overview
                     Our intent is to build a recommender service that allows querying for related
                     resources from various (predefined) data sources based on a given resource. But
                     what is relatedness, which factors have an impact on it, and how can we implement
                     such a recommender service? These questions are discussed in the following sections.

                     3.1   Requirements
                     For RDF resources, various factors define relatedness. On the one hand, the
                     RDF structure itself (predicates and non-literal objects) reveals something
                     about how similar two resources are. On the other hand, the literal properties
                     can be compared, according to their types, using different metrics. These can
                     be simple ones, like the Euclidean metric for numbers, or more complex ones,
                     like the semantic similarity of texts. A user should be able to specify the
                     factors that are used to find relevant related resources, as well as their
                     impact on the result. In addition, the whole recommendation process should
                     complete in adequate time. We can therefore specify the following requirements:
                      1. Recommend related resources from the LOD cloud based on a given RDF
                         resource.
                      2. Consider semantic similarity of texts and structural similarity of resources.
                      3. Offer a comparison mechanism for literals with adjustable metrics.
                      4. Allow user-defined feature boosts, that is, a certain feature (e.g. property
                         x or structure) can be given a higher weight in the relatedness than others.
                      5. Return related resources ordered by relevance.
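
Requirements 3 to 5 can be illustrated with a minimal sketch. It assumes that each feature has already been compared by some metric yielding a similarity in [0, 1], and it uses a boost-weighted mean as the aggregation function. The names `FeatureScore`, `relatedness` and `rank_candidates`, the `ex:` identifiers and all numeric values are hypothetical and only serve the illustration; the paper does not prescribe a particular aggregation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FeatureScore:
    """Similarity of one feature between the query resource and a candidate."""
    feature: str       # e.g. a property name or "structure"
    similarity: float  # assumed to be normalised to [0, 1]


def relatedness(scores, boosts):
    """Boost-weighted mean of feature similarities; unlisted features get boost 1.0."""
    total_weight = sum(boosts.get(s.feature, 1.0) for s in scores)
    if total_weight == 0:
        return 0.0
    weighted = sum(boosts.get(s.feature, 1.0) * s.similarity for s in scores)
    return weighted / total_weight


def rank_candidates(candidates, boosts):
    """Return (uri, score) pairs ordered by descending relatedness."""
    scored = [(uri, relatedness(scores, boosts)) for uri, scores in candidates.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Hypothetical candidates with made-up similarity values.
candidates = {
    "ex:Germany": [FeatureScore("population", 0.1), FeatureScore("structure", 0.6)],
    "ex:Hamburg": [FeatureScore("population", 0.9), FeatureScore("structure", 0.8)],
}
# The structure feature counts twice as much as the population feature.
ranking = rank_candidates(candidates, boosts={"structure": 2.0})
```

With these values, `ex:Hamburg` is ranked before `ex:Germany`, because both its population and its structure similarity are higher.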

                     3.2   Conceptual Architecture
                     The concept for fulfilling these requirements is illustrated in Figure 1. The
                     data must be fetched from the LOD cloud, combined and indexed, and it should be
                     queryable via a specific search syntax. This process is described in more detail
                     in this section.

                     Data Consolidation
                     The service obtains recommendable resources from the Linked Data cloud. Since
                     it should be possible to build a multi-source index, some kind of ontology
                     alignment is needed. The data preprocessed in this way is stored directly in the
                     index. The individual data sources must be reindexed at given time intervals.

                     Resource Recommender Index
                     A core index can provide many metrics, like Euclidean distance, date similarity,
                     string equality, etc. Semantic similarity, which can be used to evaluate the
                     semantic distance of texts and RDF structures, is more complex; therefore we need
                     a supplemental semantic index. Semantic textual indices (one for each defined
                     property) as well as the semantic structure index (one for the whole dataset) are








                                             Fig. 1: Design and workflow of R3




                     built out of the core index.

                     Resource Query
                     To get recommended resources based on a given one, the recommender provides
                     a query language in which the user can specify which features should be included
                     in the calculation and according to which metric. Furthermore, how strongly a
                     specific feature affects the result and how the individual values are combined
                     are configurable per query. To restrict the set of base resources, the user
                     can define the included datasets. The search result is a list of resources
                     ranked by relevance.

                     3.3     Implementation
                     The datasets which form our resource base are retrieved from the LOD cloud via
                     SPARQL. To map resources from different sources we use a simple mapping table.
                     Complex ontology matching strategies like the one in [9] are also possible.
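
A simple mapping table of this kind could look as follows. This is a minimal sketch under the assumption that the table maps source-specific predicates to common index fields; the table contents, the field names and the function `consolidate` are hypothetical and chosen only for illustration.

```python
# Hypothetical mapping table: source-specific predicate -> common index field.
MAPPING_TABLE = {
    "http://dbpedia.org/ontology/populationTotal": "population",
    "http://www.geonames.org/ontology#population": "population",
    "http://www.w3.org/2000/01/rdf-schema#label": "label",
    "http://www.geonames.org/ontology#name": "label",
}


def consolidate(triples, mapping=MAPPING_TABLE):
    """Group (subject, predicate, object) triples into one document per subject.

    Predicates are renamed to their common field; predicates that are
    missing from the mapping table are skipped.
    """
    documents = {}
    for subject, predicate, obj in triples:
        field = mapping.get(predicate)
        if field is None:
            continue  # no alignment known for this predicate
        documents.setdefault(subject, {}).setdefault(field, []).append(obj)
    return documents
```

The resulting dictionaries have the same field names regardless of the source dataset, so that they could be stored in one multi-source index.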
                     Because of its high scalability, its fast query processing and the possibility to
                     use integrated functions and numerical as well as token-based comparison, we
                     decided to use SOLR9 as our index base. Many metrics, like Euclidean distance,
                     date similarity, string equality, etc., are provided by or can be directly
                     integrated into the SOLR index. As described, for more complex metrics we need
                     supplemental semantic indices built out of the SOLR index.
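
To make the simple metrics concrete, the following sketch shows one possible definition of each. These are plain Python stand-ins and not SOLR functions; the normalisation to a similarity in [0, 1], the parameter `scale_days` and the function names are assumptions made for this example.

```python
import math
from datetime import date


def euclidean_similarity(a, b):
    """Map the Euclidean distance between two numeric vectors to (0, 1]."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    distance = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))
    return 1.0 / (1.0 + distance)


def date_similarity(d1, d2, scale_days=365.0):
    """Decay linearly from 1.0 (same day) to 0.0 (scale_days or more apart)."""
    gap = abs((d1 - d2).days)
    return max(0.0, 1.0 - gap / scale_days)


def string_equality(s1, s2):
    """Return 1.0 for equal strings (ignoring case and outer whitespace), else 0.0."""
    return 1.0 if s1.strip().lower() == s2.strip().lower() else 0.0
```

For instance, `euclidean_similarity((0, 0), (3, 4))` corresponds to a distance of 5 and therefore yields 1/6.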

                     Text-based Semantic Index
                     A potential semantic index is a Semantic Vector index. This approach is based
                     on the Vector Space Model, in which every document is represented as a vector
                     in an n-dimensional term space according to the terms it contains. The Semantic
                      9 http://lucene.apache.org/solr/








                     Vector Package10 is able to build such an index (which can be queried for
                     semantically related documents) out of the basic Lucene index.
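
The underlying Vector Space Model can be sketched in a few lines. The example below builds raw term-count vectors and compares them by cosine similarity; it does not use the Semantic Vector package and leaves out any term weighting or dimensionality reduction such a package may apply. The function names and the tokenisation rule are assumptions made for this illustration.

```python
import math
import re
from collections import Counter


def term_vector(text):
    """Represent a text as a sparse vector of lower-cased term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine_similarity(v1, v2):
    """Cosine of the angle between two sparse term vectors."""
    # Counter returns 0 for terms that do not occur in v2.
    dot = sum(count * v2[term] for term, count in v1.items())
    norm1 = math.sqrt(sum(c * c for c in v1.values()))
    norm2 = math.sqrt(sum(c * c for c in v2.values()))
    if norm1 == 0 or norm2 == 0:
        return 0.0
    return dot / (norm1 * norm2)
```

Two descriptions that share no terms get a similarity of 0 under this plain model, which is one reason why a semantic index that also relates different but co-occurring terms is of interest.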

                     Structure-based Semantic Index
                     The semantic vector index can also be used to index the semantic similarity of
                     RDF structures. In this case, the term model does not contain the words of a
                     text but the URI, the RDF predicates and the non-literal objects of a resource.
                     Figure 2 shows the semantic similarity of a subset of DBpedia resources. To
                     illustrate this semantic space, we built a structure distance matrix of these
                     resources and scaled it to two dimensions using classical multidimensional
                     scaling (MDS) as offered by the R statistics software11. We highlighted
                     resources of different types, which shows that related resources have a similar
                     RDF structure.
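
How a structure distance matrix can be derived from RDF triples is sketched below. As a simplification, the sketch replaces the semantic vector index by a plain set-overlap (Jaccard) distance over predicates and non-literal objects, and it leaves out the resource URI itself, since under a set-overlap measure it would differ for every resource. The triple layout, the function names and the `ex:` identifiers are assumptions made for this example.

```python
def structure_terms(uri, triples):
    """Collect the predicates and non-literal objects of `uri` as structural terms.

    Each triple is (subject, predicate, object, object_is_literal).
    """
    terms = set()
    for subject, predicate, obj, is_literal in triples:
        if subject != uri:
            continue
        terms.add(predicate)
        if not is_literal:
            terms.add(obj)
    return terms


def jaccard_distance(a, b):
    """1 minus the share of common terms; two empty sets have distance 0."""
    if not a and not b:
        return 0.0
    return 1.0 - len(a & b) / len(a | b)


def distance_matrix(uris, triples):
    """Pairwise structure distances, in the order given by `uris`."""
    term_sets = [structure_terms(uri, triples) for uri in uris]
    return [[jaccard_distance(a, b) for b in term_sets] for a in term_sets]


# Hypothetical sample data: two cities and one person.
triples = [
    ("ex:Berlin", "rdf:type", "ex:City", False),
    ("ex:Berlin", "ex:population", "3400000", True),
    ("ex:Vienna", "rdf:type", "ex:City", False),
    ("ex:Vienna", "ex:population", "1700000", True),
    ("ex:Goethe", "rdf:type", "ex:Person", False),
    ("ex:Goethe", "ex:birthPlace", "ex:Frankfurt", False),
]
matrix = distance_matrix(["ex:Berlin", "ex:Vienna", "ex:Goethe"], triples)
```

In this sample the two cities have identical structural terms and thus distance 0, while each city shares only `rdf:type` with the person. A matrix of this form is the kind of input that classical MDS takes.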




                                            Fig. 2: Evaluation of Structure Index




                     Query Language
                     As mentioned, the Silk Link Specification Language12 can serve as inspiration
                     for a query format that fulfills our query requirements and allows specifying
                     the base resource (a set of RDF triples or a URI), the considered datasets
                     (SPARQL endpoints used by the data consolidator), the relevant features and
                     their impact, and the applied metrics (taken from a fixed set). Figure 3 shows
                     a simple query example.
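
The elements of such a query can be summarised as a small data structure. The sketch below is not the query syntax of R3 or of Silk-LSL; it only models the four elements named above. The class names, the metric identifiers in `ALLOWED_METRICS` and the feature names are hypothetical.

```python
from dataclasses import dataclass, field

# Hypothetical fixed set of metric identifiers.
ALLOWED_METRICS = {
    "euclidean",
    "date",
    "string_equality",
    "semantic_text",
    "semantic_structure",
}


@dataclass
class FeatureSpec:
    """One relevant feature, the metric applied to it, and its impact (boost)."""
    name: str
    metric: str
    boost: float = 1.0

    def __post_init__(self):
        if self.metric not in ALLOWED_METRICS:
            raise ValueError(f"unknown metric: {self.metric}")
        if self.boost < 0:
            raise ValueError("boost must not be negative")


@dataclass
class RecommenderQuery:
    """Base resource, considered datasets and the features to compare."""
    resource: str                      # URI (or serialised RDF triples)
    datasets: list                     # SPARQL endpoint URLs
    features: list = field(default_factory=list)


query = RecommenderQuery(
    resource="http://dbpedia.org/resource/Berlin",
    datasets=["http://dbpedia.org/sparql"],
    features=[
        FeatureSpec(name="structure", metric="semantic_structure", boost=2.0),
        FeatureSpec(name="population", metric="euclidean"),
    ],
)
```

Validating the metric name when the query is built reflects the idea that metrics are taken from a fixed set rather than defined freely per query.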



                     4    Further Work

                     In this paper we described the conceptual architecture of a resource recommen-
                     dation framework for the Semantic Web. Our future work includes the implemen-
                     tation of this concept and a practical evaluation with real datasets. In a further
                     step we plan to optimize the Semantic Vector package, which is used in one core
                     10 http://code.google.com/p/semanticvectors/
                     11 http://www.r-project.org/
                     12 http://www4.wiwiss.fu-berlin.de/bizer/silk/spec/#specification








                                          Fig. 3: Sample for a Recommender Query



                     component of the framework, to enhance its scalability and performance. The
                     resulting recommender will be integrated into the KiWi13 system.

                     References
                     1. C. Bizer et al. Linked Data - The Story So Far. International Journal on Semantic
                        Web and Information Systems (IJSWIS), Vol. 5, Issue 3, 2009.
                     2. Linking Open Data: W3C SWEO Community Project.
                        http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData, 2010.
                     3. E. Oren et al. Sindice.com: a document-oriented lookup index for open linked data.
                        Int. J. Metadata, Semantics and Ontologies, Vol. 3, No. 1, 2008.
                     4. DERI Galway: Sindice API for Query Services. http://sindice.com/developers/api,
                        2008-2009.
                     5. G. Cheng and Y. Qu. Searching linked objects with Falcons: Approach, implementation
                        and evaluation. International Journal on Semantic Web and Information Systems,
                        5(3):49-70, September 2009.
                     6. W.B. Frakes and R.A. Baeza-Yates. Information Retrieval: Data Structures and
                        Algorithms. Prentice-Hall, New Jersey, 1992.
                     7. J. Volz et al. SILK - A Link discovery framework for the Web of Data. Linked Data
                        on the Web (LDOW2009), Madrid, 2009.
                     8. F. Scharffe and J. Euzenat. Alignments for data interlinking.
                        http://melinda.inrialpes.fr, 2009.
                     9. C. A. Curino et al. X-SOM: A Flexible Ontology Mapper. 18th International
                        Conference on Database and Expert Systems Applications (DEXA 2007), 2007.

                     13 http://kiwi-project.eu/



