Towards Semantic Recommendation of Biodiversity Datasets based on Linked Open Data

Felicitas Löffler
Dept. of Mathematics and Computer Science, Friedrich Schiller University, Jena, Germany

Bahar Sateli
Semantic Software Lab, Dept. of Computer Science and Software Engineering, Concordia University, Montréal, Canada

René Witte
Semantic Software Lab, Dept. of Computer Science and Software Engineering, Concordia University, Montréal, Canada

Birgitta König-Ries
Friedrich Schiller University Jena, Germany, and German Centre for Integrative Biodiversity Research (iDiv) Halle-Jena-Leipzig, Germany

ABSTRACT
Conventional content-based filtering methods recommend documents based on extracted keywords. They calculate the similarity between keywords and user interests and return a list of matching documents. In the long run, this approach often leads to overspecialization and to fewer new entries with respect to a user's preferences. Here, we propose a semantic recommender system that uses Linked Open Data for the user profile and adds semantic annotations to the index. Linked Open Data allows recommendations beyond the content domain and supports the detection of new information. One research area with a strong need for the discovery of new information is biodiversity. Due to the heterogeneity of biodiversity data, its exploration requires interdisciplinary collaboration. Personalization, in particular in recommender systems, can help to link the individual disciplines in biodiversity research and to discover relevant documents and datasets from various sources. We developed a first prototype of our semantic recommender system in this field, where a multitude of existing vocabularies facilitates our approach.

Categories and Subject Descriptors
H.3.3 [Information Storage And Retrieval]: Information Search and Retrieval; H.3.5 [Information Storage And Retrieval]: Online Information Services

General Terms
Design, Human Factors

Keywords
content filtering, diversity, Linked Open Data, recommender systems, semantic indexing, semantic recommendation

Copyright © by the paper's authors. Copying permitted only for private and academic purposes.
In: G. Specht, H. Gamper, F. Klan (eds.): Proceedings of the 26th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 21.10.2014 - 24.10.2014, Bozen, Italy, published at http://ceur-ws.org.

1. INTRODUCTION
Content-based recommender systems observe a user's browsing behaviour and record their interests [1]. Using natural language processing and machine learning techniques, the user's preferences are extracted and stored in a user profile. The same methods are used to obtain suitable content keywords to establish a content profile. Based on previously seen documents, the system then attempts to recommend similar content. This requires a mathematical representation of both the user profile and the content profile. A widely used scheme is TF-IDF (term frequency-inverse document frequency) weighting [19]. Computed from how often a keyword appears in a document and how rare it is across the document collection, these term vectors capture the influence of keywords in a document, or of preferences in a user profile. The angle between two such vectors expresses how close the profiles are; it is computed with a similarity measure such as the cosine similarity. The recommendation lists of these traditional, keyword-based recommender systems often contain results very similar to those already seen, leading to overspecialization [11] and the “filter bubble” effect [17]: the user only receives content that matches the stored preferences, while related documents that do not perfectly match the stored interests are not displayed. Increasing the diversity of recommendations has therefore become a research area of its own [21, 25, 24, 18, 3, 6, 23], mainly applied to improve recommendation results in news or movie portals.

One field where content recommender systems could enhance daily work is research. Scientists need to be aware of relevant research not only in their own but also in neighboring fields. Increasingly, in addition to literature, the underlying data itself, and even data that has not been used in publications, is being made publicly available. An important example of such a discipline is biodiversity research, which explores the variety of species and their genetic and characteristic diversity [12]. The morphological and genetic information of an organism, together with its ecological and geographical context, forms a highly diverse structure. Collected and stored in different data formats, the datasets often contain or link to spatial, temporal and environmental data [22]. Many important research questions cannot be answered by working with individual datasets or data collected by one group, but require meta-analysis across a wide range of data. Since the analysis of biodiversity data is quite time-consuming, there is a strong need for personalization and new filtering techniques in this research area. Ordinary search functions in relevant data portals or databases, e.g., the Global Biodiversity Information Facility (GBIF)¹ and the Catalog of Life,² only return data that match the user's query exactly and fail to find more diverse and semantically related content. Also, user interests are not taken into account in the result list. We believe our semantic-based content recommender system could facilitate the difficult and time-consuming research process in this domain.

Here, we propose a new semantic-based content recommender system that represents the user profile as Linked Open Data (LOD) [9] and incorporates semantic annotations into the recommendation process. Additionally, the search engine is connected to a terminology server and uses the vocabularies it provides for recommendation. The result list therefore contains more diverse predictions and includes hierarchically related concepts or individuals.

The structure of this paper is as follows: Section 2 describes related work. Section 3 presents the architecture of our semantic recommender system and some implementation details. In Section 4, an application scenario is discussed. Finally, conclusions and future work are presented in Section 5.

2. RELATED WORK
The major goal of diversity research in recommender systems is to counteract overspecialization [11] and to recommend related products, articles or documents. Recommending more books by the same author or other movies of the same genre are the classical applications, mainly found in recommender systems based on collaborative filtering. To increase the variety of book recommendations, Ziegler et al. [25] enrich user profiles with taxonomic super-topics. The recommendation list generated from this extended profile is merged with a rank in reverse order, called the dissimilarity rank. A diversification factor controls this merging process: larger diversification factors lead to more diverse products beyond the user's interests. Zhang and Hurley [24] favor another mathematical solution and describe the balance between diversity and similarity as a constrained optimization problem. They compute a dissimilarity matrix according to selected criteria, e.g., movie genres, and apply a matching function to find a subset of products that are diverse as well as similar. A hybrid approach by van Setten [21] combines the results of several conventional algorithms, e.g., collaborative and case-based, to improve movie recommendations. Mainly focused on news or social media, approaches using content-based filtering try to present different viewpoints on an event to decrease media bias in news portals [18, 3] or to facilitate the filtering of comments [6, 23].

Apart from Ziegler et al., none of the presented approaches has considered semantic technologies. However, using ontologies and storing user or document profiles in triple stores offers a large potential for diversity research in recommender systems. Frasincar et al. [7] define semantically enhanced recommenders as systems with an underlying knowledge base. This knowledge base can either be linguistic-based [8], where only linguistic relations (e.g., synonymy, hypernymy, meronymy, antonymy) are considered, or ontology-based. In the latter case, the content and the user profile are represented with concepts of an ontology. This has the advantage that several types of relations can be taken into account. For instance, for a user interested in “geology”, the profile contains the concept “geology”, which also permits the recommendation of inferred concepts, e.g., “fossil”. The idea of recommending related concepts was first introduced by Middleton et al. [15]. They developed Quickstep, a recommender system for research papers that uses ontological terms both in the user profile and for paper categories. Its ontology only considers is-a relationships and omits other relation types (e.g., part-of). Another simple hierarchical approach, by Maidel et al. [13], calculates the distance among concepts in a profile hierarchy. They distinguish between perfect, close and weak matches. When a concept appears in both the user's and the document's profile, it is a perfect match. In a close match, the concept occurs in only one of the profiles and a child or parent concept appears in the other. A weak match, the largest distance, occurs when the other profile contains only a grandchild or grandparent concept. Finally, a weighted sum over all matching categories yields the recommendation list. This ontological filtering method was integrated into the news recommender system ePaper. Another semantically enhanced recommender system is Athena [10]. Its underlying ontology is used to explore the semantic neighborhood in the news domain. The authors compared several ontology-based similarity measures with the traditional TF-IDF approach. However, this system lacks a connection to a search engine that would allow querying large datasets.

All presented systems use manually established vocabularies with a limited number of classes. None of them uses a generic user profile to store the preferences in a semantic format (RDF/XML or OWL). The FOAF (Friend Of A Friend) project³ provides a vocabulary for describing and connecting people, e.g., demographic information (name, address, age) or interests. In 2006, Celma [2] was one of the first to leverage FOAF, using it to store users' preferences in his music recommender system. Our approach goes beyond FOAF interests by incorporating another generic user model vocabulary, the Intelleo User Modelling Ontology (IUMO).⁴ Besides user interests, IUMO offers elements to store learning goals, competences and recommendation preferences. This allows the results to be adapted to a user's previous knowledge or restricted to documents for a specific task.

¹ GBIF, http://www.gbif.org
² Catalog of Life, http://www.catalogueoflife.org/col/search/all/
³ FOAF, http://xmlns.com/foaf/spec/
⁴ IUMO, http://intelleo.eu/ontologies/user-model/spec/

3. DESIGN AND IMPLEMENTATION
In this section, we describe the architecture and some implementation details of our semantic-based recommender system (Figure 1). The user model component, described in Section 3.1, contains all user information. The source files, described in Section 3.2, are analyzed with GATE [5], as described in Section 3.3. Additionally, GATE is connected to a terminology server (Section 3.2) to annotate documents with concepts from the provided biodiversity vocabularies. In Section 3.4, we explain how the annotated documents are indexed with GATE Mímir [4]. The final recommendation list is generated in the recommender component (Section 3.5).

Figure 1: The architecture of our semantic content recommender system

3.1 User profile
The user interests are stored in RDF/XML format, using the FOAF vocabulary for general user information. In
order to improve the recommendations with respect to a user's previous knowledge and to distinguish between learning goals, interests and recommendation preferences, we incorporate the Intelleo User Modelling Ontology for an extended profile description. Recommendation preferences will contain settings for visualization, e.g., highlighting of interests, and recommender control options, e.g., keyword search or more diverse results. Another adjustment will adapt the result set to a user's previous knowledge: to make results easier to understand for a beginner, the system could provide synonyms, while for an expert it could include more specific documents.

The interests are stored as links to LOD resources. For instance, in our example profile in Listing 1, a user is interested in “biotic mesoscopic physical object”, which is a concept from the ENVO⁵ ontology. Note that the interest entry in the RDF file does not contain the textual description, but the link to the concept in the ontology, i.e., http://purl.obolibrary.org/obo/ENVO_01000009. Currently, we only support explicit user modelling, so the user information has to be added manually to the RDF/XML file. Later, we intend to develop a user profiling component that gathers a user's interests automatically. The profile is accessible via an Apache Fuseki⁶ server.

Listing 1: User profile with interests stored as Linked Open Data URIs

<rdf:Description rdf:about="http://www.semanticsoftware.info/person/felicitasloeffler">
<rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
<foaf:firstName>Felicitas</foaf:firstName>
<foaf:lastName>Loeffler</foaf:lastName>
<foaf:name>Felicitas Loeffler</foaf:name>
<foaf:gender>Female</foaf:gender>
<foaf:workplaceHomepage rdf:resource="http://dbpedia.org/page/University_of_Jena"/>
<foaf:organization>Friedrich Schiller University Jena</foaf:organization>
<foaf:mbox>felicitas.loeffler@uni-jena.de</foaf:mbox>
<um:TopicPreference rdf:resource="http://purl.obolibrary.org/obo/ENVO_01000009"/>
</rdf:Description>

3.2 Source files and terminology server
The content provided by our recommender comes from the biodiversity domain. This research area offers a wide range of existing vocabularies. Furthermore, biodiversity is an interdisciplinary field, where the results from several sources have to be linked to gain new knowledge. A recommender system for this domain needs to support scientists by improving this linking process and helping them find relevant content in an acceptable time.

Researchers in the biodiversity domain are advised to store their datasets together with metadata describing the collected data. A very common metadata format is ABCD.⁷ This XML-based standard provides elements for general information (e.g., author, title, address), as well as additional biodiversity-related metadata, like information about taxonomy, scientific name, units or gathering. Datasets for particular taxa often need specific ABCD fields; e.g., fossil datasets include data about the geological era. Therefore, several additional ABCD-related metadata standards have emerged (e.g., ABCDEFG,⁸ ABCDDNA⁹). One document may contain the metadata of one or more species observations in a textual description, which can be annotated and indexed for semantic search. For our prototype, we use the ABCDEFG metadata files provided by the GFBio¹⁰ project; specifically, metadata files from the Museum für Naturkunde (MfN).¹¹ An example of an ABCDEFG metadata file is presented in Listing 2, containing the core ABCD structure as well as additional information about the geological era. The terminology server supplied by the GFBio project offers access to several biodiversity vocabularies, e.g., ENVO, BEFDATA, TDWGREGION. It also provides a SPARQL endpoint¹² for querying the ontologies.

⁵ ENVO, http://purl.obolibrary.org/obo/envo.owl
⁶ Apache Fuseki, http://jena.apache.org/documentation/serving_data/
⁷ ABCD, http://www.tdwg.org/standards/115/
⁸ ABCDEFG, http://www.geocase.eu/efg
⁹ ABCDDNA, http://www.tdwg.org/standards/640/
¹⁰ GFBio, http://www.gfbio.org
¹¹ MfN, http://www.naturkundemuseum-berlin.de/
¹² GFBio terminology server, http://terminologies.gfbio.org/sparql/

3.3 Semantic annotation
The source documents are analyzed and annotated according to the vocabularies provided by the terminology server. For this process, we use GATE, an open source framework that offers several standard language engineering components [5]. We developed a custom GATE pipeline (Figure 2) that analyzes the documents. First, the documents are split into tokens and sentences using the existing NLP components included in the GATE distribution. Afterwards, an ‘Annotation Set Transfer’ processing resource adds the original
markups of the ABCDEFG files, e.g., abcd:HigherTaxon, to the annotation set. Next, an ontology-aware ‘Large KB Gazetteer’, which is connected to the terminology server, adds every ontology class occurring in a document as a specific “gfbioAnnot” annotation that carries both an instance URI (a link to the concrete source document) and a class URI. Finally, a ‘GATE Mímir Processing Resource’ submits the annotated documents to the semantic search engine.

Figure 2: The GFBio pipeline in GATE presenting the GFBio annotations

Listing 2: Excerpt from a biodiversity metadata file in ABCDEFG format [20]

<abcd:DataSets xmlns:abcd="http://www.tdwg.org/schemas/abcd/2.06"
       xmlns:efg="http://www.synthesys.info/ABCDEFG/1.0">
<abcd:DataSet>
<abcd:Metadata>
<abcd:Description><abcd:Representation language="en">
<abcd:Title>MfN - Fossil invertebrates</abcd:Title>
<abcd:Details>Gastropods, bivalves, brachiopods, sponges</abcd:Details>
</abcd:Representation></abcd:Description>
<abcd:Scope><abcd:TaxonomicTerms>
<abcd:TaxonomicTerm>Gastropods, Bivalves, Brachiopods, Sponges</abcd:TaxonomicTerm>
</abcd:TaxonomicTerms></abcd:Scope>
</abcd:Metadata>
<abcd:Units><abcd:Unit>
<abcd:SourceInstitutionID>MfN</abcd:SourceInstitutionID>
<abcd:SourceID>MfN - Fossil invertebrates Ia</abcd:SourceID>
<abcd:UnitID>MB.Ga.3895</abcd:UnitID>
<abcd:Identifications><abcd:Identification>
<abcd:Result><abcd:TaxonIdentified>
<abcd:HigherTaxa><abcd:HigherTaxon>
<abcd:HigherTaxonName>Euomphaloidea</abcd:HigherTaxonName>
<abcd:HigherTaxonRank>Family</abcd:HigherTaxonRank>
</abcd:HigherTaxon></abcd:HigherTaxa>
<abcd:ScientificName>
<abcd:FullScientificNameString>Euomphalus sp.</abcd:FullScientificNameString>
</abcd:ScientificName>
</abcd:TaxonIdentified></abcd:Result>
</abcd:Identification></abcd:Identifications>
<abcd:UnitExtension>
<efg:EarthScienceSpecimen><efg:UnitStratigraphicDetermination>
<efg:ChronostratigraphicAttributions>
<efg:ChronostratigraphicAttribution>
<efg:ChronoStratigraphicDivision>System</efg:ChronoStratigraphicDivision>
<efg:ChronostratigraphicName>Triassic</efg:ChronostratigraphicName>
</efg:ChronostratigraphicAttribution></efg:ChronostratigraphicAttributions>
</efg:UnitStratigraphicDetermination></efg:EarthScienceSpecimen>
</abcd:UnitExtension>
</abcd:Unit></abcd:Units></abcd:DataSet></abcd:DataSets>

3.4 Semantic indexing
For semantic indexing, we use GATE Mímir:¹³ “Mímir is a multi-paradigm information management index and repository which can be used to index and search over text, annotations, semantic schemas (ontologies), and semantic metadata (instance data)” [4]. Besides ordinary keyword-based search, Mímir incorporates the semantic annotations previously generated by GATE into the index. Additionally, it can be connected to the terminology server, allowing queries over the ontologies. All index-relevant annotations and the connection to the terminology server are specified in an index template.

¹³ GATE Mímir, https://gate.ac.uk/mimir/

3.5 Content recommender
The Java-based content recommender sends a SPARQL query to the Fuseki server and obtains the interests and preferred recommendation techniques from the user profile as a list of (LOD) URIs. This list is used for a second SPARQL query to the Mímir server. Presently, this query asks only for child nodes of the interest concepts (Figure 3). The result set contains ABCDEFG metadata files related to a user's interests. In the future, we intend to experiment with further semantic relations, e.g., object properties. Assuming that a specific fossil used to live in rocks, it might be interesting to know whether other species living in the same geological era occurred in rocks. Another filtering method would be to use parent or grandparent nodes from the vocabularies to broaden the search. We will provide control options and feedback mechanisms to support the user in actively steering the recommendation process. The recommender component is still under development and has not yet been added to the implementation.
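The first step of the content recommender, obtaining the interest URIs from the user profile, can be illustrated without a triple store. The recommender itself queries Fuseki with SPARQL (Listing 3); the following is only a minimal Python sketch, not the Java implementation, that reads a trimmed copy of Listing 1 with the standard-library XML parser. Since Listing 1 is an excerpt without namespace declarations, the sketch wraps it in rdf:RDF and binds `um` to an assumed IUMO namespace URI; the function name `interest_uris` is hypothetical.

```python
import xml.etree.ElementTree as ET

# Namespaces for the profile in Listing 1. The IUMO ("um") URI is an
# assumption; replace it with the URI the real profile declares.
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
UM = "http://intelleo.eu/ontologies/user-model/ns/"  # assumed IUMO namespace

# Trimmed copy of Listing 1, wrapped in rdf:RDF with namespace declarations.
PROFILE = f"""<rdf:RDF xmlns:rdf="{RDF}" xmlns:foaf="{FOAF}" xmlns:um="{UM}">
  <rdf:Description rdf:about="http://www.semanticsoftware.info/person/felicitasloeffler">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:firstName>Felicitas</foaf:firstName>
    <um:TopicPreference rdf:resource="http://purl.obolibrary.org/obo/ENVO_01000009"/>
  </rdf:Description>
</rdf:RDF>"""


def interest_uris(rdf_xml: str, first_name: str) -> list[str]:
    """Return the LOD URIs stored as um:TopicPreference for the person with
    the given foaf:firstName (the same pattern Listing 3 matches)."""
    root = ET.fromstring(rdf_xml)
    uris = []
    for desc in root.findall(f"{{{RDF}}}Description"):
        if desc.findtext(f"{{{FOAF}}}firstName") != first_name:
            continue
        for pref in desc.findall(f"{{{UM}}}TopicPreference"):
            # The interest is stored as a link (rdf:resource), not as a label.
            uris.append(pref.get(f"{{{RDF}}}resource"))
    return uris


print(interest_uris(PROFILE, "Felicitas"))
# ['http://purl.obolibrary.org/obo/ENVO_01000009']
```

Matching element paths only works for this particular serialization; RDF/XML allows several equivalent encodings of the same triples, which is one reason to query a triple store with SPARQL, as the recommender does.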
Figure 3: A search for “biotic mesoscopic physical object” returning documents about fossils (child concept)
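The effect shown in Figure 3, where a query for an interest concept also returns documents annotated only with a child concept, can be reproduced in miniature. The sketch below is plain Python, not Mímir's query language: a hypothetical parent-to-children map stands in for the ENVO hierarchy served by the terminology server, and a hypothetical set of class labels per document stands in for the “gfbioAnnot” annotations of Section 3.3. Only the “fossil” child relation is taken from the text; the grandchild class and all documents are invented for illustration.

```python
from collections import deque

# Hypothetical hierarchy fragment (parent -> children). Labels stand in for
# class URIs; only the "fossil" edge comes from the paper.
CHILDREN: dict[str, list[str]] = {
    "biotic mesoscopic physical object": ["fossil"],
    "fossil": ["example fossil subclass"],
}

# Hypothetical annotated documents: text plus the ontology classes a
# gazetteer found in them.
DOCS: dict[str, dict] = {
    "doc-a": {"text": "Fossil invertebrates: gastropods, bivalves",
              "classes": {"fossil"}},
    "doc-b": {"text": "Euomphalus sp., Triassic",
              "classes": {"example fossil subclass"}},
    "doc-c": {"text": "Soil samples from a grassland plot",
              "classes": {"soil"}},
}


def descendants(concept: str) -> set[str]:
    """Breadth-first traversal: the concept, its children, their children, ..."""
    seen = {concept}
    queue = deque([concept])
    while queue:
        for child in CHILDREN.get(queue.popleft(), []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


def keyword_search(term: str) -> list[str]:
    """Case-insensitive phrase match on the document text only."""
    return sorted(d for d, doc in DOCS.items() if term.lower() in doc["text"].lower())


def concept_search(concept: str) -> list[str]:
    """Documents annotated with the concept or any of its descendants."""
    wanted = descendants(concept)
    return sorted(d for d, doc in DOCS.items() if doc["classes"] & wanted)


interest = "biotic mesoscopic physical object"
print(keyword_search(interest))  # []
print(concept_search(interest))  # ['doc-a', 'doc-b']
```

The traversal follows only child links, like the current prototype described in Section 3.5; following parent or grandparent links instead would broaden the search, as proposed there. Individuals, which Mímir also includes, are omitted here.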


4. APPLICATION
The semantic content recommender system allows the recommendation of more specific and diverse ABCDEFG metadata files with respect to the stored user interests. Listing 3 shows the query that obtains the interests from the user profile introduced in Listing 1. The result contains a list of (LOD) URIs to concepts in an ontology.

Listing 3: SPARQL query to retrieve user interests

PREFIX foaf:     <http://xmlns.com/foaf/0.1/>
PREFIX rdfs:     <http://www.w3.org/2000/01/rdf-schema#>
PREFIX oboInOwl: <http://www.geneontology.org/formats/oboInOwl#>
# The IUMO namespace below is assumed; it must match the one declared in the profile.
PREFIX um:       <http://intelleo.eu/ontologies/user-model/ns/>

SELECT ?label ?interest ?syn
WHERE
{
    ?s foaf:firstName "Felicitas" .
    ?s um:TopicPreference ?interest .
    ?interest rdfs:label ?label .
    # OPTIONAL keeps interests that have no related synonym in the ontology
    OPTIONAL { ?interest oboInOwl:hasRelatedSynonym ?syn }
}

In this example, the user would like to obtain biodiversity datasets about a “biotic mesoscopic physical object”, which is the label of http://purl.obolibrary.org/obo/ENVO_01000009. This technical term might be incomprehensible to a beginner, e.g., a student, who would prefer a description like “organic material feature”. Thus, to later adjust the result to a user's previous knowledge, the system additionally returns synonyms.

The returned interest (LOD) URI is used for a second query to the search engine (Figure 3). The connection to the terminology server allows Mímir to search within the ENVO ontology (Figure 4) and to include related child concepts as well as their children and individuals. Since there is no metadata file containing the exact term “biotic mesoscopic physical object”, a simple keyword-based search would fail. However, Mímir can retrieve more specific information than is stored in the user profile and returns biodiversity metadata files about “fossil”. That ontology class is a child node of “biotic mesoscopic physical object” and thus represents a semantic relation. Because the retrieved metadata files are highly similar in content, the result set in Figure 3 contains only documents that closely resemble each other.

Figure 4: An excerpt from the ENVO ontology

5. CONCLUSIONS
We introduced our new semantically enhanced content recommender system for the biodiversity domain. Its main benefit lies in the connection to a search engine supporting integrated textual, linguistic and ontological queries. We use existing vocabularies from the terminology server of the GFBio project. The recommendation list contains not only classical keyword-based results, but also documents including semantically related concepts.

In future work, we intend to integrate semantic-based recommender algorithms to obtain more diverse results and to support the interdisciplinary linking process in biodiversity research. We will set up an experiment to evaluate the algorithms on large datasets with the established metrics precision and recall [14]. Additionally, we would like to extend the recommender component with control options for the user [1]. Integrated into a portal, the result list should be adapted according to a user's recommendation settings or adjusted to previous knowledge. These control functions allow the user to actively steer the recommendation process. We are planning to use the new layered evaluation approach for interactive adaptive systems by Paramythis, Weibelzahl and Masthoff [16]. Since adaptive systems present different results to each user, ordinary evaluation metrics are not appropriate. Thus, accuracy, validity, usability, scrutability and transparency will be assessed in several layers, e.g., the collection of input data and their interpretation, or the decision upon the adaptation strategy. This should lead to an improved consideration of adaptivity in the evaluation process.
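The planned evaluation with precision and recall [14] rests on two set-based ratios: precision is the fraction of recommended documents that are relevant, recall is the fraction of relevant documents that are recommended. A minimal sketch with hypothetical document identifiers; returning 0.0 for an empty list or empty relevance set is a convention chosen here, not something the text prescribes.

```python
def precision_recall(recommended: list[str], relevant: set[str]) -> tuple[float, float]:
    """Set-based precision and recall of one recommendation list."""
    rec = set(recommended)            # duplicates in the list count once
    hits = len(rec & relevant)        # documents both recommended and relevant
    precision = hits / len(rec) if rec else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical evaluation data: 4 recommended files, 6 judged relevant.
recommended = ["doc-1", "doc-2", "doc-3", "doc-4"]
relevant = {"doc-1", "doc-2", "doc-3", "doc-7", "doc-8", "doc-9"}
print(precision_recall(recommended, relevant))  # (0.75, 0.5)
```

These measures ignore the order of the list; for ranked recommendation lists, rank-aware variants such as precision at k are commonly used [14].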
6. ACKNOWLEDGMENTS
This work was supported by DAAD (German Academic Exchange Service)¹⁴ through the PPP Canada program and by DFG (German Research Foundation)¹⁵ within the GFBio project.

7. REFERENCES
[1] F. Bakalov, M.-J. Meurs, B. König-Ries, B. Sateli, R. Witte, G. Butler, and A. Tsang. An approach to controlling user models and personalization effects in recommender systems. In Proceedings of the 2013 International Conference on Intelligent User Interfaces, IUI '13, pages 49–56, New York, NY, USA, 2013. ACM.
[2] Ò. Celma. FOAFing the music: Bridging the semantic gap in music recommendation. In Proceedings of the 5th International Semantic Web Conference, pages 927–934, Athens, GA, USA, 2006.
[3] S. Chhabra and P. Resnick. CubeThat: News article recommender. In Proceedings of the Sixth ACM Conference on Recommender Systems, RecSys '12, pages 295–296, New York, NY, USA, 2012. ACM.
[4] H. Cunningham, V. Tablan, I. Roberts, M. Greenwood,

    P. B. Kantor, editors, Recommender Systems Handbook, pages 73–105. Springer, 2011.
[12] M. Loreau. Excellence in ecology. International Ecology Institute, Oldendorf, Germany, 2010.
[13] V. Maidel, P. Shoval, B. Shapira, and M. Taieb-Maimon. Ontological content-based filtering for personalised newspapers: A method and its evaluation. Online Information Review, 34(5):729–756, 2010.
[14] C. D. Manning, P. Raghavan, and H. Schütze. Introduction to Information Retrieval. Cambridge University Press, 2008.
[15] S. E. Middleton, N. R. Shadbolt, and D. C. D. Roure. Ontological user profiling in recommender systems. ACM Trans. Inf. Syst., 22(1):54–88, Jan. 2004.
[16] A. Paramythis, S. Weibelzahl, and J. Masthoff. Layered evaluation of interactive adaptive systems: Framework and formative methods. User Modeling and User-Adapted Interaction, 20(5):383–453, Dec. 2010.
[17] E. Pariser. The Filter Bubble: What the Internet Is Hiding from You. Viking, 2011.
[18] S. Park, S. Kang, S. Chung, and J. Song. NewsCube: delivering multiple aspects of news to mitigate media
     and N. Aswani. Information extraction and semantic
                                                                     bias. In Proceedings of the SIGCHI Conference on
     annotation for multi-paradigm information
                                                                     Human Factors in Computing Systems, CHI ’09, pages
     management. In M. Lupu, K. Mayer, J. Tait, and A. J.
                                                                     443–452, New York, NY, USA, 2009. ACM.
     Trippe, editors, Current Challenges in Patent
                                                                [19] G. Salton and C. Buckley. Term-weighting approaches
     Information Retrieval, volume 29 of The Information
                                                                     in automatic text retrieval. Information Processing and
     Retrieval Series, pages 307–327. Springer Berlin
                                                                     Management, 24:513–523, 1988.
     Heidelberg, 2011.
                                                                [20] Museum für Naturkunde Berlin. Fossil invertebrates,
 [5] H. Cunningham et al. Text Processing with GATE
                                                                     UnitID:MB.Ga.3895.
     (Version 6). University of Sheffield, Dept. of Computer
                                                                     http://coll.mfn-berlin.de/u/MB_Ga_3895.html.
     Science, 2011.
                                                                [21] M. van Setten. Supporting people in finding
 [6] S. Faridani, E. Bitton, K. Ryokai, and K. Goldberg.
                                                                     information: hybrid recommender systems and
     Opinion space: A scalable tool for browsing online
                                                                     goal-based structuring. PhD thesis, Telematica Instituut,
     comments. In Proceedings of the SIGCHI Conference
                                                                     University of Twente, The Netherlands, 2005.
     on Human Factors in Computing Systems, CHI ’10,
     pages 1175–1184, New York, NY, USA, 2010. ACM.             [22] R. Walls, J. Deck, R. Guralnick, S. Baskauf,
                                                                     R. Beaman, and et al. Semantics in Support of
 [7] F. Frasincar, W. IJntema, F. Goossen, and
                                                                     Biodiversity Knowledge Discovery: An Introduction to
     F. Hogenboom. A semantic approach for news
                                                                     the Biological Collections Ontology and Related
     recommendation. Business Intelligence Applications
                                                                     Ontologies. PLoS ONE 9(3): e89606, 2014.
     and the Web: Models, Systems and Technologies, IGI
     Global, pages 102–121, 2011.                               [23] D. Wong, S. Faridani, E. Bitton, B. Hartmann, and
                                                                     K. Goldberg. The diversity donut: enabling participant
 [8] F. Getahun, J. Tekli, R. Chbeir, M. Viviani, and
                                                                     control over the diversity of recommended responses. In
     K. Yétongnon. Relating RSS News/Items. In
                                                                     CHI ’11 Extended Abstracts on Human Factors in
     M. Gaedke, M. Grossniklaus, and O. Dı́az, editors,
                                                                     Computing Systems, CHI EA ’11, pages 1471–1476,
     ICWE, volume 5648 of Lecture Notes in Computer
                                                                     New York, NY, USA, 2011. ACM.
     Science, pages 442–452. Springer, 2009.
                                                                [24] M. Zhang and N. Hurley. Avoiding monotony:
 [9] T. Health and C. Bizer. Linked Data: Evolving the Web
                                                                     Improving the diversity of recommendation lists. In
     into a Global Data Space. Synthesis Lectures on the
                                                                     Proceedings of the 2008 ACM Conference on
     Semantic Web: Theory and Technology. Morgan &
                                                                     Recommender Systems, RecSys ’08, pages 123–130, New
     Claypool, 2011.
                                                                     York, NY, USA, 2008. ACM.
[10] W. IJntema, F. Goossen, F. Frasincar, and
                                                                [25] C.-N. Ziegler, G. Lausen, and L. Schmidt-Thieme.
     F. Hogenboom. Ontology-based news recommendation.
                                                                     Taxonomy-driven computation of product
     In Proceedings of the 2010 EDBT/ICDT Workshops,
                                                                     recommendations. In Proceedings of the Thirteenth
     EDBT ’10, pages 16:1–16:6, New York, NY, USA, 2010.
                                                                     ACM International Conference on Information and
     ACM.
                                                                     Knowledge Management, CIKM ’04, pages 406–415,
[11] P. Lops, M. de Gemmis, and G. Semeraro.
                                                                     New York, NY, USA, 2004. ACM.
     Content-based recommender systems: State of the art
     and trends. In F. Ricci, L. Rokach, B. Shapira, and
14
     DAAD, https://www.daad.de/de/
15
     DFG, http://www.dfg.de