<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Cite4Me: A Semantic Search and Retrieval Web Application for Scientific Publications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Bernardo Pereira Nunes</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Besnik Fetahu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Dietze</string-name>
          <email>dietzeg@L3S.de</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco A. Casanova</string-name>
          <email>casanovag@inf.puc-rio.br</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Informatics, Pontifical Catholic University of Rio de Janeiro, Rio de Janeiro/RJ - Brazil, CEP 22451-900</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>L3S Research Center, Leibniz University Hannover</institution>
          ,
          <addr-line>Appelstr. 9a, 30167 Hannover</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Cite4Me is a Web application that leverages Semantic Web technologies to provide a new perspective on the search and retrieval of bibliographical data. The Web application presented in this work focuses on: (i) semantic recommendation of papers; (ii) novel semantic search &amp; retrieval of papers; (iii) interlinking of bibliographical data with related data sources from LOD; (iv) innovative user interface design; and (v) sentiment analysis of extracted paper citations. Finally, as this work also targets educational aspects, our application provides an in-depth analysis of the data that guides users in their research field.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The huge amount of Web data and resources, particularly in the academic area, calls for
strategies to analyse and explore resources and data.</p>
      <p>While scientific disciplines are very data- and knowledge-intensive, the lack of
semantic tools hampers information management and decision making. This includes
scientific data as well as unstructured academic publications as one of the key outcomes
of scientific work. This is due to information access offered by digital library providers
such as ACM Digital Library<sup>1</sup> and Elsevier<sup>2</sup> being mostly based on free text search and
hierarchical classification<sup>3</sup>.</p>
      <p>
        Thus, we present a novel Web application for exploratory search, retrieval and
visualization of scientific publications. Cite4Me aims at providing a single access point for
accessing papers and, therefore, assisting searchers in finding relevant topics and papers
and in unveiling new nomenclature more efficiently. For this, we use reference datasets
such as DBpedia<sup>4</sup> to explore semantic relationships between scientific papers and user
queries. We also perform a topic coverage analysis to provide an overview of different
bibliographic datasets. Cite4Me is a Web application which exploits results of previous
research works [
        <xref ref-type="bibr" rid="ref2 ref3 ref4 ref5">2–5</xref>
        ].
      </p>
      <sec id="sec-1-1">
        <title>1 http://dl.acm.org</title>
      </sec>
      <sec id="sec-1-2">
        <title>2 http://www.elsevier.com</title>
      </sec>
      <sec id="sec-1-3">
        <title>3 http://www.acm.org/about/class/</title>
      </sec>
      <sec id="sec-1-4">
        <title>4 http://dbpedia.org</title>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>Cite4Me - The Application</title>
      <p>Cite4Me implements semantic and co-occurrence-based methods to search and retrieve
academic papers and suggest related work in a user-friendly interface that assists users
in exploring relationships between authors, institutions, papers and query terms. Due to
space restrictions, we present in this paper the features of Cite4Me most relevant to the
Semantic Web field.</p>
      <p>Search and Retrieval. Cite4Me implements standard techniques, such as free text search, to search and
retrieve scientific publications. In this section, we emphasize the semantic and exploratory
search mechanisms.</p>
      <p>Exploratory Search. The exploratory search or graph search component assists users
in discovering related work, people and institutions that are working on a specific topic.
A crucial step in providing this type of search is the annotation of the publications’
content. For this, we used the DBpedia Spotlight API<sup>5</sup> to extract entities, entity types and
their categories. The categories of the extracted concepts are used to
interlink publications through the topics they cover: when two publications share
the same category (dcterms:subject property), a link between both publications is
created. Figure 1 shows an example of topically related publications.</p>
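      <p>The category-based interlinking described above can be sketched as follows. This is a minimal illustration, not the Cite4Me implementation: the function name <monospace>link_by_shared_category</monospace>, the paper identifiers and the category values are hypothetical, and the annotations are assumed to have already been obtained from an entity annotation step.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict
from itertools import combinations


def link_by_shared_category(categories_by_paper):
    """Return {(paper_a, paper_b): shared_categories} for papers sharing a category."""
    # Invert the mapping: category -> set of papers annotated with it.
    papers_by_category = defaultdict(set)
    for paper, categories in categories_by_paper.items():
        for category in categories:
            papers_by_category[category].add(paper)

    # Every pair of papers under the same category gets a link.
    links = defaultdict(set)
    for category, papers in papers_by_category.items():
        for paper_a, paper_b in combinations(sorted(papers), 2):
            links[(paper_a, paper_b)].add(category)
    return dict(links)


# Hypothetical annotations: paper id -> categories (dcterms:subject values).
annotations = {
    "paper-1": {"Category:Semantic_Web", "Category:Information_retrieval"},
    "paper-2": {"Category:Semantic_Web"},
    "paper-3": {"Category:Machine_learning"},
}

links = link_by_shared_category(annotations)
for pair, shared in links.items():
    print(pair, sorted(shared))
```
]]></preformat>
      <p>Keeping the set of shared categories on each link, rather than a plain boolean, is a design choice of this sketch; it would allow an interface to label an edge with the topics two publications have in common.</p>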
      <sec id="sec-2-1">
        <title>5 http://dbpedia.org/spotlight</title>
        <p>
          Semantic Search. The semantic search component of Cite4Me is similar to the explicit
semantic analysis (ESA) technique [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. After running the annotation process described above, the relatedness
scores between the enriched concepts (DBpedia entities) found in the user query terms and
the publications’ content are computed and ranked. The
relatedness score is computed based on the tf-idf score for the entities found in the
publications’ content. The ranking of the retrieved documents is based on the sum of the
tf-idf scores of the matching concepts.</p>
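        <p>A ranking by summed tf-idf scores of matching entities can be sketched as follows. This is a simplified illustration under stated assumptions: the tf-idf weighting shown (relative term frequency multiplied by the logarithm of the inverse document frequency) is one common variant and may differ from the one used in Cite4Me, and the function names, paper identifiers and entity labels are hypothetical.</p>
        <preformat><![CDATA[
```python
import math
from collections import Counter


def tfidf_index(entities_by_paper):
    """Compute a tf-idf weight for every entity found in every paper."""
    n_papers = len(entities_by_paper)
    doc_freq = Counter()
    for entities in entities_by_paper.values():
        doc_freq.update(set(entities))

    index = {}
    for paper, entities in entities_by_paper.items():
        counts = Counter(entities)
        total = len(entities)
        index[paper] = {
            entity: (count / total) * math.log(n_papers / doc_freq[entity])
            for entity, count in counts.items()
        }
    return index


def rank(query_entities, index):
    """Rank papers by the sum of tf-idf weights of the entities matching the query."""
    scores = {}
    for paper, weights in index.items():
        score = sum(weights.get(entity, 0.0) for entity in set(query_entities))
        if score > 0:
            scores[paper] = score
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical corpus: paper id -> entities extracted from its content.
corpus = {
    "paper-1": ["Semantic_Web", "Semantic_Web", "Ontology"],
    "paper-2": ["Semantic_Web", "Machine_learning"],
    "paper-3": ["Machine_learning", "Machine_learning"],
}

results = rank(["Semantic_Web", "Ontology"], tfidf_index(corpus))
for paper, score in results:
    print(paper, round(score, 4))
```
]]></preformat>
        <p>In this toy corpus, the paper mentioning both query entities is ranked first, and the paper sharing no entity with the query is omitted from the result list.</p>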
        <p>Figure 2 illustrates the semantic search functionality. Alongside the results of the
semantic search, a tag cloud shows the most prominent terms for a given user query.
The tag cloud is updated while the user browses through the list of results. The tags are selected
based on the tf-idf score for the entities found in the abstracts of the retrieved papers.</p>
        <p>Paper recommendation. Another important feature of Cite4Me, which differentiates
it from similar tools, is the semantic paper recommendation. Given a scientific
publication, the tool recommends a related paper based on a score calculated according
to direct and lateral relationships between the publication of interest and the remaining
papers in our corpus.</p>
        <p>
          To compute the relatedness score, we rely on previous work by Nunes et al. [
          <xref ref-type="bibr" rid="ref2 ref4">2, 4</xref>
          ],
where the paths connecting two enriched concepts in the scientific publications are
analysed using a variation of the Katz index, a measure from social network theory
that quantifies the strength of the connectivity between two concepts in a knowledge
graph (in our case, the DBpedia graph).
        </p>
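        <p>For orientation, the standard truncated Katz index can be sketched as follows. Note that this is the textbook form of the measure, which sums the number of walks of each length between two nodes, damped by a factor raised to the walk length; it is not the variation used in the cited work. The damping factor <monospace>beta</monospace>, the maximum length <monospace>max_length</monospace> and the toy graph are assumptions made for illustration.</p>
        <preformat><![CDATA[
```python
def katz_score(graph, source, target, beta=0.5, max_length=3):
    """Truncated Katz index: sum over l of beta**l * (number of walks of length l)."""
    # walks[node] = number of walks of the current length from source to node.
    walks = {source: 1}
    score = 0.0
    for length in range(1, max_length + 1):
        next_walks = {}
        for node, count in walks.items():
            for neighbour in graph.get(node, ()):
                next_walks[neighbour] = next_walks.get(neighbour, 0) + count
        walks = next_walks
        # Longer walks contribute less because beta is smaller than 1.
        score += (beta ** length) * walks.get(target, 0)
    return score


# Hypothetical undirected toy graph standing in for a knowledge graph.
edges = [("A", "B"), ("B", "C"), ("A", "C"), ("C", "D")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

score_ad = katz_score(graph, "A", "D")
score_ac = katz_score(graph, "A", "C")
print(score_ad, score_ac)
```
]]></preformat>
        <p>In the toy graph, the directly connected pair (A, C) obtains a higher score than the pair (A, D), which is only connected through intermediate nodes.</p>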
        <p>
          After computing the relatedness scores between enriched concepts, the paper
recommendation relies on an aggregated measure that takes into account the
inter-document relatedness. Finally, we generate a ranked list of publication pairs according
to the overall score (see [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] for more details). Thus, the top-ranked publication is
recommended to the user, as shown in Figure 3.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Datasets</title>
      <sec id="sec-3-1">
        <title>6 http://www.solaresearch.org/resources/lak-dataset/</title>
        <p>
          Currently, Cite4Me is linked to a dataset (LAK Dataset<sup>6</sup>) which contains
semi-structured research publications from the ACM Digital Library (under a special license)
          and other public datasets (see also [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] for details). The dataset contains 315 full papers
along with their descriptive metadata while new publications are added continuously.
Metadata as well as the full text body are freely available in a variety of formats,
including RDF accessible via a public SPARQL endpoint. We are currently working on
expanding the number of papers available in Cite4Me. However, due to copyright reasons,
the process to expose scientific publications from publishers is still under discussion.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusion</title>
      <p>This paper presented the application of previous work in the Semantic Web field within
Cite4Me, a Web application that assists users in finding relevant scientific papers by
exploring semantic relationships between them. For more information about the Cite4Me
Web application, please refer to http://www.cite4me.com.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>E.</given-names>
            <surname>Gabrilovich</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Markovitch</surname>
          </string-name>
          .
          <article-title>Computing semantic relatedness using wikipedia-based explicit semantic analysis</article-title>
          .
          <source>In Proceedings of IJCAI'07</source>
          , pages
          <fpage>1606</fpage>
          -
          <lpage>1611</lpage>
          , San Francisco, CA, USA,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>B.</given-names>
            <surname>Pereira Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>Combining a co-occurrence-based and a semantic measure for entity linking</article-title>
          .
          <source>In ESWC</source>
          ,
          <year>2013</year>
          (to appear).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>B.</given-names>
            <surname>Pereira Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          .
          <article-title>Cite4Me: Semantic retrieval and analysis of scientific publications</article-title>
          . In M. d'Aquin,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Drachsler</surname>
          </string-name>
          , E. Herder, and D. Taibi, editors,
          <source>LAK (Data Challenge)</source>
          , volume
          <volume>974</volume>
          of
          <source>CEUR Workshop Proceedings. CEUR-WS.org</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>B.</given-names>
            <surname>Pereira Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Taibi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>Can entities be friends?</article-title>
          <source>In Proceedings of WOLE, in conjunction with ISWC'12</source>
          , volume
          <volume>906</volume>
          <source>of CEUR-WS.org</source>
          , pages
          <fpage>45</fpage>
          -
          <lpage>57</lpage>
          , Nov.
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B.</given-names>
            <surname>Pereira Nunes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Kawase</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Fetahu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Casanova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Maynard</surname>
          </string-name>
          .
          <article-title>Interlinking documents based on semantic graphs</article-title>
          .
          <source>In Proceedings of KES'13</source>
          ,
          <year>2013</year>
          (to appear).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>D.</given-names>
            <surname>Taibi</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Dietze</surname>
          </string-name>
          .
          <article-title>Fostering analytics on learning analytics research: the lak dataset</article-title>
          .
          <source>In Proceedings of the LAK Data Challenge, held at LAK2013</source>
          ,
          <year>April 2013</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>