<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Linked Scientometrics: Designing Interactive Scientometrics with Linked Data and Semantic Web Reasoning</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Grant McKenzie</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Krzysztof Janowicz</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yingjie Hu</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Kunal Sengupta</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Hitzler</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Santa Barbara, CA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Wright State University</institution>
          ,
          <addr-line>Dayton, OH</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this demo paper we introduce a Linked Data-driven, Semantically-enabled Journal Portal (SEJP) that offers a variety of interactive scientometrics modules. SEJP allows editors, reviewers, authors, and readers to explore and analyze (meta)data published by a journal. Besides Linked Data created from the journal's internal data, SEJP also links out to other sources and includes them to develop more powerful modules. These modules range from simple descriptive statistics and spatial analysis of visitors and authors to topic trending modules. While SEJP will be available for multiple journals, this paper shows its deployment to the Semantic Web journal (SWJ) by IOS Press. Due to its open &amp; transparent review process, SWJ offers a wide variety of additional information, e.g., about reviewers, editors, and paper decisions.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The current SEJP version can be used at http://sejp.geog.ucsb.edu/SWJPortal</p>
    </sec>
    <sec id="sec-2">
      <title>The Linked Data Portal for SWJ</title>
      <sec id="sec-2-1">
        <title>Structuring and Publishing Data</title>
        <p>
          SWJ employs a highly customized version of the popular Drupal content
management system (CMS) [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ]. All submissions, reviews, notifications, and feedback
are contributed through the CMS, which stores content in a relational database
management system. The first step in developing the portal was to export all data
from the database and convert it to the Resource Description Framework (RDF)
format, making use of the bibliographic ontology BIBO [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ]. While most of the
data accessed from SWJ can be modeled by BIBO, the ontology was extended to
include aspects such as the versioning of articles (AcademicArticleVersion). Once
the data was organized and relationships were defined (e.g., Article hasAuthor),
a custom Java converter was built using the OWL API, and the resulting RDF data
was published online via Apache Jena's SPARQL server Fuseki; see [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] for details.
        </p>
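        <p>The conversion step can be illustrated with a minimal sketch. It is not the authors' Java converter: it is a small Python stand-in that turns one exported database row into N-Triples lines. The row layout, the example.org namespace, and the placement of hasAuthor in that namespace are assumptions made for illustration; only the BIBO namespace and the rdf:type IRI are real.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Minimal sketch: turn rows exported from a relational database into
# N-Triples lines. The row layout and the example.org namespace are
# hypothetical; only the BIBO namespace and rdf:type IRI are real.
BIBO = "http://purl.org/ontology/bibo/"
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
EX = "http://example.org/sejp#"  # placeholder for the portal's own terms


def iri(value):
    """Wrap an IRI in angle brackets as N-Triples requires."""
    return "<" + value + ">"


def article_triples(row):
    """Yield N-Triples lines for one article row (a plain dict)."""
    article = EX + "article/" + str(row["id"])
    yield " ".join([iri(article), iri(RDF_TYPE), iri(BIBO + "AcademicArticle"), "."])
    for author_id in row["author_ids"]:
        author = EX + "person/" + str(author_id)
        yield " ".join([iri(article), iri(EX + "hasAuthor"), iri(author), "."])


rows = [{"id": 42, "author_ids": [7, 9]}]
triples = [line for row in rows for line in article_triples(row)]
print("\n".join(triples))
```
]]></preformat>
        <p>Lines of this form can be loaded into a triple store; a production converter would also need to handle literals, escaping, and the article versions mentioned above.</p>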
      </sec>
      <sec id="sec-2-2">
        <title>User Interface</title>
        <p>Once the back-end data was organized, structured, and published to the Web, a
modular user interface was developed to allow visual analysis of the SWJ data. A
plug-and-play approach was taken, allowing the analysis
modules to be separated and configured based on the particular requirements
of each application. Built with HTML5, CSS, JavaScript, D3, and ExtJS,
the front-end interface is lightweight and compatible with
any modern W3C-compliant browser. Given the separation between the back-end
data and the front-end analysis modules, SEJP is able to integrate data from
other SPARQL endpoints and APIs; in our case, the Semantic Web Dog Food
portal and Microsoft Academic Search.</p>
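        <p>Because the front end only needs a SPARQL endpoint, a module can request data with a plain HTTP GET as defined by the SPARQL Protocol, which carries the query text in a parameter named query. The sketch below only builds such a request URL. The endpoint address and the query are hypothetical and are not taken from the portal.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from urllib.parse import urlencode, urlsplit, parse_qs

# Hypothetical endpoint address; the portal's real query URL is not given here.
ENDPOINT = "http://example.org/swj/query"

# Illustrative query: list ten resources typed as bibo:AcademicArticle.
QUERY = """
PREFIX bibo: <http://purl.org/ontology/bibo/>
SELECT ?article WHERE { ?article a bibo:AcademicArticle . } LIMIT 10
"""


def build_query_url(endpoint, query):
    """Build a SPARQL Protocol GET URL with the query in the 'query' parameter."""
    return endpoint + "?" + urlencode({"query": query.strip()})


url = build_query_url(ENDPOINT, QUERY)

# Round-trip check: decoding the URL yields the original query text.
decoded = parse_qs(urlsplit(url).query)["query"][0]
print(url)
```
]]></preformat>
        <p>The same function could target any other endpoint by changing the first argument, which is the property the modular design relies on.</p>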
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Modules</title>
      <p>A variety of scientometric modules were developed for analyzing the SWJ data.
Visual-analytic tools range from pie charts showing paper submission types to
cartograms of website visitors to edge-node graphs showing links between
collaborating authors. Two of the more distinctive trend modules are discussed here.</p>
      <sec id="sec-3-1">
        <title>Research Topic Trends</title>
        <p>
          This module shows how the research topics in the SWJ trend over
time. To construct this module, a topic modeling approach was taken
to extract latent topics from papers submitted to the SWJ. First, the text of all
original submissions between March 2010 and April 2013 was accessed, cleaned
of standard English stop words and non-alphanumeric characters, and stemmed
using the Snowball stemmer (http://snowball.tartarus.org).
The submissions were then grouped by time period (three months are considered
one period), combining the text from all articles within a period into one single
document. This produced a total of 13 documents. Latent Dirichlet allocation
(LDA) was then applied to the documents with the purpose of extracting a
set number of latent topics. LDA is an unsupervised, generative probabilistic
model used to infer latent topics in a textual corpus [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ]. In this case, LDA is
applied across the set of 13 documents to discover topics, each represented as a
multinomial distribution over words. Based on the co-occurrence of words in the
corpus and a specified number of topics, LDA produces probability
values for each word in each topic and for each topic in each document. The
LDA model was tested with 50, 20, and 10 topics; 20 topics produced the
most human-comprehensible results for this module. An example of one of these
topics is shown in Figure 1a, with font size indicating the relative probability of the
word within the topic.
        </p>
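        <p>The cleaning and grouping steps can be sketched as follows. This is an illustrative approximation, not the authors' pipeline: the stop-word list is a tiny placeholder, stemming and the LDA step itself are omitted, and the function names and sample submissions are invented for the example.</p>
        <preformat preformat-type="code"><![CDATA[
```python
import re
from collections import defaultdict
from datetime import date

# Tiny illustrative stop-word list; a real run would use a full English list
# and apply a stemmer (the paper uses Snowball) after this step.
STOP_WORDS = {"the", "of", "and", "a", "in", "to", "is", "for"}
START = date(2010, 3, 1)  # first month of the collection window


def clean(text):
    """Lower-case, drop non-alphanumeric characters, and remove stop words."""
    tokens = re.sub(r"[^a-z0-9\s]", " ", text.lower()).split()
    return [t for t in tokens if t not in STOP_WORDS]


def period_index(submitted, start=START, months_per_period=3):
    """Return the zero-based three-month period a submission date falls into."""
    months = (submitted.year - start.year) * 12 + (submitted.month - start.month)
    return months // months_per_period


def group_by_period(submissions):
    """Merge the cleaned tokens of all submissions in a period into one document."""
    documents = defaultdict(list)
    for submitted, text in submissions:
        documents[period_index(submitted)].extend(clean(text))
    return dict(documents)


# Hypothetical submissions: (submission date, raw text).
submissions = [
    (date(2010, 3, 15), "Linked Data for the Web."),
    (date(2010, 5, 2), "Reasoning in OWL!"),
    (date(2013, 4, 20), "Ontology of sensors"),
]
docs = group_by_period(submissions)
print(sorted(docs))
```
]]></preformat>
        <p>With March 2010 as the start month, a submission from April 2013 falls into period 12, so the period indices 0 to 12 correspond to the 13 documents described above.</p>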
        <p>Figure 1: (a) Word cloud showing an example topic. (b) Research topic trending module showing how topics change over time.</p>
        <p>The topics are then displayed through the user interface via an interactive
line graph constructed with the JavaScript D3 charting library (Figure 1b). LDA
defines each document (publications grouped by time period) as a distribution
over all topics, with the probabilities across all topics summing to 1.
Visually, this is represented with the time period on the X-axis and the probability
of each topic (multiplied by 100) on the Y-axis. Initially, the 20 topics
are color coded and shown on the chart, with the option to show or hide each
topic through a clickable legend on the right. Hovering the mouse over a line
produces a pop-up bubble that informs the user of the topic strength as well as
the ten most probable words for that topic.</p>
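        <p>The mapping from LDA output to chart data can be sketched as below. The two probability tables are made-up stand-ins for real LDA output, and the function names are illustrative; the sketch only shows the scaling by 100 and the selection of the most probable words described above.</p>
        <preformat preformat-type="code"><![CDATA[
```python
# Hypothetical LDA output: one topic distribution per time period
# (each row sums to 1) and one word distribution per topic.
doc_topics = [
    [0.50, 0.30, 0.20],  # period 0
    [0.25, 0.25, 0.50],  # period 1
]
topic_words = [
    {"ontology": 0.4, "owl": 0.35, "reasoning": 0.25},
    {"linked": 0.6, "data": 0.3, "sparql": 0.1},
    {"sensor": 0.7, "stream": 0.3},
]


def topic_series(doc_topics, topic):
    """Y-values for one line in the chart: topic probability times 100 per period."""
    return [round(row[topic] * 100, 2) for row in doc_topics]


def top_words(word_probs, n=10):
    """The n most probable words of a topic, as shown in the hover pop-up."""
    ranked = sorted(word_probs.items(), key=lambda item: item[1], reverse=True)
    top = ranked[:n]
    return [word for word, _ in top]


# Each period's distribution should sum to 1 (up to rounding error).
assert all(abs(sum(row) - 1.0) < 1e-9 for row in doc_topics)

series = [topic_series(doc_topics, t) for t in range(len(topic_words))]
print(series[0], top_words(topic_words[0], n=2))
```
]]></preformat>
        <p>Each entry of series corresponds to one line in the chart, and top_words supplies the word list for the pop-up bubble.</p>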
      </sec>
      <sec id="sec-3-2">
        <title>Author-paper-keyword Hive Chart</title>
        <p>The author-paper-keyword hive chart module (Figure 2) is an interactive
visualization showing the relationships between authors, papers, and keywords.
This module helps users discover possibly hidden
relations among the three distinct types of data.</p>
        <p>Hovering over a node on the authors axis (orange) produces a one-to-many
relationship on the keyword axis (green) showing all the keywords mentioned by
a specific author. Additionally, there is a one-to-many relationship between the
selected author and the papers (blue axis) that he or she has contributed to the
SWJ. The same holds when selecting any node on either the paper
or the keyword axis, allowing for exploration of the data from any node. This
module allows users to find authors who are concerned with similar research
topics, and can also help visually discover all the coauthors of a researcher.
Editors can use the module to find suitable reviewers.</p>
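        <p>The relations behind the hive chart can be sketched as a small index from authors to papers, keywords, and coauthors. The paper records, names, and function names below are invented for illustration; in the portal the data would come from the journal's Linked Data, and the actual module may organize it differently.</p>
        <preformat preformat-type="code"><![CDATA[
```python
from collections import defaultdict

# Hypothetical paper records; in the portal these would come from the
# journal's Linked Data rather than a hard-coded list.
papers = [
    {"title": "Paper A", "authors": ["Ada", "Bo"], "keywords": ["ontology", "owl"]},
    {"title": "Paper B", "authors": ["Bo", "Cy"], "keywords": ["owl", "sparql"]},
]


def build_links(papers):
    """Index the one-to-many links from each author to papers, keywords, and coauthors."""
    links = {"papers": defaultdict(set), "keywords": defaultdict(set),
             "coauthors": defaultdict(set)}
    for paper in papers:
        for author in paper["authors"]:
            links["papers"][author].add(paper["title"])
            links["keywords"][author].update(paper["keywords"])
            links["coauthors"][author].update(a for a in paper["authors"] if a != author)
    return links


def similar_authors(links, author):
    """Other authors who share at least one keyword with the given author."""
    own = links["keywords"][author]
    return {a for a, kws in links["keywords"].items() if a != author and kws & own}


links = build_links(papers)
print(sorted(links["coauthors"]["Bo"]), sorted(similar_authors(links, "Ada")))
```
]]></preformat>
        <p>An editor looking for reviewers could start from the similar_authors result and exclude the coauthors of the submitting authors.</p>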
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Conclusions</title>
      <p>This demo paper presents a Linked Data-driven, semantically-enabled journal
portal for scientometrics and deploys it to the Semantic Web journal. SEJP uses
a journal's internal data and also connects to other (Linked Data) sources to
include them in the analysis. Two scientometric analysis modules were discussed
in the prior sections, focusing on the change of topics over time as well as the
relations between authors, papers, and keywords. These modules, however, are
only a small subset of a suite of interactive modules developed for the portal.
As development continues, new modules and tools will be added,
further advancing the portal's capability for scientometrics. In the near future,
SEJP will be deployed to other journals as well.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sengupta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>The new manuscript review system for the semantic web journal</article-title>
          .
          <source>Semantic Web</source>
          <volume>4</volume>
          (
          <issue>2</issue>
          ) (
          <year>2013</year>
          )
          <fpage>117</fpage>
          –
          <lpage>117</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>D'Arcus</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giasson</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Bibliographic ontology specification</article-title>
          . Online: http://bibliontology.com/specification (
          <month>November</month>
          <year>2009</year>
          )
          <comment>Last accessed 2013-5-12.</comment>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Hu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Janowicz</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McKenzie</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sengupta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hitzler</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>A linked data-driven semantically-enabled journal portal for scientometrics</article-title>
          .
          <source>International Semantic Web Conference</source>
          (October
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Blei</surname>
            ,
            <given-names>D.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ng</surname>
            ,
            <given-names>A.Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jordan</surname>
            ,
            <given-names>M.I.</given-names>
          </string-name>
          :
          <article-title>Latent Dirichlet allocation</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>3</volume>
          (
          <year>2003</year>
          )
          <fpage>993</fpage>
          –
          <lpage>1022</lpage>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>