<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>LinkedCJ: A Knowledge Base of Chinese Academic Journals Based on Linked Data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Peng Xu</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Xin Wang</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Haofen Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer Science and Engineering, East China University of Science and Technology</institution>
          ,
          <addr-line>Shanghai</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>School of Computer Science and Technology, Tianjin University</institution>
          ,
          <addr-line>Tianjin</addr-line>
          ,
          <country country="CN">China</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The number of academic articles published in Chinese is growing rapidly, yet existing methods of managing and querying Chinese academic journals and articles are not semantic-based. We create an ontology for representing and organizing the bibliographic information of Chinese academic journals and articles. We also develop software applications based on Nutch, jsoup, and Drools that transform millions of Web pages from the Wanfang website into approximately 15 million triples stored in a triple store. Finally, the knowledge base is evaluated using the Semantic Service Platform of Chinese Academic Journals and Articles (SSPCAJA). The results of the functional test show that the information of Chinese academic journals and articles is effectively represented on the platform.</p>
      </abstract>
      <kwd-group>
        <kwd>linked data</kwd>
        <kwd>knowledge base</kwd>
        <kwd>Chinese academic journals</kwd>
        <kwd>ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As of July 2013, the three major Chinese Web publishers, VIP, Wanfang, and CNKI,
had collected over 38 million, 20 million, and 36 million articles, respectively, along
with 12000, 7000, and 8000 journals. However, existing methods for managing
and querying Chinese academic journals and articles are not semantic-based.</p>
      <p>
        The Semantic Web, which stores all information in the form of Linked Data[
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]
instead of hyperlinked Web pages, focuses on the semantic interpretation of data on the
Web. At present, leading publishers, such as NPG, DBLP, and CrossRef, organize
semantic data from journals on the basis of linked data. However, most of the
vocabularies used by NPG, DBLP, and CrossRef cannot be applied to Chinese academic
journals, because their knowledge organization structure cannot be reused directly.
In China, no semantic-based knowledge organization for Chinese academic
journals has yet been constructed; the popular alternatives are data-centralizing
platforms, such as C-DBLP and Not Old academic search.
      </p>
      <p>In this paper, we seek to organize data from several Chinese academic journals on
the basis of linked data. The main contributions of this paper are:</p>
      <list list-type="bullet">
        <list-item><p>we build the LinkedCJ ontology for representing the information of Chinese academic journals;</p></list-item>
        <list-item><p>we develop a method for extracting RDF triples from the Web pages of Wanfang using a series of tools including Nutch plugins, jsoup, and Drools;</p></list-item>
        <list-item><p>we conduct a set of functional tests on the SSPCAJA using the LinkedCJ knowledge base, and compare SSPCAJA with the linked data platforms of NPG and DBLP to evaluate the LinkedCJ knowledge base.</p></list-item>
      </list>
      <p>The rest of the paper is organized as follows. Section 2 describes the construction
of the LinkedCJ ontology. Section 3 gives the method for extracting RDF triples from
the crawled Web pages of Wanfang. The functional test and evaluation of the
knowledge base are presented in Section 4. Finally, Section 5 concludes the paper and
outlines future work.</p>
    </sec>
    <sec id="sec-2">
      <title>LinkedCJ Ontology</title>
      <p>In order to organize and construct the knowledge base of Chinese academic journals,
it is necessary to create an ontology according to the characteristics of Chinese
academic journals.</p>
      <p>
        LinkedCJ, our new ontology for Chinese academic journals, inherits a list of
classes, object properties and data properties from existing ontologies of semantic
publishing, such as Functional Requirements for Bibliographic Records (FRBR)[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
FRBR-aligned Bibliographic Ontology (FaBiO)[
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], Citation Typing Ontology
(CiTO)[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Dublin Core Metadata (DC)[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Friend of a Friend (FOAF).
LinkedCJ adds several new classes, object properties, and data properties to
represent information that is specific to Chinese academic journals. Fig. 1 shows the
top-level classes and object properties of LinkedCJ.
      </p>
      <p>Fig. 1. Top-level classes and object properties of LinkedCJ. Terms shown in the figure: fabio:Journal, fabio:CitationMetadata, foaf:Person, linkedcj:Author, dcterms:source, cito:cites, linkedcj:isPerson, frbr:creatorOf, and rdfs:subClassOf. Prefixes as listed in the figure: rdfs: http://www.w3.org/1999/02/22-rdf-syntax-ns#; fabio: http://purl.org/spar/fabio/; cito: http://purl.org/spar/cito/; frbr: http://purl.org/vocab/frbr/core#; dcterms: http://purl.org/dc/terms/; linkedcj: http://sw.tju.edu.cn/LinkedCJ/.</p>
      <p>The LinkedCJ ontology contains 5 classes, 5 object properties, and 35 data properties.
Original classes, object properties, and data properties, such as linkedcj:cn,
linkedcj:hasProject, and linkedcj:hasSecondarySubjectTerm, are introduced to
represent items specific to Chinese journals that are not covered by existing
ontologies such as FRBR, FaBiO, CiTO, DC, and FOAF.</p>
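      <p>As an illustration of how the terms in Fig. 1 might fit together, the following Python sketch builds the triples for one author record. The resource IRIs (author/a1, person/p1, article/example-1), the helper names expand and describe_author, and the direction assumed for linkedcj:isPerson and frbr:creatorOf are assumptions made for this example and are not taken from the LinkedCJ specification. The rdf and foaf namespace IRIs are the standard ones.</p>
      <preformat preformat-type="code"><![CDATA[
```python
# Namespace IRIs: frbr and linkedcj as listed with Fig. 1; rdf and foaf are
# the standard W3C and FOAF namespaces.
PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "foaf": "http://xmlns.com/foaf/0.1/",
    "frbr": "http://purl.org/vocab/frbr/core#",
    "linkedcj": "http://sw.tju.edu.cn/LinkedCJ/",
}


def expand(curie):
    """Expand a prefixed name such as 'foaf:Person' into a full IRI."""
    prefix, local = curie.split(":", 1)
    return PREFIXES[prefix] + local


def describe_author(author_id, person_id, article_id):
    """Return (subject, predicate, object) triples for one author record.

    The identifier scheme and the property directions are assumptions
    made for this illustration only.
    """
    author = expand("linkedcj:author/" + author_id)
    person = expand("linkedcj:person/" + person_id)
    article = expand("linkedcj:article/" + article_id)
    rdf_type = expand("rdf:type")
    return [
        (author, rdf_type, expand("linkedcj:Author")),
        (person, rdf_type, expand("foaf:Person")),
        (author, expand("linkedcj:isPerson"), person),
        (author, expand("frbr:creatorOf"), article),
    ]


for triple in describe_author("a1", "p1", "example-1"):
    print(triple)
```
]]></preformat>
      <p>Keeping the author record separate from the person it denotes is one possible reading of Fig. 1, in which linkedcj:Author and foaf:Person appear as distinct classes connected by linkedcj:isPerson.</p>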
      <p>In summary, the vast majority of the information from Chinese academic journals
and articles can be represented by LinkedCJ in a normative way.</p>
    </sec>
    <sec id="sec-3">
      <title>Triple Extraction</title>
      <p>To construct the knowledge base, we developed software applications based on
Nutch, jsoup, and Drools that transform HTML pages from the Wanfang website into
triples stored in a triple store. Fig. 2 shows the whole process of triple extraction.</p>
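      <p>The parse-then-map step of such a pipeline can be sketched in a few lines of Python. This is not the authors' implementation: the standard-library html.parser stands in for jsoup, a plain dictionary stands in for Drools rules, and the page layout, the class attribute values, the article IRI, and the use of dcterms:title are all hypothetical. Treating linkedcj:hasProject as a literal-valued property is also an assumption.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from html.parser import HTMLParser

LINKEDCJ = "http://sw.tju.edu.cn/LinkedCJ/"
DCTERMS = "http://purl.org/dc/terms/"

# Hypothetical mapping rules (standing in for Drools rules): the value of an
# element's class attribute decides which predicate its text is stored under.
RULES = {
    "title": DCTERMS + "title",
    "project": LINKEDCJ + "hasProject",
}


class FieldExtractor(HTMLParser):
    """Collect the text of elements whose class attribute matches a rule."""

    def __init__(self):
        super().__init__()
        self.current = None  # rule name of the element being read, if any
        self.fields = []     # (rule name, text) pairs in document order

    def handle_starttag(self, tag, attrs):
        css_class = dict(attrs).get("class")
        if css_class in RULES:
            self.current = css_class

    def handle_data(self, data):
        if self.current and data.strip():
            self.fields.append((self.current, data.strip()))

    def handle_endtag(self, tag):
        # Flat pages only: any closing tag ends the current field.
        self.current = None


def page_to_triples(article_iri, html):
    """Turn one article page into (subject, predicate, literal) triples."""
    parser = FieldExtractor()
    parser.feed(html)
    parser.close()
    return [(article_iri, RULES[name], text) for name, text in parser.fields]


sample_page = """
<div class="article">
  <h1 class="title">An Example Article</h1>
  <span class="project">Example Funding Project</span>
  <span class="other">ignored</span>
</div>
"""
triples = page_to_triples(LINKEDCJ + "article/example-1", sample_page)
for triple in triples:
    print(triple)
```
]]></preformat>
      <p>Separating the parsing code from the mapping table means that a change in page layout or vocabulary only touches the rules, which is presumably part of the appeal of a rule engine in the original pipeline.</p>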
      <p>Wanfang is the data source of our knowledge base. Triples were gathered
from the 13 journals of the China Computer Federation (CCF), such as the Chinese
Journal of Computers. After fetching approximately 300000 pages, we
acquired raw HTML data covering approximately 48000 journal articles,
590000 citations, and 180000 authors. Finally, an N-Triples (.nt) file containing about
15 million triples was generated.</p>
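      <p>For readers unfamiliar with the N-Triples format, the sketch below shows how extracted triples could be written out one per line. It assumes RDF 1.1 N-Triples, in which UTF-8 text such as Chinese characters may appear directly in literals and only the backslash, the double quote, and line breaks need escaping. The function names and the example resources are hypothetical.</p>
      <preformat preformat-type="code"><![CDATA[
```python
import io


def escape_literal(text):
    """Escape the characters that may not appear raw in a quoted literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def to_ntriples_line(subject, predicate, obj, obj_is_iri):
    """Format one triple as a single N-Triples statement."""
    if obj_is_iri:
        obj_term = "<" + obj + ">"
    else:
        obj_term = '"' + escape_literal(obj) + '"'
    return "<" + subject + "> <" + predicate + "> " + obj_term + " ."


def write_ntriples(triples, out):
    """Write triples to a text stream, one per line, and return the count."""
    count = 0
    for subject, predicate, obj, obj_is_iri in triples:
        out.write(to_ntriples_line(subject, predicate, obj, obj_is_iri) + "\n")
        count += 1
    return count


LINKEDCJ = "http://sw.tju.edu.cn/LinkedCJ/"
# Hypothetical example resources; the last field marks IRI-valued objects.
example_triples = [
    (LINKEDCJ + "author/a1", "http://purl.org/vocab/frbr/core#creatorOf",
     LINKEDCJ + "article/example-1", True),
    (LINKEDCJ + "article/example-1", LINKEDCJ + "hasProject",
     'A project with "quotes"', False),
]

buffer = io.StringIO()
written = write_ntriples(example_triples, buffer)
print(written)
print(buffer.getvalue(), end="")
```
]]></preformat>
      <p>Because every statement occupies exactly one line, a file of this kind can be produced, split, and counted in a streaming fashion, which is convenient at the scale of millions of triples.</p>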
    </sec>
    <sec id="sec-4">
      <title>Evaluation and Comparison</title>
      <p>To evaluate the LinkedCJ knowledge base, we ran several test
cases on the Semantic Service Platform of Chinese Academic Journals and Articles
(SSPCAJA). For comparison, two leading linked data service
platforms, NPG and DBLP, were tested at the same time.</p>
      <p>Table 1 summarizes the functional tests. The symbol “○” means that the function
or information is provided, while the symbol “×” means that it is not
provided. The main advantages of the LinkedCJ knowledge base are that it supports the
largest number of query types and provides a personal name disambiguation mechanism.</p>
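      <p>The paper does not describe how SSPCAJA resolves ambiguous names. One plausible reading of Fig. 1 is that each linkedcj:Author record points to a foaf:Person through linkedcj:isPerson, so that a search by name can group results by person. The Python sketch below illustrates that idea on a small in-memory list of triples. The ex: identifiers, the placeholder name, the use of foaf:name on author records, and the helper functions are all assumptions made for this example.</p>
      <preformat preformat-type="code"><![CDATA[
```python
FOAF_NAME = "http://xmlns.com/foaf/0.1/name"
IS_PERSON = "http://sw.tju.edu.cn/LinkedCJ/isPerson"
CREATOR_OF = "http://purl.org/vocab/frbr/core#creatorOf"

# Hypothetical data: three author records share one name but refer to
# two different people.
TRIPLES = [
    ("ex:author1", FOAF_NAME, "Zhang San"),
    ("ex:author1", IS_PERSON, "ex:person1"),
    ("ex:author1", CREATOR_OF, "ex:article1"),
    ("ex:author2", FOAF_NAME, "Zhang San"),
    ("ex:author2", IS_PERSON, "ex:person2"),
    ("ex:author2", CREATOR_OF, "ex:article2"),
    ("ex:author3", FOAF_NAME, "Zhang San"),
    ("ex:author3", IS_PERSON, "ex:person1"),
    ("ex:author3", CREATOR_OF, "ex:article3"),
]


def match(triples, s=None, p=None, o=None):
    """Return the triples that agree with every position that is not None."""
    return [t for t in triples
            if (s is None or t[0] == s)
            and (p is None or t[1] == p)
            and (o is None or t[2] == o)]


def articles_by_person(triples, name):
    """Group the articles of all author records called `name` by person."""
    grouped = {}
    for author, _, _ in match(triples, p=FOAF_NAME, o=name):
        articles = [o for _, _, o in match(triples, s=author, p=CREATOR_OF)]
        for _, _, person in match(triples, s=author, p=IS_PERSON):
            grouped.setdefault(person, []).extend(articles)
    return {person: sorted(found) for person, found in grouped.items()}


result = articles_by_person(TRIPLES, "Zhang San")
print(result)
```
]]></preformat>
      <p>A plain search on the name string would return all three articles as if they had one author; grouping on the person resource keeps the two people apart.</p>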
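      <table-wrap id="tab1">
        <label>Table 1.</label>
        <caption>
          <p>The results of the functional test on query types.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Query type</th><th>Function or information</th><th>SSPCAJA</th><th>NPG</th><th>DBLP</th></tr>
          </thead>
          <tbody>
            <tr><td>SPARQL</td><td></td><td>○</td><td>○</td><td>×</td></tr>
            <tr><td rowspan="5">By title</td><td>Show start page</td><td>×</td><td>○</td><td>×</td></tr>
            <tr><td>Show end page</td><td>×</td><td>○</td><td>×</td></tr>
            <tr><td>Show secondary subject term</td><td>○</td><td>×</td><td>×</td></tr>
            <tr><td>Show project</td><td>○</td><td>×</td><td>×</td></tr>
            <tr><td>Show creator's working unit</td><td>○</td><td>×</td><td>×</td></tr>
            <tr><td rowspan="2">By author</td><td>Search author</td><td>○</td><td>○</td><td>○</td></tr>
            <tr><td>Personal name disambiguation</td><td>○</td><td>×</td><td>×</td></tr>
            <tr><td rowspan="2">By journal</td><td>Search journal</td><td>○</td><td>×</td><td>○</td></tr>
            <tr><td>Sorted by volume and number</td><td>×</td><td>×</td><td>○</td></tr>
            <tr><td>By keywords</td><td></td><td>○</td><td>×</td><td>×</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <p>Other test cases have revealed that LinkedCJ is superior to NPG and DBLP in
representing information of Chinese academic journals.</p>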
    </sec>
    <sec id="sec-5">
      <title>Conclusions</title>
      <p>In this paper, we presented the design and implementation of LinkedCJ, a knowledge
base of Chinese academic journals. The LinkedCJ knowledge base was evaluated using the
SSPCAJA. The results of the test cases show that the information of Chinese academic
journals and articles is correctly represented by the knowledge base.</p>
      <p>Acknowledgements. This work is supported by the CCF Opening Project of Chinese
Information Processing (Grant No. CCF2013-02-02) and the National Natural Science
Foundation of China (Grant No. 61100049).</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heath</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Berners-Lee</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Linked Data - The Story So Far</article-title>
          .
          <source>International Journal on Semantic Web and Information Systems</source>
          ,
          <volume>5</volume>
          (
          <issue>3</issue>
          ),
          <fpage>1</fpage>
          -
          <lpage>22</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <collab>IFLA Study Group on the Functional Requirements for Bibliographic Records</collab>
          :
          <source>Functional Requirements for Bibliographic Records: Final Report</source>
          . UBCIM Publications,
          <publisher-loc>München</publisher-loc>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Peroni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>FaBiO and CiTO: Ontologies for describing bibliographic resources and citations</article-title>
          .
          <source>Web Semantics: Science, Services and Agents on the World Wide Web</source>
          .
          <volume>17</volume>
          ,
          <fpage>33</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>CiTO, the Citation Typing Ontology</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          ,
          <volume>1</volume>
          (
          <supplement>S-1</supplement>
          ),
          <fpage>S6</fpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Kurtz</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Dublin Core, DSpace, and a brief analysis of three university repositories</article-title>
          .
          <source>Information Technology and Libraries</source>
          .
          <volume>29</volume>
          (
          <issue>1</issue>
          ),
          <fpage>40</fpage>
          -
          <lpage>46</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>