<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>An Ontology Design Pattern for Data Integration in the Library Domain</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Patrick O'Brien</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Carral</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Mixter</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pascal Hitzler</string-name>
        </contrib>
        <aff>Montana State University</aff>
        <aff>Wright State University</aff>
      </contrib-group>
      <abstract>
        <p>A university's institutional repository (IR) contains the intellectual output of its faculty, staff, and students. Its content is extensive and heterogeneous, which complicates data aggregation and discovery tasks. To address these challenges, we propose the use of a conceptual ontology design pattern to model information for the IR domain that is general enough to be reused across different IR datasets.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>A university's institutional repository (IR) contains the intellectual output of
its faculty, staff, and students. Content can be diverse and may include theses
and dissertations, proceedings, books, preprints and post-print journal articles,
as well as grey literature and datasets that support research conclusions. While
there are a number of Linked Open Datasets (LOD) with structured
bibliographic records on the web (e.g., DBLP, CiteSeer, and Semantic Web Dog Food),
none offers open access to a full-text version of the scholarly article or a
robust view of the academic output of an entire university.</p>
      <p>Currently there are more than 2,400 IRs affiliated with universities or
disciplinary societies that are built on the principle of open access [7]. Most IRs
include full-text versions of the scholarly work encoded as media objects (PDF,
CSV, etc.). IRs contain a vast amount of data encapsulating information that
can provide unique perspectives on institutional research activities, such as the
interdisciplinary collaboration among researchers, departments, and colleges.</p>
      <p>However, this valuable information is typically locked in bibliographic records
as simple text strings, or blobs, that are difficult for machines to isolate, ingest,
and interpret. Unstructured IR data also hinder discovery by making indexing
by scholarly search engines difficult [1].</p>
      <p>To unlock the full potential of open access IR, it is necessary to dissect each
bibliographic record to identify, and link together, the entities contained within.
The research question, then, is whether a repeatable structured data model can
improve access and discovery of IR content by improving the quality of IR data.</p>
      <p>This paper describes a generic Ontology Design Pattern (ODP) based on a
project to convert bibliographic records from Montana State University's Open
Access Institutional Repository into linked data while still improving access
and discovery by services such as Google and Google Scholar. As at most libraries,
Montana State University's IR metadata was maintained in multiple production
systems that used different formats, specifically MAchine Readable Cataloging
(MARC) and Metadata Object Description Schema (MODS), to describe and access
the same scholarly papers encoded as full-text PDF files.</p>
      <p>The challenge was producing a single accurate and robust description of the
materials contained within the IR. This required staff to extract, consolidate, and
parse records into individual text strings and transform them into RDF, using
a model based on Schema.org and Dublin Core and extended with
the Citation Style Language for granular details. Once converted into RDF, the
data were reconciled against the university's internal Faculty Activity Database
to establish instance data of people with their Colleges and Departments. The
RDF data were then linked to the external sources DBpedia and the Library of
Congress Subject Headings (LCSH). While the process succeeded in
publishing Montana State University's IR as LOD [6], it required significant
ad hoc and manual work to identify and address data quality issues.</p>
      <p>We propose that a generic Ontology Design Pattern (ODP) with the
three characteristics below would help IR managers publish IR content as
quality LOD more quickly and efficiently:
1. It is directly applicable to a variety of IR datasets and thus lowers the initial
hurdle for IRs to publish Linked Data [2].
2. It is easily extensible, e.g., by aligning with existing library ontologies,
foundational ontologies, and other domain-specific vocabularies.
3. It helps IR data managers improve the quality of IR metadata by reducing the
practice of manually reviewing bibliographic records for accuracy.</p>
      <p>Deriving such an ODP requires a generic use case that captures recurring
problems in different application domains. Competency questions are queries
that a domain expert would be expected to run against a knowledge base, and
they are recognized as a good approach for modeling requirements from multiple
domains. For the proposed ODP, such competency questions include:
1. Which records violate existing conditions required for scholarly citation?
2. What is the topic diversity of an organization's intellectual output?
3. What is the depth of an organization's intellectual output?
4. Are there authors with "weak ties" to my domain of expertise whom I can explore
for "novel ideas" or collaboration in my research?</p>
    </sec>
    <sec id="sec-2">
      <title>Formalization</title>
      <p>This section discusses the more interesting classes, properties, and axioms of the
library pattern. The axioms are presented in Description Logic (DL) notation.
To encode the pattern, we use the logic fragment SROIQ as
defined in [5], which is the basis for the OWL 2 DL standard [4]. The proposed
ODP has been formally encoded in the Web Ontology Language (OWL) and can be
downloaded from
www.dropbox.com/sh/88jh5qwdgpxueqz/AAAj_kgmL5ErPL2JaPWtCvEsa?dl=0. A
schematic view of the pattern is shown in Figure 1.</p>
      <p>
        CreativeWork: a generic class of creative work that includes things like books,
movies, or software programs. A subclass of CreativeWork, ScholarlyWork, contains
all creative works related to scholarly research. The subclass relationship between
ScholarlyWork and CreativeWork is enforced by axiom (1). Axiom (2) indicates that every
scholarly work must have at least one creator and exactly one publication date.
      </p>
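      <p>As a rough illustration of how axiom (2) relates to the first competency question (records that violate the conditions required for scholarly citation), the following minimal Python sketch checks flat records for at least one creator and exactly one publication date. It is not part of the published pattern: the ScholarlyWorkRecord class, its field names, and the sample records are hypothetical. It also reads the axiom in a closed-world manner, whereas an OWL reasoner working under the open-world assumption would not report a missing creator as a violation.</p>
      <preformatted><![CDATA[
```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ScholarlyWorkRecord:
    """Hypothetical flat view of one IR record (an assumption, not the authors' model)."""
    identifier: str
    creators: List[str] = field(default_factory=list)
    publication_dates: List[str] = field(default_factory=list)


def citation_violations(record: ScholarlyWorkRecord) -> List[str]:
    """Return the parts of axiom (2) that the record fails, read closed-world."""
    problems = []
    # Existential part of axiom (2): at least one hasCreator value.
    if len(record.creators) < 1:
        problems.append("missing creator")
    # Cardinality part of axiom (2): exactly one distinct hasPublicationDate value.
    distinct_dates = set(record.publication_dates)
    if len(distinct_dates) != 1:
        problems.append(
            f"expected exactly 1 publication date, found {len(distinct_dates)}"
        )
    return problems


# Illustrative records only; identifiers and values are made up.
records = [
    ScholarlyWorkRecord("thesis-001", ["Doe, J."], ["2014-05-01"]),
    ScholarlyWorkRecord("article-002", [], ["2013-01-15"]),
    ScholarlyWorkRecord("dataset-003", ["Roe, R."], ["2012", "2013"]),
]

# Map each violating record to the list of problems found.
report = {}
for rec in records:
    found = citation_violations(rec)
    if found:
        report[rec.identifier] = found
```
]]></preformatted>
      <p>In this sketch, report lists only the records that need attention, which is the kind of output that could replace part of the manual review described in the introduction.</p>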
      <sec id="sec-2-1">
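        <title><inline-formula><tex-math>\text{ScholarlyWork} \sqsubseteq \text{CreativeWork}</tex-math></inline-formula> (1); <inline-formula><tex-math>\text{ScholarlyWork} \sqsubseteq \exists\text{hasCreator}.\text{Creator} \sqcap {=}1\,\text{hasPublicationDate}.\text{Date}</tex-math></inline-formula> (2)</title>
        <p>
          Creator: a person or organization responsible for generating some creative
work. Every creator must have created at least one CreativeWork (3).
        </p>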
      </sec>
      <sec id="sec-2-2">
        <title><inline-formula><tex-math>\text{Creator} \sqsubseteq \exists\text{isCreatorOf}.\text{CreativeWork}</tex-math></inline-formula> (3)</title>
        <p>
          InstitutionalRepository: a repository that contains a set of creative works.
It is related to some organization. An institutional repository must contain some
type of scholarly work from some creator.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title><inline-formula><tex-math>\text{InstitutionalRepository} \sqsubseteq \exists\text{containsWorksFrom}.\text{Organization} \sqcap \exists\text{holdsIntelectualOutput}.\text{CreativeWork}</tex-math></inline-formula> (4)</title>
        <p>
          Organization: an entity that formally links a group of people to a common
goal. A relevant subclass of Organization for our context is ScholarlyOrganization (5).
Universities, colleges, academic departments, and libraries are scholarly
organizations (6-9).
        </p>
      </sec>
      <sec id="sec-2-4">
        <title><inline-formula><tex-math>\text{ScholarlyOrganization} \sqsubseteq \text{Organization}</tex-math></inline-formula> (5)</title>
      </sec>
      <sec id="sec-2-5">
        <title><inline-formula><tex-math>\text{University} \sqsubseteq \text{ScholarlyOrganization}</tex-math></inline-formula> (6)</title>
      </sec>
      <sec id="sec-2-6">
        <title><inline-formula><tex-math>\text{College} \sqsubseteq \text{ScholarlyOrganization}</tex-math></inline-formula> (7)</title>
      </sec>
      <sec id="sec-2-7">
        <title><inline-formula><tex-math>\text{Department} \sqsubseteq \text{ScholarlyOrganization}</tex-math></inline-formula> (8); <inline-formula><tex-math>\text{Library} \sqsubseteq \text{ScholarlyOrganization}</tex-math></inline-formula> (9)</title>
        <p>Universities have at least one college and at least one academic department (10).
Colleges are part of at most one university (11). Academic departments are part
of exactly one university (12).</p>
      </sec>
      <sec id="sec-2-8">
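        <title><inline-formula><tex-math>\text{University} \sqsubseteq \exists\text{hasCollege}.\text{College} \sqcap \exists\text{hasDepartment}.\text{AcademicDepartment}</tex-math></inline-formula> (10)</title>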
      </sec>
      <sec id="sec-2-9">
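        <title><inline-formula><tex-math>\text{College} \sqsubseteq {\leq}1\,\text{isCollegeOf}.\text{University}</tex-math></inline-formula> (11); <inline-formula><tex-math>\text{Department} \sqsubseteq {=}1\,\text{isDepartmentOf}.\text{University}</tex-math></inline-formula> (12)</title>
        <p>We introduce subproperty statements (13-14) and declare the property
hasSubOrganization transitive (15) with the following axioms:<sup>2</sup>
          <disp-formula><label>(13)</label><tex-math>\text{hasCollege} \sqsubseteq \text{hasSubOrganization}</tex-math></disp-formula>
          <disp-formula><label>(14)</label><tex-math>\text{hasDepartment} \sqsubseteq \text{hasSubOrganization}</tex-math></disp-formula>
          <disp-formula><label>(15)</label><tex-math>\text{hasSubOrganization} \circ \text{hasSubOrganization} \sqsubseteq \text{hasSubOrganization}</tex-math></disp-formula>
        </p>
        <p>The following role chains enable automatic determination of an
organization's intellectual output:
          <disp-formula><label>(16)</label><tex-math>\text{hasSubOrganization} \circ \text{hasAffiliate} \sqsubseteq \text{hasAffiliate}</tex-math></disp-formula>
          <disp-formula><label>(17)</label><tex-math>\text{hasAffiliate} \circ \text{isCreatorOf} \sqsubseteq \text{producesIntellectualOutput}</tex-math></disp-formula>
        </p>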
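        <p>To make the effect of the subproperty, transitivity, and role chain axioms (13-17) concrete, the following minimal Python sketch derives producesIntellectualOutput pairs from explicitly asserted pairs using plain relational composition. It is an illustration under stated assumptions, not the authors' implementation or a substitute for an OWL reasoner: the function names and the sample individuals (univ, college_a, dept_x, alice, bob) are hypothetical.</p>
        <preformatted><![CDATA[
```python
from collections import defaultdict
from typing import Dict, Set, Tuple

Edges = Set[Tuple[str, str]]


def compose(left: Edges, right: Edges) -> Edges:
    """Relational composition: (a, c) whenever (a, b) is in left and (b, c) in right."""
    by_source: Dict[str, Set[str]] = defaultdict(set)
    for b, c in right:
        by_source[b].add(c)
    return {(a, c) for a, b in left for c in by_source.get(b, ())}


def transitive_closure(edges: Edges) -> Edges:
    """Smallest transitive relation containing edges, as required by axiom (15)."""
    closure = set(edges)
    while True:
        new_pairs = compose(closure, closure) - closure
        if not new_pairs:
            return closure
        closure |= new_pairs


def intellectual_output(has_college: Edges, has_department: Edges,
                        has_affiliate: Edges, is_creator_of: Edges) -> Edges:
    # Axioms (13)-(14): both properties are subproperties of hasSubOrganization.
    has_sub_org = transitive_closure(set(has_college) | set(has_department))
    # Axiom (16): affiliates of a sub-organization are affiliates of the parent.
    all_affiliates = set(has_affiliate) | compose(has_sub_org, set(has_affiliate))
    # Axiom (17): whatever an affiliate created is output of the organization.
    return compose(all_affiliates, set(is_creator_of))


# Hypothetical asserted facts, for illustration only.
output = intellectual_output(
    has_college={("univ", "college_a")},
    has_department={("univ", "dept_x")},
    has_affiliate={("college_a", "bob"), ("dept_x", "alice")},
    is_creator_of={("alice", "thesis_1"), ("bob", "article_2")},
)
```
]]></preformatted>
        <p>Because hasSubOrganization is closed transitively before axiom (16) is applied, a single composition step is enough to collect the affiliates of every organization. In the sample data, univ is credited with both thesis_1 and article_2 although neither was asserted for it directly.</p>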
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Conclusions and Future Work</title>
      <p>Applying an ODP to IR data will improve the efficiency and effectiveness of
library metadata management workflows by quickly identifying data issues
that are currently found manually. Improving the quality of IR metadata and
publishing it for syndication on the Semantic Web will aid machine-assisted
discovery and help address the limited availability of datasets that contain
adequate information linked to full-text scholarly research capable of supporting
semantics-driven Literature-Based Discovery [3].</p>
      <p>We are planning future iterations that extend the axiomatization and
populate the pattern using previous domain modeling and a real-world dataset from
Montana State University [6].</p>
      <p><sup>2</sup> Many axioms that follow intuitively from the property labels, such as
isCollegeOf being the inverse of hasCollege, are omitted. For a comprehensive list see our submission at
www.dropbox.com/sh/88jh5qwdgpxueqz/AAAj_kgmL5ErPL2JaPWtCvEsa?dl=0.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name><surname>Arlitsch</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>O'Brien</surname>, <given-names>P.S.</given-names></string-name>:
          <article-title>Invisible institutional repositories: Addressing the low indexing ratios of IRs in Google Scholar</article-title>.
          <source>Library Hi Tech</source>
          <volume>30</volume>(<issue>1</issue>),
          <fpage>60</fpage>–<lpage>81</lpage>
          (<year>2012</year>), http://dx.doi.org/10.1108/07378831211213210
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name><surname>Bizer</surname>, <given-names>C.</given-names></string-name>,
          <string-name><surname>Heath</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Berners-Lee</surname>, <given-names>T.</given-names></string-name>:
          <article-title>Linked data - the story so far</article-title>.
          <source>Int. J. Semantic Web Inf. Syst.</source>
          <volume>5</volume>(<issue>3</issue>),
          <fpage>1</fpage>–<lpage>22</lpage>
          (<year>2009</year>), http://dx.doi.org/10.4018/jswis.2009081901
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name><surname>Cameron</surname>, <given-names>D.</given-names></string-name>,
          <string-name><surname>Bodenreider</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Yalamanchili</surname>, <given-names>H.</given-names></string-name>,
          <string-name><surname>Danh</surname>, <given-names>T.</given-names></string-name>,
          <string-name><surname>Vallabhaneni</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Thirunarayan</surname>, <given-names>K.</given-names></string-name>,
          <string-name><surname>Sheth</surname>, <given-names>A.P.</given-names></string-name>,
          <string-name><surname>Rindflesch</surname>, <given-names>T.C.</given-names></string-name>:
          <article-title>A graph-based recovery and decomposition of Swanson's hypothesis using semantic predications</article-title>.
          <source>Journal of Biomedical Informatics</source>
          <volume>46</volume>(<issue>2</issue>),
          <fpage>238</fpage>–<lpage>251</lpage>
          (<year>2013</year>), http://dx.doi.org/10.1016/j.jbi.2012.09.004
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name><surname>Hitzler</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Krötzsch</surname>, <given-names>M.</given-names></string-name>,
          <string-name><surname>Parsia</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Patel-Schneider</surname>, <given-names>P.F.</given-names></string-name>,
          <string-name><surname>Rudolph</surname>, <given-names>S.</given-names></string-name> (eds.):
          <source>OWL 2 Web Ontology Language: Primer</source>.
          W3C Recommendation (27 October <year>2009</year>), available at http://www.w3.org/TR/owl2-primer/
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name><surname>Horrocks</surname>, <given-names>I.</given-names></string-name>,
          <string-name><surname>Kutz</surname>, <given-names>O.</given-names></string-name>,
          <string-name><surname>Sattler</surname>, <given-names>U.</given-names></string-name>:
          <article-title>The even more irresistible SROIQ</article-title>.
          In: <source>Proc. of the 10th Int. Conf. on Principles of Knowledge Representation and Reasoning (KR 2006)</source>.
          pp. <fpage>57</fpage>–<lpage>67</lpage>.
          AAAI Press (<year>2006</year>)
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name><surname>Mixter</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>O'Brien</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Arlitsch</surname>, <given-names>K.</given-names></string-name>:
          <article-title>Describing theses and dissertations using schema.org</article-title>.
          In: <source>Proceedings of the 2014 International Conference on Dublin Core and Metadata Applications</source>.
          pp. <fpage>138</fpage>–<lpage>146</lpage>.
          DCMI'14, Dublin Core Metadata Initiative (<year>2014</year>), http://dl.acm.org/citation.cfm?id=2771234.2771249
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name><surname>Pinfield</surname>, <given-names>S.</given-names></string-name>,
          <string-name><surname>Salter</surname>, <given-names>J.</given-names></string-name>,
          <string-name><surname>Bath</surname>, <given-names>P.A.</given-names></string-name>,
          <string-name><surname>Hubbard</surname>, <given-names>B.</given-names></string-name>,
          <string-name><surname>Millington</surname>, <given-names>P.</given-names></string-name>,
          <string-name><surname>Anders</surname>, <given-names>J.H.S.</given-names></string-name>,
          <string-name><surname>Hussain</surname>, <given-names>A.</given-names></string-name>:
          <article-title>Open-access repositories worldwide, 2005–2012: Past growth, current characteristics, and future possibilities</article-title>.
          <source>JASIST</source>
          <volume>65</volume>(<issue>12</issue>),
          <fpage>2404</fpage>–<lpage>2421</lpage>
          (<year>2014</year>), http://dx.doi.org/10.1002/asi.23131
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>