<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards a Knowledge Graph for a Research Group with Focus on Qualitative Analysis of Scholarly Papers</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Brandenburg University of Applied Sciences</institution>
          ,
          <addr-line>Brandenburg a. d. H.</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Support of scientific workflows by semantic technology has gained increasing interest in recent years. Considerable effort is put into providing structured, standards-based metadata and into machine-based qualitative analysis of the unstructured content of scholarly papers. This helps researchers stay oriented in a field that keeps growing in size and complexity. Semantic technologies also have the potential to support in-depth engagement with scholarly papers, as practiced in research seminars. The paper reports on the preliminary results of an undertaking to support the collaborative documentation and reuse of qualitative analyses of scholarly papers in an information systems research group. A vocabulary has been developed and is openly provided. The system is implemented on the basis of OntoWiki and is openly accessible.</p>
      </abstract>
      <kwd-group>
        <kwd>Qualitative literature analysis</kwd>
        <kwd>Scientific workflows</kwd>
        <kwd>Research Group Knowledge Base</kwd>
        <kwd>Collaborative annotation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Research groups form the smallest, often informal social entity in the scientific
system. Their performance and cohesion rest mainly on shared scientific
interests and a common, high level of expertise in the research field. Even if this
research field is narrowly specified, keeping track of the state of knowledge remains
a great challenge. Beyond awareness of other research groups and influential
researchers in the field, a qualitative expert analysis should focus on research
questions, on the methods applied to them, and on research findings and their critical
discussion. Regular scientific seminars are a traditional and effective
instrument for this, since they create a collective space of experience and discussion.</p>
      <p>The small, informal research group Business Modeling and Knowledge
Engineering (BMaKE) at the Brandenburg University of Applied Sciences has recently
established such a seminar. The group is anchored in the information systems program.
While the selection of the papers to be discussed and the structure to be used in the expert
analysis were quickly agreed upon, the form of the knowledge base to be created for storing
the analysis results immediately led to the following research question:</p>
      <list list-type="bullet">
        <list-item>
          <p>How can a sustainable infrastructure be built that stores the knowledge collectively
worked out in seminars in a systematic, structured, and easily reusable way?</p>
        </list-item>
      </list>
      <p>The collaboration environments and systems successfully used so far in project work
and teaching (Google Drive, GitHub, Confluence, Slack) are quite suitable for the
exchange of data and information. They fall short, however, of providing a systematic
knowledge store that can be queried flexibly, since they do not implement the
necessary knowledge graph structure.</p>
      <p>At this point, the research question has not yet been definitively answered. The
paper aims to present the approach taken and to discuss the experiences gained so far.
The remainder of the paper is organized as follows: Section 2 provides an
overview of relevant work on semantic analysis and structuring of the content of
scholarly papers. The vocabulary elaborated to support the knowledge base is presented in
Section 3, whereas Section 4 introduces the preliminary system design for the targeted
knowledge base. Section 5 reflects on the first implementation experiences. The paper
closes with a short conclusion and an outlook on further work in Section 6.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>There are different lines of research dealing with semantic analysis and the
deployment of structured data on scholarly papers and other relevant objects of scientific
environments and workflows, such as conferences, proceedings, people, and projects.
Table 1 gives a brief overview, mentioning exemplary work in the field as well as the
main research objectives and findings for each of these lines.
The results of metadata extraction projects such as those presented in Table 1 can be used as
basic input for the research group knowledge base. The undertaking itself is a kind of
collaborative annotation, but with a more specific focus. As things stand today, the
increasing production of natively structured data will likewise serve as basic input. It
is conceivable, however, that this form of publication will also support very specific
qualitative analysis questions in the future. The methods of text analysis and machine
learning come closest to the qualitative analysis of scholarly papers. However, since
qualitative analysis is very field-specific, a high-quality training set is required.
Perhaps the knowledge base presented here can serve as a training set for the automatic
qualitative analysis of scholarly papers in the field of Business Modeling and Knowledge
Engineering from an information systems perspective.</p>
    </sec>
    <sec id="sec-3">
      <title>Vocabulary for Qualitative Analysis of Scholarly Papers</title>
      <p>
        As stated above, the main objective of the required knowledge base is to support the
research group’s collective analysis of scientific publications in the field of
information systems. It is therefore natural to structure scholarly papers according to their
main qualitative features: (i) research objectives, (ii) research methods, (iii) research
findings, (iv) future work, and (v) critical issues (cf., e.g., [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]). To allow
semantically rich queries to the knowledge base, these features should be structured further
wherever possible. Candidates for this are the research methods and the
research objectives. The main research methods in information systems are described in
[
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. For structuring the research objectives, a flexible, pairwise combination of
research activities and research artifacts can be applied. Both can be modeled as well-defined
enumerations when a limited research field is considered (see Table 2).
      </p>
      <p>
        Two independent approaches were pursued in the search for reusable classes,
relations, and attributes for the required knowledge schema. Schema.org was examined first,
as a vocabulary of increasing importance for websites. It was found that the
rather formal, accompanying information on scholarly papers necessary for the use
case can be modeled adequately with elements of this vocabulary. The qualitative
features of papers mentioned above may reuse the relation schema:about, but no
fitting elements were found for the features themselves. To fill the gaps, the SPAR Ontologies [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],
in particular the Discourse Elements Ontology (DEO), were considered in more
detail. The arguments for not reusing DEO are the following:
      </p>
      <list list-type="order">
        <list-item>
          <p>There are substantial differences between rhetorical elements used by an author or
detected by automatic text analysis, as assumed in DEO, and qualitative features
of a paper identified by expert analysis, as intended here. For example, critical issues are an
individual assessment by a human reader and are therefore not provided in DEO.</p>
        </list-item>
        <list-item>
          <p>As already stated, some of the analyzed features are to be specified as
enumerations. DEO does not implement such constraints on data types.</p>
        </list-item>
      </list>
      <p>Therefore, these entities were modeled as new, specific classes that are nevertheless
semantically and structurally integrated into the Schema.org frame. Fig. 1 shows the
high-level schema of the vocabulary. Red nodes are taken from Schema.org; the other
nodes are modeled specifically. Green nodes are of type Enumeration, whereas the
white nodes stand for abstract concepts implementing a list structure. The vocabulary
is documented on GitHub<sup>1</sup> and referenced in LOV<sup>2</sup>.</p>
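      <p>The feature structure described above can be illustrated with a minimal Python sketch. It is not part of the published vocabulary: all class and field names are chosen here for illustration only, and the enumeration values are hypothetical placeholders for the entries of Table 2 and the methods catalogued in [6]. The sketch shows how a research objective, modeled as a pair of activity and artifact, supports a simple structured query.</p>
      <preformat>
```python
from dataclasses import dataclass, field
from enum import Enum


class ResearchActivity(Enum):
    # Hypothetical values; the actual enumeration is given in Table 2.
    DESIGN = "design"
    EVALUATION = "evaluation"


class ResearchArtifact(Enum):
    # Hypothetical values; the actual enumeration is given in Table 2.
    ONTOLOGY = "ontology"
    PROCESS_MODEL = "process model"


class ResearchMethod(Enum):
    # Hypothetical values standing in for the methods described in [6].
    CASE_STUDY = "case study"
    PROTOTYPING = "prototyping"


@dataclass(frozen=True)
class ResearchObjective:
    """Pairwise combination of a research activity and a research artifact."""
    activity: ResearchActivity
    artifact: ResearchArtifact


@dataclass
class PaperAnalysis:
    """The five qualitative features recorded for one scholarly paper."""
    title: str
    objectives: list = field(default_factory=list)       # ResearchObjective items
    methods: list = field(default_factory=list)          # ResearchMethod items
    findings: list = field(default_factory=list)         # free text
    future_work: list = field(default_factory=list)      # free text
    critical_issues: list = field(default_factory=list)  # free text


def papers_with_objective(analyses, activity, artifact):
    """Return titles of analyses that list the given (activity, artifact) pair."""
    wanted = ResearchObjective(activity, artifact)
    return [a.title for a in analyses if wanted in a.objectives]


# Invented sample data, used only to exercise the query function.
analyses = [
    PaperAnalysis(
        title="Paper A",
        objectives=[ResearchObjective(ResearchActivity.DESIGN, ResearchArtifact.ONTOLOGY)],
        methods=[ResearchMethod.PROTOTYPING],
        critical_issues=["No evaluation with real users."],
    ),
    PaperAnalysis(
        title="Paper B",
        objectives=[ResearchObjective(ResearchActivity.EVALUATION, ResearchArtifact.PROCESS_MODEL)],
        methods=[ResearchMethod.CASE_STUDY],
    ),
]

print(papers_with_objective(analyses, ResearchActivity.DESIGN, ResearchArtifact.ONTOLOGY))
```
      </preformat>
      <p>Because objectives and methods are restricted to enumeration members, queries of this kind compare well-defined values rather than free text, which is the motivation for modeling them as enumerations.</p>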
    </sec>
    <sec id="sec-4">
      <title>Preliminary System Design of the Knowledge Graph</title>
      <p>
        The target system can be described as a knowledge graph, as defined in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] and
further specified in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Fig. 2 shows an abstract model of this knowledge graph in which
the characteristic elements, particularly the exploited knowledge sources and the
provided knowledge services, are adapted to the use case under consideration. The
shaded items in the model represent elements that are already implemented, at least in part. The
boxes with dots indicate the further extensibility of the system.
      </p>
      <p><sup>1</sup> https://github.com/bmake/scholarlygraph/</p>
      <p><sup>2</sup> http://lov.okfn.org/dataset/lov/vocabs/spvqa</p>
      <p>[Fig. 2: abstract model of the knowledge graph for scholarly papers. Labels recovered from the figure: Knowledge Sources (qualitative judgments of expert researchers; openly accessible structured meta data; WikiData; ORCID data; ...); Integration and orchestration (extraction; cleaning and transformation; interlinking; validation); Knowledge Graph Management (triple store; access via LinkedData API and SPARQL endpoint; analysis and reasoning; schema engineering and information); Knowledge Services (research group wiki; researcher support tool; knowledge explorer; information hub).]</p>
      <p>
At present, the system is implemented as an out-of-the-box OntoWiki [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] comprising a
standard wiki interface, a triple store, and a SPARQL endpoint. It is populated
manually by researchers during their qualitative analysis of seminar papers. At the time
of writing, even external sources of structured metadata are queried and interlinked
manually. Editing is supported by Turtle templates for creating importable data
dumps, or it can be performed directly in the wiki. This preliminary workflow is
also used to evaluate processes and sources for automatic data input. Vocabulary
(schema) information is provided by the documentation mentioned above.
      </p>
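      <p>The template-based editing step can be sketched as follows. This is a minimal illustration, not the templates actually used by the group: the prefix ex:, its namespace, the paper: namespace, and the properties ex:researchMethod and ex:criticalIssue are hypothetical stand-ins for terms of the published vocabulary, and only schema:ScholarlyArticle and schema:name are taken from Schema.org. The sketch fills a Turtle template with the results of one analysis, assuming single-line text values, and produces a snippet that could be imported as a data dump.</p>
      <preformat><![CDATA[
```python
from string import Template

# Hypothetical namespaces and property names, chosen for illustration only;
# the actual terms are defined in the published vocabulary documentation.
TURTLE_TEMPLATE = Template("""\
@prefix schema: <http://schema.org/> .
@prefix ex: <http://example.org/vocab#> .
@prefix paper: <http://example.org/paper/> .

paper:$paper_id a schema:ScholarlyArticle ;
    schema:name "$title" ;
    ex:researchMethod ex:$method ;
    ex:criticalIssue "$critical_issue" .
""")


def escape_literal(text):
    """Escape backslashes and double quotes for a single-line Turtle string literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"')


def render_paper(paper_id, title, method, critical_issue):
    """Fill the template with the results of one qualitative analysis."""
    return TURTLE_TEMPLATE.substitute(
        paper_id=paper_id,
        title=escape_literal(title),
        method=method,
        critical_issue=escape_literal(critical_issue),
    )


# Invented sample values, used only to show the rendered output.
turtle = render_paper(
    paper_id="p001",
    title='A "Sample" Paper',
    method="CaseStudy",
    critical_issue="Small sample size.",
)
print(turtle)
```
]]></preformat>
      <p>Keeping the template separate from the rendering function means that a researcher only supplies the analysis values, while prefixes and triple structure stay uniform across all imported papers.</p>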
    </sec>
    <sec id="sec-5">
      <title>First Implementation Experience</title>
      <p>The preliminary implementation described in the previous section can be
considered a research prototype<sup>3</sup>. Since the system aims at the structured documentation
and flexible reuse of the seminar output of the BMaKE research group, the knowledge
base is growing slowly but continuously. At the time of writing, 35 scholarly papers
from 11 publications correlated with 9 publication events (conferences) have been analyzed.
They are interlinked with more than 100 authors and nearly 50 organizations and places.
Each month, 5 to 10 new papers will be analyzed and added to the knowledge base.</p>
      <p>The immediate support of the research group’s work allows an in-process
evaluation of the support quality and a deeper elicitation of needs and requirements. The
first experiences in using the system in the context of scientific seminars show the
following informal results:</p>
      <list list-type="order">
        <list-item>
          <p>Pure consumers of the system assessed it as very helpful in gathering deeper
knowledge in the research field.</p>
        </list-item>
      </list>
      <p><sup>3</sup> https://bmakewiki.th-brandenburg.de</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusion and Further Work</title>
      <p>According to a preliminary assessment, a knowledge graph can be considered a
sustainable infrastructure for storing and reusing the results of qualitative analyses of
scholarly papers. Even the preliminary implementation presented in this paper was
evaluated as an effective (although not yet efficient) measure to support the work
of a research group. There are three main lines of further development of the system:
(i) Formal metadata that are not the object of qualitative analysis must be integrated
automatically by reusing structured data provided by open sources. (ii) A user-friendly,
template-based form should be developed for capturing the results of the qualitative
analysis. (iii) The use cases for the support of the research work must be elicited
systematically, and the research group wiki should be adapted on this basis. These
development steps shall then be followed by a formal, structured evaluation.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aslam</surname>
            ,
            <given-names>M. A.</given-names>
          </string-name>
          , et al.:
          <article-title>A Generic Framework for Adding Semantics to Digital Libraries</article-title>
          . In: Ciuciu, I., et al. (eds.) OTM
          <year>2016</year>
          .
          <article-title>LNCS</article-title>
          , vol.
          <volume>10034</volume>
          , pp.
          <fpage>277</fpage>
          -
          <lpage>281</lpage>
          . Springer, Cham (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Vahdati</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , et al.:
          <article-title>OpenResearch - Collaborative Management of Scholarly Communication Metadata</article-title>
          . In: Blomqvist E.,
          <string-name>
            <surname>Ciancarini</surname>
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Poggi</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vitali</surname>
            <given-names>F</given-names>
          </string-name>
          . (eds.)
          <article-title>EKAW 2016</article-title>
          .
          <article-title>LNCS</article-title>
          , vol.
          <volume>10024</volume>
          , pp.
          <fpage>778</fpage>
          -
          <lpage>793</lpage>
          . Springer, Cham (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Di Iorio</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          , et al.:
          <article-title>The RASH Framework</article-title>
          . In:
          <source>ISWC</source>
          <year>2015</year>
          , Poster &amp; Demo Session, http://ceur-ws.org/Vol-1486/paper_72.pdf, last accessed 2017/08/05.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Ronzano</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Saggion</surname>
          </string-name>
          , H.:
          <article-title>Knowledge Extraction and Modeling from Scientific Publications</article-title>
          . In:
          <string-name>
            <surname>González-Beltrán</surname>
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osborne</surname>
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peroni</surname>
            <given-names>S</given-names>
          </string-name>
          . (eds.) SAVE-SD
          <year>2016</year>
          .
          <article-title>LNCS</article-title>
          , vol.
          <volume>9792</volume>
          , pp.
          <fpage>11</fpage>
          -
          <lpage>25</lpage>
          . Springer, Cham (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Rossig</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Prätsch</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <source>Wissenschaftliche Arbeiten</source>
          . 7th edn. BerlinDruck, Achim (
          <year>2008</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Wilde</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hess</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Forschungsmethoden der Wirtschaftsinformatik - Eine empirische Untersuchung</article-title>
          .
          <source>In: WIRTSCHAFTSINFORMATIK</source>
          <volume>49</volume>
          (
          <issue>4</issue>
          ),
          <fpage>280</fpage>
          -
          <lpage>287</lpage>
          (
          <year>2007</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Peroni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The Semantic Publishing and Referencing Ontologies</article-title>
          .
          <source>In: Semantic Web Technologies and Legal Scholarly Publishing</source>
          , pp.
          <fpage>121</fpage>
          -
          <lpage>193</lpage>
          . Springer, Cham (
          <year>2014</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Paulheim</surname>
          </string-name>
          , H.:
          <article-title>Knowledge Graph Refinement: A Survey of Approaches and Evaluation Methods</article-title>
          .
          <source>In: Semantic Web Journal (Preprint)</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>20</lpage>
          (
          <year>2016</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Meister</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jetschni</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kreideweiß</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Konzept und Prototyp einer dezentralen Wissensinfrastruktur zu Hochschuldaten für Mensch und Maschine</article-title>
          .
          In:
          <source>INFORMATIK 2017</source>
          . LNI. GI e. V., Bonn (
          <year>2017</year>
          ), in print.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Frischmuth</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Arndt</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martin</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>OntoWiki 1.0</article-title>
          . In: SEMANTiCS 2016, Poster &amp; Demo Session, http://ceur-ws.org/Vol-1695/paper11.pdf, last accessed 2017/08/05.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>