<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Contextualizing controversies of the post-Lutheran reformation: A workflow for network analytics involving relational and graph databases</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Aline Deicke</string-name>
          <email>aline.deicke@adwmainz.de</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Anna Neovesky</string-name>
          <email>anna.neovesky@adwmainz.de</email>
        </contrib>
        <aff>Digital Academy, Academy of Sciences and Literature | Mainz, Germany</aff>
      </contrib-group>
      <abstract>
        <p>Data accumulated by long-term research projects offers interesting perspectives for network analytical research when made available under a free license. In the research project "Controversia et Confessio", the datasets are stored in relational databases, whose structures do not lend themselves easily to the relationship-based thinking necessary for network research. This paper presents an exemplary analytical workflow that uses a graph database as an intermediary storage layer, as employed by the authors in a current network analytical study of this data.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <sec id="sec-1-1">
        <title>1.1 The research project “Controversia et Confessio” and use-case</title>
        <p>The project “Controversia et Confessio - An Edition of Sources for the Development of Confessional Documents and Confessionalization” focuses on the controversies that occurred during the “Wittenberger Reformation” after Luther’s death. These events served to define Lutheran doctrine and to consolidate the Protestant denomination. The main medium through which the discussions were carried out was the pamphlet [1]. The project collects and edits these pamphlets, which are published thematically and categorized according to controversies, for example the “Osiandrian Controversy”<sup>2</sup>.</p>
        <p>Another result of the ongoing research, and a central element for accessing the information, is the biobibliographic database. It contains the pamphlets with title, incipit, commentary, description, corresponding controversy, and further metadata, as well as biographies of the most important persons involved in the controversies. Initially organized in two separate databases, these were combined into one relational database in 2015, when the Digital Academy<sup>3</sup>, the Digital Humanities department of the Academy of Sciences and Literature | Mainz, started to collaborate with the project. Currently, the database contains 2063 pamphlets and about 800 persons or entities.</p>
        <p><sup>1</sup> http://www.controversia-et-confessio.de/ (Accessed on 2016-04-09).</p>
        <p><sup>2</sup> Overview of the volumes published so far: www.ieg-mainz.de/forschungsprojekte/kontroversliteratur_und_streitkultur/controversia_et_cofessio (Accessed on 2016-04-09).</p>
        <p>The use-cases portrayed in this paper belong to a research project conducted by the authors, which aims to examine the nature of theological discussion through pamphlets from a network analytical perspective<sup>4</sup>. For this purpose, the relationships between authors and opponents as stated in the pamphlets are transformed into a one-mode network and analysed. The study also aims to determine the potential of network methods for traditional editorial work. In addition, transforming the data into different formats and data models assists the data normalization that was undertaken to improve the quality of the project’s database.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 Relational database</title>
      <p>The data is stored in MySQL, a relational database management system (RDBMS). The MySQL database is an integral part of a complex online platform driven by a content management system. Restructuring it towards another data model would have affected the application too much, yet a graph-based representation is ideal for network-related concerns, as it emphasizes the relations between items. Therefore, a separate graph database was created for the network analysis. Because the transformation steps are fully documented, the scripts can be reused, and whenever the data is modified or corrected, the graph database can be updated with fresh data from the RDBMS.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Graph database model</title>
      <p>First, the relational data model is transferred to a graph-based model (figure 1). The central objects stay the same: pamphlet, actor, year of publication, place of publication, and controversy. In the relational data model, each of these items is a table in the database. In the graph model, they become nodes containing only the information necessary for the network analysis; the data model of the SQL database contains much more information, for example text and name variants, comments on date, folio and material. The edges between the nodes are directed to give the graph-based data model clearer semantics. Unlike most pairs of nodes, nodes with the labels “Actor” and “Pamphlet” can be connected by two different relationship types, “AUTHOR_OF” and “OPPONENT_OF”. According to this model, the relational data is imported into the graph database, in this case the open source community edition of Neo4j<sup>5</sup>.</p>
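      <p>The directed model described above can be sketched as a small set of allowed (source label, relationship type, target label) triples. This is only an illustration under stated assumptions: the types AUTHOR_OF, OPPONENT_OF, DATED_TO and CONTROVERSY and their directions are taken from the query shown later in this paper, whereas the label “Place”, the type PUBLISHED_IN and all uids are hypothetical placeholders.</p>
      <preformat>
```python
from dataclasses import dataclass

# Allowed (source label, relationship type, target label) triples.
# AUTHOR_OF, OPPONENT_OF, DATED_TO and CONTROVERSY follow the paper's query;
# the "Place" label and the PUBLISHED_IN type are hypothetical placeholders.
SCHEMA = {
    ("Actor", "AUTHOR_OF", "Pamphlet"),
    ("Actor", "OPPONENT_OF", "Pamphlet"),
    ("Pamphlet", "DATED_TO", "Year"),
    ("Pamphlet", "CONTROVERSY", "Controversy"),
    ("Pamphlet", "PUBLISHED_IN", "Place"),
}


@dataclass(frozen=True)
class Node:
    uid: str    # hypothetical identifier
    label: str  # node label, e.g. "Actor"


@dataclass(frozen=True)
class Edge:
    source: Node
    rel_type: str
    target: Node


def is_valid(edge: Edge) -> bool:
    """Return True if the directed edge matches one of the schema triples."""
    return (edge.source.label, edge.rel_type, edge.target.label) in SCHEMA


actor = Node("a1", "Actor")
pamphlet = Node("p1", "Pamphlet")
print(is_valid(Edge(actor, "AUTHOR_OF", pamphlet)))  # True
print(is_valid(Edge(pamphlet, "AUTHOR_OF", actor)))  # False: wrong direction
```
      </preformat>
      <p>Checking edges against such a list before the import is one way to catch direction errors early; the actual project may enforce its model differently.</p>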
    </sec>
    <sec id="sec-4">
      <title>4 Import of MySQL-data and export of graph data</title>
      <p>To transfer the original data, all necessary fields from the MySQL database are dumped into several CSV files. Before importing the data, it should be checked that the files are well-formed, for example by using appropriate software or the tools of the Neo4j shell itself<sup>6</sup>. A detailed description of the import approach as employed here can be found in Andreas Kuczera’s articles on graph databases for historians [2, 3].</p>
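      <p>A minimal well-formedness check for such a dump might look as follows. It is a sketch rather than the tooling used in the project: it verifies only that every row has as many fields as the header row, and the column names and sample rows are invented for illustration.</p>
      <preformat>
```python
import csv
import io


def check_csv(text, delimiter=","):
    """Return a list of (line_number, problem) tuples for a CSV string.

    Checks only that every row has as many fields as the header row.
    """
    problems = []
    reader = csv.reader(io.StringIO(text), delimiter=delimiter)
    header = next(reader, None)
    if header is None:
        return [(1, "file is empty")]
    for row in reader:
        if len(row) != len(header):
            problems.append(
                (reader.line_num, f"expected {len(header)} fields, got {len(row)}")
            )
    return problems


# Hypothetical dump: the second data row contains an unquoted comma.
sample = 'uid,name\n1,"Chemnitz, Martin"\n2,Mörlin,Joachim\n'
print(check_csv(sample))  # [(3, 'expected 2 fields, got 3')]
```
      </preformat>
      <p>Unquoted delimiters inside name fields, as in the last sample row, are a typical reason for rows to shift columns during import.</p>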
      <p><sup>3</sup> www.digitale-akademie.de (Accessed on 2016-04-09).</p>
      <p><sup>4</sup> The authors would like to thank Dr. Jan-Martin Lies and Dr. habil. Kęstutis Daugirdas as well as Prof. Dr. Irene Dingel for their constructive criticism and support.</p>
      <p><sup>5</sup> http://neo4j.com; https://github.com/neo4j (Accessed on 2016-04-09). Version used: Neo4j Community Edition 2.3.2.</p>
      <p><sup>6</sup> http://neo4j.com/developer/guide-import-csv/#_csv_data_quality (Accessed on 2016-04-09).</p>
      <sec id="sec-4-1">
        <title>5 Incorporating graph databases into network research</title>
        <p>The exported CSV or GraphML files can be imported into network analysis software for further visualization and processing. In this project, one of the challenges posed by the source data is the handling of relationship attributes in combination with weighted edges. Because an author can stand in an opponent relation to the same actor through several pamphlets, the weighted author-opponent relationship between them can span not just one date but a range of years, and can include affiliations to several controversies. While this particular problem was not of importance to the current study, it can be solved by combining edges before the import into the network analysis software. For the “Controversia et Confessio” dataset, importing the author-opponent edges as described above results in a graph containing 389 nodes and 827 weighted edges<sup>10</sup>.</p>
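        <p>Combining edges before the import could be done along the following lines. The sketch assumes one row per pamphlet and author-opponent pair; the row layout, the actor names “A”, “B” and “C”, and the sample years are hypothetical and not taken from the dataset.</p>
        <preformat>
```python
from collections import defaultdict

# Hypothetical rows: one (author, opponent, year, controversy) tuple
# per pamphlet and author-opponent pair.
rows = [
    ("A", "B", 1551, "Osiandrian Controversy"),
    ("A", "B", 1553, "Osiandrian Controversy"),
    ("A", "B", 1560, "Other controversy"),
    ("C", "B", 1552, "Osiandrian Controversy"),
]


def combine_edges(rows):
    """Merge parallel author-opponent rows into one weighted edge per pair."""
    grouped = defaultdict(list)
    for author, opponent, year, controversy in rows:
        grouped[(author, opponent)].append((year, controversy))
    edges = []
    for (author, opponent), items in sorted(grouped.items()):
        years = [year for year, _ in items]
        edges.append({
            "source": author,
            "target": opponent,
            "weight": len(items),          # number of pamphlets
            "first_year": min(years),      # start of the year range
            "last_year": max(years),       # end of the year range
            "controversies": sorted({c for _, c in items}),
        })
    return edges


for edge in combine_edges(rows):
    print(edge)
```
        </preformat>
        <p>In this sample, the three rows from “A” to “B” collapse into a single edge of weight 3 that spans 1551 to 1560 and lists both controversies, so the year range and the affiliations survive as edge attributes.</p>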
        <p>While conducting the analysis of the author-opponent graph, it frequently proves beneficial to return to the graph database, for three reasons: first, to export new data with different edges, attributes or other variables in order to answer new research questions arising in the course of the research; second, to further examine relationships in their original context; and third, to conduct further non-network analysis on the data. The second case benefits from the fact that the graph database keeps the data in a relationship-based structure, so there is no cognitive dissonance as with databases that are relational only in the computational sense.</p>
      </sec>
      <sec id="sec-4-2">
        <p><sup>7</sup> http://neo4j.com/docs/2.3.2/graphdb-neo4j.html#graphdb-neo4j-labels (Accessed on 2016-04-09).</p>
        <p><sup>8</sup> http://graphml.graphdrawing.org/index.html (Accessed on 2016-04-09).</p>
        <p><sup>9</sup> https://github.com/jexp/neo4j-shell-tools (Accessed on 2016-04-09).</p>
        <p><sup>10</sup> Preliminary results of the study, as presented at the 10th Workshop on Historical Network Research in Düsseldorf, 28.–30.04.2016, can be found at http://prezi.com/7kan3xmiarmw/?utm_campaign=share&amp;utm_medium=copy&amp;rc=ex0share (in German; accessed on 2016-04-09).</p>
        <p>An example of the first case would be the export of subgraphs representing single controversies or time spans, or the examination of different relationship types such as co-authorship. In the third case, one application could be a statistical examination of the data, such as contrasting the number of pamphlets an author has written against an opponent with the number written without such an explicit reference.</p>
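        <p>The statistical comparison mentioned for the third case can be sketched as follows, reading “against an opponent” as “naming at least one opponent”. The pamphlet uids, the opponent names “X” and “Y” and the assignments are invented sample data.</p>
        <preformat>
```python
from collections import Counter

# Hypothetical sample: pamphlet uid mapped to (authors, opponents).
pamphlets = {
    "p1": (["Chemnitz"], []),
    "p2": (["Chemnitz", "Mörlin"], ["X", "Y"]),
    "p3": (["Mörlin"], ["X"]),
}


def opponent_statistics(pamphlets):
    """Count, per author, pamphlets with and without a named opponent."""
    with_opponent = Counter()
    without_opponent = Counter()
    for authors, opponents in pamphlets.values():
        target = with_opponent if opponents else without_opponent
        for author in authors:
            target[author] += 1
    return {
        author: (with_opponent[author], without_opponent[author])
        for author in sorted(set(with_opponent) | set(without_opponent))
    }


print(opponent_statistics(pamphlets))
# {'Chemnitz': (1, 1), 'Mörlin': (2, 0)}
```
        </preformat>
        <p>Each author is mapped to a pair of counts, pamphlets with and pamphlets without a named opponent, which can then be compared across authors or controversies.</p>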
        <p>For the second case, a simple example shall be discussed. While analyzing the subgraph of the
controversy pertaining to Andreas Osiander’s view of justification, the so-called “Osiandrian
Controversy”, several features of the network turned out to require more in-depth attention.</p>
        <p>As figure 2 shows, there is a cluster of several actors (visualized in blue) in one corner of the graph who are all connected to two persons, Joachim Mörlin and Martin Chemnitz, but neither to each other nor to other actors in the graph. Because pamphlets can be written by several persons against a large group of recipients, the context of this feature was examined in the graph database:</p>
        <preformat>
// Match pamphlets written by Martin Chemnitz in the context of the
// 'Osiandrian controversy' and all connected actors
MATCH (n:Actor)-[a:AUTHOR_OF]-&gt;(m:Pamphlet)&lt;--(o:Actor),
      (m)-[c:DATED_TO]-&gt;(p:Year),
      (m)-[e:CONTROVERSY]-&gt;(r:Controversy {uid:'6611'})
WHERE n.name =~ '.*Chemnitz.*'
RETURN n, m, o, p
        </preformat>
        <p>The query matches all pamphlets that were written by persons whose names contain the string “Chemnitz” and that are connected to the controversy with the uid “6611”, that is to say the “Osiandrian Controversy”. To catch all actors connected to these pamphlets, the relationship type between the pamphlets m and the actors o is left unspecified.</p>
        <p>The visualization output, when viewed in the graph view, can be further adjusted by assigning different colors and sizes to nodes as well as edges, which can serve to clarify the resulting graph structure. As seen in figure 3, Chemnitz wrote two pamphlets regarding the “Osiandrian Controversy”, both in the year 1567. Because one of them has no opponent, it is not included in the network analysis. The other one, however, was written with Mörlin and directed against five actors, who correspond to the ones seen in the subgraph above (figure 2).</p>
        <p>Figure 3: Graph view of the Neo4j browser interface showing pamphlets written by Martin Chemnitz and connecting nodes</p>
        <p>Going back to the network analysis program, we can now confirm that the blue cluster consists of just one pamphlet involving two authors and five opponents. In addition, its late date, 1567, shows that it was written years after the main discussion regarding Osiander’s view of justification had ended.</p>
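        <p>Why a single pamphlet produces such a cluster becomes clear when the projection onto author-opponent edges is written out. The following sketch assumes a simple list of (authors, opponents) records; the opponent names O1 to O5 are placeholders, since the five actors are not named here.</p>
        <preformat>
```python
from itertools import product


def project(pamphlets):
    """Turn (authors, opponents) records into directed author->opponent edges."""
    edges = []
    for authors, opponents in pamphlets:
        edges.extend(product(authors, opponents))
    return edges


# Placeholder opponent names O1..O5; the five actors are not listed in the text.
pamphlet = (["Mörlin", "Chemnitz"], ["O1", "O2", "O3", "O4", "O5"])
edges = project([pamphlet])
print(len(edges))  # 10

# Every opponent is linked to exactly the two authors and to nobody else.
neighbours = {}
for author, opponent in edges:
    neighbours.setdefault(opponent, set()).add(author)
print(all(n == {"Mörlin", "Chemnitz"} for n in neighbours.values()))  # True

# A pamphlet without opponents contributes no edges at all.
print(project([(["Chemnitz"], [])]))  # []
```
        </preformat>
        <p>Two authors and five opponents yield ten edges, none of them between the opponents themselves, which matches the shape of the blue cluster. The last line also shows why a pamphlet without an opponent does not appear in the network.</p>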
        <p>To clarify the matter even further, more information regarding the overall context can be queried, e.g.
which other controversies are addressed by the pamphlet in question. In a next step, these findings can be
checked against the bibliographical database of the project “Controversia et Confessio” to find out which
role, if any, this specific pamphlet played in the “Osiandrian Controversy” or if it was a later paper
dealing with general issues or summaries of prior discussions.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>6 Conclusions</title>
      <p>Project data is frequently stored in relational databases, a structure that is not ideal for a network analytical workflow. In order not to compromise web applications running on these datasets, it is advisable to transfer the data into a second, better-suited architecture. Graph databases allow for easy handling of data based on relationships and for an effective workflow going back and forth between storage and analysis software without having to switch between different types of data modeling. As shown, the graph model enables segmentation as well as further independent analysis of selected network data. Moreover, by providing a new perspective on the data set, this workflow yields new impulses for research as well as for data curation and normalization processes.</p>
    </sec>
    <sec id="sec-6">
      <title>Literature</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name><surname>Dingel</surname>, <given-names>I.</given-names></string-name>
          <article-title>Zwischen Disputation und Polemik. „Streitkultur“ in den nachinterimistischen Kontroversen</article-title>.
          In <string-name><surname>Jürgens</surname>, <given-names>H. P.</given-names></string-name> and <string-name><surname>Weller</surname>, <given-names>T.</given-names></string-name> (eds.),
          <source>Streitkultur und Öffentlichkeit im konfessionellen Zeitalter</source>,
          <publisher-name>Vandenhoeck &amp; Ruprecht</publisher-name>, <publisher-loc>Göttingen</publisher-loc>,
          <year>2013</year>, <fpage>17</fpage>-<lpage>29</lpage>.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name><surname>Kuczera</surname>, <given-names>A.</given-names></string-name>
          <article-title>Graphbasierte digitale Editionen</article-title>.
          <source>Mittelalter. Interdisziplinäre Forschung und Rezeptionsgeschichte</source>.
          Retrieved April 9th, 2016, from http://mittelalter.hypotheses.org/7994.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name><surname>Kuczera</surname>, <given-names>A.</given-names></string-name>
          <article-title>Graphdatenbanken für Historiker. Netzwerke in den Registern der Regesten Kaiser Friedrichs III. mit neo4j und Gephi</article-title>.
          <source>Mittelalter. Interdisziplinäre Forschung und Rezeptionsgeschichte</source>.
          Retrieved April 9th, 2016, from http://mittelalter.hypotheses.org/5995.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>