<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Contextualizing controversies of the post-Lutheran reformation: A workflow for network analytics involving relational and graph databases</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Aline</forename><surname>Deicke</surname></persName>
							<email>aline.deicke@adwmainz.de</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Digital Academy</orgName>
								<orgName type="department" key="dep2">Academy of Sciences and Literature</orgName>
								<address>
									<settlement>Mainz</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Anna</forename><surname>Neovesky</surname></persName>
							<email>anna.neovesky@adwmainz.de</email>
							<affiliation key="aff0">
								<orgName type="department" key="dep1">Digital Academy</orgName>
								<orgName type="department" key="dep2">Academy of Sciences and Literature</orgName>
								<address>
									<settlement>Mainz</settlement>
									<country key="DE">Germany</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Contextualizing controversies of the post-Lutheran reformation: A workflow for network analytics involving relational and graph databases</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">81A35A3EB31870CD93275998B1F6F1EF</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-25T09:09+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data accumulated by long term research projects offers interesting perspectives for network analytical research when made available under a free license. In the research project "Controversia et Confessio", the datasets are stored in relational databases, whose structures do not lend themselves easily to the relationship-based thinking necessary for network research. The following paper presents an exemplary analytical workflow involving graph databases as an intermediary storage layer as it was employed in a current network analytical study on this data by the authors.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Over the last decade, network analysis has become a widely popular method of examining historical data. With the growing number of research projects publishing their collected data under open licenses, it has become possible to conduct analyses that rely on big data sets of high quality. One of these projects is "Controversia et Confessio" <ref type="bibr" target="#b0">1</ref> , a long term research project of the Academy of Sciences and Literature | Mainz and the Leibniz Institute of European History in Mainz. Its data, published under a CC-BY-license, is stored in a MySQL-database that also powers a website. Because of its distinct purpose and structure, this relational database proves difficult as a basis for the data requirements needed for multiple, distinguished network analyses. Therefore, a workflow has been developed that relies on graph databases as an intermediary layer of data storage and analysis.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.1">The research project "Controversia et Confessio" and use-case</head><p>The project "Controversia et Confessio -An Edition of Sources for the Development of Confessional Documents and Confessionalization" focuses on the controversies that occurred during the "Wittenberger Reformation" after Luther's death. The events served to define the Lutheran doctrine and to consolidate the Protestant denomination. The main medium through which the discussions were carried out are pamphlets [1]. The project collects and redacts these pamphlets, which are published thematically and categorized according to controversies as for example the "Osiandrian Controversy" <ref type="bibr" target="#b1">2</ref> .</p><p>Another result of the ongoing research and a central element to access the information is the biobibliographic database, which contains the pamphlets with title, incipit, commentary, description, corresponding controversy, and further metadata, as well as biographies of the most important persons involved in the controversies. Initially organized in two separate databases, these were combined into one relational database in 2015, when the Digital Academy<ref type="foot" target="#foot_2">3</ref> , the Digital Humanities department of the Academy of Sciences and Literature | Mainz, started to collaborate with the project. Currently, the database contains 2063 pamphlets and about 800 persons or entities.</p><p>The use-cases portrayed in this paper belong to a research project conducted by the authors which aims to examine the nature of theological discussion through pamphlets from a network analytical perspective<ref type="foot" target="#foot_3">4</ref> . For this purpose, the relationships of authors and opponents as stated in the pamphlets are transformed into a one-mode network and analysed. Apart from this, the study also focuses on determining the potential of network methods for traditional editorial work. Additionally, the transformation of data into different formats and data models also helps to assist the process of data normalization that was undertaken to improve the quality of the project's database.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Relational database</head><p>The data is stored in a relational database management system, MySQL. The MySQL database is an integral part of a complex content management driven online platform. A restructuring towards another data model would have affected the application too much, but a graph based representation is ideal for network related concerns, as it emphasizes the relation between items. Therefore a separate graph database was created for the network analysis. As the steps for the transformation are fully documented, the scripts can be reused and in case of modifications or corrections in the data the graph database can be updated anytime with fresh data from the RDBMS.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Graph database model</head><p>First, the relational data model is transferred to a graph-based model (figure <ref type="figure" target="#fig_0">1</ref>). The central objects stay the same: Pamphlet, actor, year of publication, place of publication, and controversy. In the relational data model, all of the mentioned items are tables in the database. In the graph model, said tables are nodes containing the information necessary for the network analysis (the data model in the SQL-database contains much more information, for example text and name variants, comments on date, folio and material). The edges between the nodes are directed to introduce better semantics to the graph-based data model. Other than most nodes, the nodes with the labels "Actor" and "Pamphlet" can be connected via two different options, "AUTHOR_OF" and "OPPONENT_OF". According to this model, the relational data is imported into the graph database, in this case the open source community edition of Neo4j<ref type="foot" target="#foot_4">5</ref> .</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Import of MySQL-data and export of graph data</head><p>To transfer the original data, all necessary fields from the MySQL database are dumped into several CSV-files. Before importing the data it should be checked that the files are well-formed, for example by using appropriate software or the tools of the Neo4j-shell itself<ref type="foot" target="#foot_5">6</ref> . A detailed description of the import approach as it was employed can be found in Andreas Kuczera's article on graph databases for Historians [2, 3].</p><p>To ensure a better backwards compatibility and easier comparisons between the MySQL and the graph database, the original unique identifiers from the MySQL database are used to identify single nodes. Due to the fact that the tablebased context of the RDBMS does not exist in the graph database, these uids are not actually unique by themselves. This problem can be dealt with through the introduction of labels 7 such as ":Pamphlets", especially in the commands creating edges between different types of nodes. Labels also allow for a greater deal of specificity and easier and more semantic handling of queries, as will be evident later on.</p><p>For the network analysis, relationships between authors of pamphlets and their opponents are explored. To export the appropriate data out of the graph database, two approaches are possible, namely the export as a CSV or a GraphML file CSV has been the preferred data format for network research for a long time as it has an easy structure and is human readable. However, in some network visualization programs, problems can arise when more complex edge tables containing attributes are imported, especially if edges are to be summed up to create weighted relationships. This can be circumvented with the use of GraphML 8 , an XML standard format used to describe graphs. While there is no native support in Neo4j for exporting GraphML, a plugin is available that among other features enables the export of GraphML from the Neo4j-shell 9 . Because GraphML does not just mark up nodes but also the edges themselves, an additional relationship has to be added to the graph data model which up to now was included only implicitly, namely the AUTHOR_OPPONENT-relation (figure <ref type="figure" target="#fig_0">1</ref>).</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Incorporating graph databases into network research</head><p>The exported CSV-or GraphML-files can be imported into a network analysis software for further visualization and processing. In the case of this project, one of the challenges posed by the source data is the handling of relationship attributes in combination with weighted edges. Because authors can have opponent-relations to the same actor in form of several pamphlets, the weighted author-opponentrelationship between them can span not just one date, but a range of years, and can include affiliations to several controversies. While this particular problem was not of importance to the current study, it can be solved by combining edges before the import into the network analysis software. In case of the "Controversia et Confessio" dataset, the import of author-opponent-edges as described above results in a graph containing 389 nodes and 827 weighted edges <ref type="bibr">10</ref> .</p><p>While conducting the analysis of the author-opponent-graph, it frequently proves beneficial to return to the graph database due to a number of circumstances, e.g. to export new data with different edges, attributes or other variables to answer new research questions arising in the context of the conducted research, to further examine relationships in their original context -because the data is kept in a relational structure in the graph database, there is no cognitive dissonance as with databases that are relational only in the computational sense -or to conduct further non-network analysis on the data. 7 http://neo4j.com/docs/2.3.2/graphdb-neo4j.html#graphdb-neo4j-labels (Accessed on 2016-04-09). 8 http://graphml.graphdrawing.org/index.html (Accessed on 2016-04-09). 9 https://github.com/jexp/neo4j-shell-tools (Accessed on 2016-04-09). <ref type="bibr">10</ref> Preliminary results of the study as presented at the 10. Workshop on Historical Network Research in Düsseldorf, 28.-30.04.2016, can be found at http://prezi.com/7kan3xmiarmw/?utm_campaign=share&amp;utm_medium=copy&amp;rc=ex0share (German; Accessed on 2016-04-09). An example for the first case would be the export of subgraphs representing single controversies or time spans, or the examination of different relationship types like co-authorship. In the third case, one application could be a statistical examination of the data such as contrasting the number of pamphlets an author has written against an opponent with the number written without such an explicit reference.</p><p>For the second case, a simple example shall be discussed. While analyzing the subgraph of the controversy pertaining to Andreas Osiander's view of justification, the so-called "Osiandrian Controversy", several features of the network turned out to require more in-depth attention.</p><p>As figure <ref type="figure" target="#fig_2">2</ref> shows, there is a cluster of several actors (visualized in blue) in one corner of the graph who are all connected to two persons -Joachim Mörlin and Martin Chemnitz -but neither to each other nor to other actors in the graph. Because of the nature of pamphlets, which can be written by several persons against a large group of recipients, the context of this feature was examined in the graph database: A query is written which matches all pamphlets written by persons whose names contain the string "Chemnitz" and which are connected to the controversy with the uid "6611", that is to say the "Osiandrian Controversy". To catch all actors connected to these pamphlets, the relation type between the pamphlets m and the actors o is unspecified. The visualization output, when viewed in the graph view, can be further adjusted by assigning different colors and sizes to nodes as well as edges, which can serve to clarify the resulting graph structure. As seen in figure <ref type="figure">3</ref> Chemnitz wrote two pamphlets regarding the "Osiandrian Controversy", both in the year 1567. Because one of them has no opponent, it is not included in the network analysis. The other one, however, was written with Mörlin and directed against five actors, which conform to the ones seen in the subgraph above (figure <ref type="figure" target="#fig_2">2</ref>).</p><p>Going back to the network analysis program, we can now confirm that the blue cluster consists of just one pamphlet involving two authors and five opponents. In addition to this, its late date -1567shows that it was written years after the main discussion regarding Osiander's view of justification had ended.</p><p>To clarify the matter even further, more information regarding the overall context can be queried, e.g. which other controversies are addressed by the pamphlet in question. In a next step, these findings can be checked against the bibliographical database of the project "Controversia et Confessio" to find out which role, if any, this specific pamphlet played in the "Osiandrian Controversy" or if it was a later paper dealing with general issues or summaries of prior discussions.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Graph-based data model of the "Controversia et Confessio" dataset</figDesc><graphic coords="3,93.60,106.80,170.16,170.16" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>/</head><label></label><figDesc>/Match pamphlets written by Martin Chemnitz in the context of the 'Osiandrian controversy' and all connected actors MATCH (n:Actor)-[a:AUTHOR_OF]-&gt;(m:Pamphlet)&lt;--(o:Actor), (m)-[c:DATED_TO]-&gt;(p:Year), (m)-[e:CONTROVERSY]-&gt;(r:Controversy {uid:'6611'})WHERE n.name =~ '.*Chemnitz.*' RETURN n, m, o, p</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_2"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Subgraph visualizing the "Osiandrian Controversy" (Made with Gephi 0.9.1. Size of nodes: Degree, color of nodes: Modularity).</figDesc><graphic coords="4,93.60,106.80,410.88,260.40" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0"><head></head><label></label><figDesc></figDesc><graphic coords="5,-46.32,27.12,487.44,268.08" type="bitmap" /></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="1" xml:id="foot_0">http://www.controversia-et-confessio.de/ (Accessed on</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2016" xml:id="foot_1">-04-09).<ref type="bibr" target="#b1">2</ref> Overview of the volumes published so far: www.ieg-mainz.de/forschungsprojekte/kontroversliteratur_und_streitkultur/controversia_et_cofessio (Accessed on 2016-04-09).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="3" xml:id="foot_2">www.digitale-akademie.de (Accessed on 2016-04-09).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="4" xml:id="foot_3">The authors would like to thank Dr. Jan-Martin Lies and Dr. habil. Kęstutis Daugirdas as well as Prof. Dr. Irene Dingel for their constructive criticism and support.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="5" xml:id="foot_4">http://neo4j.com; https://github.com/neo4j (Accessed on 2016-04-09). Version used: Neo4j Community Edition 2.3.2.</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_5">http://neo4j.com/developer/guide-import-csv/#_csv_data_quality (Accessed on 2016-04-09).</note>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="6" xml:id="foot_6">ConclusionsFrequently, project data is stored in relational databases, which is not ideal for a network analytical workflow. In order not to compromise web applications running on these datasets, it is advisable to transfer this data into a second, better suited architecture. Graph databases allow for an easy handling of data based on relationships and for an effective workflow going back and forth between storage and analysis software without having to switch between different types of data modeling. As shown, the graph model enables segmentation as well as further independent analysis of selected network data. Moreover, by providing a new perspective on the data set, this workflow yields new impulses for research as well as data curation and normalization processes.</note>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Zwischen Disputation und Polemik</title>
		<author>
			<persName><forename type="first">I</forename><surname>Dingel</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Streitkultur und Öffentlichkeit im konfessionellen Zeitalter</title>
				<editor>
			<persName><forename type="first">H</forename><forename type="middle">P</forename><surname>Jürgens</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">T</forename><surname>Weller</surname></persName>
		</editor>
		<meeting><address><addrLine>Göttingen</addrLine></address></meeting>
		<imprint>
			<publisher>Vandenhoeck &amp; Ruprecht</publisher>
			<date type="published" when="2013">2013</date>
			<biblScope unit="page" from="17" to="29" />
		</imprint>
	</monogr>
	<note>Streitkultur&quot; in den nachinterimistischen Kontroversen</note>
</biblStruct>

<biblStruct xml:id="b1">
	<monogr>
		<author>
			<persName><forename type="first">A</forename><surname>Kuczera</surname></persName>
		</author>
		<ptr target="http://mittelalter.hypotheses.org/7994" />
		<title level="m">Graphbasierte digitale Editionen, Mittelalter. Interdisziplinäre Forschung und Rezeptionsgeschichte</title>
				<imprint>
			<date type="published" when="2016-04-09">April 9th, 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Graphdatenbanken für Historiker</title>
		<author>
			<persName><forename type="first">A</forename><surname>Kuczera</surname></persName>
		</author>
		<ptr target="http://mittelalter.hypotheses.org/5995.Figure3" />
	</analytic>
	<monogr>
		<title level="m">Netzwerke in den Registern der Regesten Kaiser Friedrichs III. mit neo4j und Gephi, Mittelalter. Interdisziplinäre Forschung und Rezeptionsgeschichte</title>
				<imprint>
			<date type="published" when="2016-04-09">April 9th, 2016</date>
		</imprint>
	</monogr>
	<note>: Graph view of the Neo4j browser interface showing pamphlets written by Martin Chemnitz and connecting nodes</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
