=Paper= {{Paper |id=Vol-3019/p53 |storemode=property |title=Towards a Representation of Temporal Data in Archival Records: Use Cases and Requirements |pdfUrl=https://ceur-ws.org/Vol-3019/LinkedArchives_2021_paper_4.pdf |volume=Vol-3019 |authors=Oleksandra Bruns,Tabea Tietz,Mahsa Vafaie,Danilo Dessì,Harald Sack }} ==Towards a Representation of Temporal Data in Archival Records: Use Cases and Requirements== https://ceur-ws.org/Vol-3019/LinkedArchives_2021_paper_4.pdf
 Towards a Representation of Temporal Data in
 Archival Records: Use Cases and Requirements *

    Oleksandra Bruns1,2 , Tabea Tietz1,2 , Mahsa Vafaie1,2 , Danilo Dessı́1,2 , and
                                 Harald Sack1,2
      1
          FIZ Karlsruhe – Leibniz Institute for Information Infrastructure, Germany
                         firstname.lastname@fiz-karlsruhe.de
              2
                Karlsruhe Institute of Technology, Institute AIFB, Germany



          Abstract. Archival records are essential sources of information for histo-
          rians and digital humanists to understand history. For modern information
          systems they are often analysed and integrated into Knowledge Graphs
          for better access, interoperability and re-use. However, due to restrictions
          of the representation of RDF predicates temporal data within archival
          records is a challenge to model. This position paper explains require-
          ments for modeling temporal data in archival records based on running
          research projects in which archival records are analysed and integrated
          in Knowledge Graphs for research and exploration.

          Keywords: Knowledge Graphs · Archives · Temporal Representation.


1     Introduction
The research, exploration, and analysis of historical events throughout di↵erent
time layers are fundamental for understanding their influence on today’s world
as well as for increasing the sense of cultural identity. Being primary sources of
information, archival records play an essential role in providing real evidence on
events. Therefore, a structured and formal representation of millions of existing
archival resources from di↵erent time periods is necessary to preserve, to explore
and to enrich the timeline of history, e.g., What event triggered a certain decision?
How did a city develop over a certain period of time? How did legal processes and
court proceedings change over time? What terminology was used several centuries
ago and how did it evolve? In order to provide novel means of exploration, a
promising approach would be the integration of heterogeneous archival records
from various points in history into semantically interlinked representations such as
Knowledge Graphs (KGs). For this purpose, the time varying data and knowledge
contained in such records must be represented in a semantically correct way and
furthermore reasoned and queried. Despite the high amount of research ideas for
integrating the concept of time into Semantic Web applications (cf. [15] for a
corresponding survey), the proposed approaches di↵er from each other in various
ways, and thus the choice of a certain modeling approach is highly dependent
on its respective context. For example, the approaches di↵er in the dimensions
of time they adapt: whether it is the valid time to represent the validity of a
______________
* Copyright 2021 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0
  International (CC BY 4.0).
        O. Bruns et al.

certain fact, concept or event [6,8,9,11], or the transaction time to record the time
of change in a KG [3]. They di↵er on the type of temporal data, whether only
temporal change of facts is considered [5,8,9], or also temporal existence of entities
or concepts [6,16] has to be denoted. Furthermore, the temporal statements in
data can be expressed di↵erently. An additional challenge lies in the nature of
historical records where the information about the validity of a fact or a concept
is uncertain or partly or fully missing, therefore the documentation (and thereby
representation) of a complete provenance is essential.
    The contribution of this position paper is the presentation of use cases which
utilize archival records for research purposes, the description of the nature of
temporal data in them, and the discussion of resulting requirements for KGs to
enable modeling and reasoning over temporal data in archival records.


2     Use Cases: Archival Data for Historical Research

In the following, three use cases are introduced. They form the basis of discussion
to identify the requirements for the representation of temporal data in archival
resources.


2.1   Use Case 1: Exploring Nuremberg in di↵erent Time Layers

Within the TRANSRAZ project, a KG based 3D virtual research environment
(VRE) is created to explore the city of Nuremberg in di↵erent time layers from
the middle ages to modern times. The VRE connects buildings with informa-
tion about historical places, events, residents and organizations extracted from
external resources [10,13]. Exemplary heterogeneous data sources relevant for
the discussion on temporal representations of archival data are presented in the
following.
Use Case 1.1: Historical Address Books. Nuremberg address books provide
a valuable resource to explore the city history. They have been published since
1792 and provide information on persons, their occupations, addresses as well as
companies. The address book data from di↵erent time layers is in the process of
being integrated into a KG to explore [1]. A major challenge of the integration
is the disambiguation and alignment of entities over a period of time, since all
information is dynamic. Persons change their names, addresses and occupations.
Streets are renamed, building numbers change and buildings are destroyed and
rebuilt. Occupations may either change their name or their function.
Use Case 1.2: The “Nürnberger Künstlerlexicon” (NKL). NKL [4] is a
collection of bibliographical articles about artists of Nuremberg based on archival
records ranging from the 12th to the 20th century. The articles provide both
personal information of artists such as occupations, birth and death places and
dates, family relations, places and periods of study, and information about their
artworks and their public life. The articles of NKL are based on administrative
records, and the text is saturated with varying temporal units to describe events.
             Towards a Representation of Temporal Data in Archival Records

A challenge in the process of integrating this information into a KG is the descrip-
tion of events and dates, which includes time instants (e.g. September 27th, 1763)
and time intervals (e.g., from 1763 to 1782), but also unknown time stamps (e.g.,
until his death), incomplete intervals (e.g. from 1756) and temporal uncertainty
(e.g. around 1830).
    Without a semantically correct representation of these dynamics in the
TRANSRAZ KG, cultural heritage researchers cannot be supported in tasks
involving time dependent data exploration, e.g. When did important artists settle
in the city, where did they live and when did they move to di↵erent city areas?
Was their movement triggered by significant events in history?


2.2     Use Case 2: Archivportal-D: Subject-related Points of Access

With the nationwide Archivportal-D, a platform is created to make institutional
information, indexing services and digitized archival holdings available to everyone
for scientific use through a central point of access. The first subject-related
information that can be accessed via Archivportal-D is holdings and data related
to the Weimar Republic3 . To collect historical records, store them long-term and
make them accessible for everyone, archival resources have to be categorized to
provide a structured and intuitive access for search and exploration. For this
purpose, new classification schemes have to be created for subject specific entries
depending on the topic of the archival records. Furthermore, whenever another
archive provides access to additional records within the same subject specific
platform, and whenever new records are digitized, a classification scheme has to
be accordingly adapted, i.e. it is dynamic. When creating an ontology that models
the classification of these topic based archival resources, these dynamics have to
be specifically addressed [7,14]. In the end, researchers can draw conclusions on
the document classification process, e.g. Which keywords were eliminated and
which documents are more controversially addressed and were re-classified the
most? Another challenge is the representation of the time periods the documents
describe. Also in this use case, vague time intervals and uncertainty have to
be represented within a KG as precise as possible from a historical perspective
and semantically correct for data interoperability, data exploration and semantic
reasoning. This allows to conclude on the timeline of events the records describe,
e.g. What event or action resulted from a certain political statement?


2.3     Use Case 3: Wiedergutmachung

“Pilotprojekt zur Wiedergutmachung”4 is issued by the German Federal Ministry
 of Finance. It is centred around archival data from the reparation process and
 reparation cases filed after World War II and the fall of the Nazi regime, in
 the state of Baden-Württemberg. Wiedergutmachung includes the reparation,
 restitution, and indemnification for victims of Nazi persecution [2]. The goal
   3
       https://www.archivportal-d.de/themenportale/weimarer-republik
   4
       https://www.fiz-karlsruhe.de/en/forschung/wiedergutmachung
        O. Bruns et al.

of this project is to integrate archival data into a KG enabling its enrichment
through connection to external resources. Every archival record set corresponds
to a court case which starts after the application is sent to the office of reparation.
The record sets are closed shortly after the court case is concluded and the
decision about the entitlement to and amount of reparation is made. The archival
metadata provides information on the “duration of file” for each record set.
However, the duration of the legal procedures at the court remains unknown. For
modeling interval relations between the court cases and the archival record sets
documenting them, along with interval relations between di↵erent court cases, a
data model that enables anonymous timestamps is required. Another challenge in
modeling the reparation cases arises from the fact that official documents usually
have more than one date associated with them. In the case of Wiedergutmachung
documents, date of issue in most cases di↵ers from date of processing by the
responsible state office. Thus, to reproduce the timeline of the court case it is
essential for such data to be extracted and represented in a KG for exploration.
Furthermore, some of the documents and forms changed over time from the late
1940s to the early 1960s. Such diachronic changes in the content of the resources
can be captured by a temporal data model and integrated into a KG.


3    Discussion of Requirements

This section presents a preliminary analysis of the historical records discussed in
 section 2. The following modeling requirements (REQ) were identified:
 REQ1: Modeling change – Relabeling and Concept Drift. Since archival
 records cover di↵erent periods of time, it is natural that mentioned concepts and
 entities change. Such changes can happen on the name level, when the meaning
 stays the same and only a new name for the concept or entity is provided. In
 address books from di↵erent years (use case 1.1) the same person may have
 di↵erent last names due to marriage, the street may be renamed due to historical
 events, the occupation names may be outdated and substituted with more modern
 terms, e.g., “mannequin” is an outdated term for “model”. In use case 2, labels
 of concepts in classification schemes are often adapted by archivists to provide a
 more suitable representation, e.g. the keyword “May Day” was later renamed into
“Labour Day”. In use case 3, both personal information such as person names and
 names of the state offices may di↵er throughout the time.
Alternatively, the meaning of some concepts evolves without changing the
 name [12]. A typical example of such changes is occupation names that oc-
 cur in use cases 1 and 3. Occupation functions may change over time, so their
 definition will vary across di↵erent time layers, even though the name stays
 the same. In use case 2 the change of meaning is observed whenever the clas-
 sification scheme is adapted. For example, the category “Road Traffic” would
 have a broader meaning with the addition of the subcategory “Bicycle” to the
 classification scheme.
 REQ2: Modeling time dependent information – Temporal Facts, Con-
 cepts, Entities. Temporal data in archival records varies and is associated with
            Towards a Representation of Temporal Data in Archival Records

both point-based and interval-based timestamps. In use case 1.1, only one type
of time units is provided – issue year of the book. Thus, every fact has to be
associated with a year interval. Use cases 1.2 and 3 contain both instant and
lasting facts, and have to be annotated with either timestamps or time intervals,
e.g., “Johann Ackermann married Elisabatha Schönberger on November 20, 1834”
and “Johann Ackermann studied medicine in Jena and Göttingen from 1668
until 1771”. Similar to facts, concepts and entities may be denoted with their
existence in time. For use cases 1 and 3 the existence of occupations has to be
annotated, e.g. the profession “knocker-up” existed only until the 1940s. For use
case 1.1 buildings have to be associated with their construction and destruction
dates. In use case 3, types of documents and forms changed overtime and may be
denoted with their period of use. In use case 2, every change in the classification
scheme is associated with the new version of the scheme. Hence, this version
entity is to be annotated with the time interval it was or is valid in.
REQ3: Modeling timestamps – Anonymous and Uncertain Time. Due
to their nature, historical records often contain incomplete temporal information
that must be modeled via anonymous timestamps (variables). For all use cases,
it is often the case when facts or entities lack information about their validity,
however cannot be considered static, e.g., address of a person. In some cases
temporal intervals are not complete, e.g. in a sentence “Reinhold Bach worked
in Nuremberg until 1923” the beginning of the interval is unknown. However,
modeling of anonymous time annotations is important for stating the interval
relations between facts. Finally, some archival records from all use cases do
not contain information on an original date of issue, but they contain subse-
quent annotations of approximate dates by historians, e.g. “the beginning of the
1950s”, “23.08.1908 or 24.08.1908”, “around 1872”. Such timestamps cannot be
considered anonymous and have to be normalized for reasoning.
     In summary, archival resources are heterogeneous and contain diverse temporal
information. The representation of time varying data in archival records is complex
and requires a highly expressive approach for the enrichment and exploration of
our past stored in archives.

4    Conclusion
In this position paper, use cases that utilize archival records for research purposes
are presented. Furthermore, requirements for modeling temporal data within
historical archival records are defined. In future work, more use cases will be
employed to provide a more complete and better structured list of requirements
to develop guidelines for proper temporal semantic annotation of archival records.
Based on the guidelines, a semantically correct and historically accurate represen-
tation of temporal data will be published to enable federated time-based queries
over di↵erent archives to provide researchers with a holistic view into our past.

Acknowledgement. This work is funded by the Leibniz Association under
project number SAW-2020-FIZ KA-4-Transraz and by the Deutsche Forschungs-
gemeinschaft with grant number 396707386.
        O. Bruns et al.

References
 1. Bruns, O., Tietz, T., Chaabane, M.B., Portz, M., Xiong, F., Sack, H.: The Nuremberg
    Address Knowledge Graph. In: 18th Extended Semantic Web Conference (ESWC),
    Poster and Demo Track. Springer (2021)
 2. Goschler, C.: The United States and Wiedergutmachung for Victims of Nazi
    Persecution: From Leadership to Disengagement. Holocaust and Shilumim: The
    Policy of Wiedergutmachung in the Early 1950s. Washington, DC: German Historical
    Institute (1991)
 3. Grandi, F.: Multi-temporal RDF Ontology Versioning. In: Proceedings of the 3rd
    International Workshop on Ontology Dynamics (IWOD-09). pp. 625–634 (2009)
 4. Grieb, M.H.: Nürnberger Künstlerlexikon: Bildende Künstler, Kunsthandwerker,
    Gelehrte, Sammler, Kulturscha↵ende und Mäzene vom 12. bis zur Mitte des 20.
    Jahrhunderts. Walter de Gruyter (2011)
 5. Hartig, O.: Foundations of RDF* and SPARQL*:(An alternative approach to
    statement-level metadata in RDF). In: AMW 2017 11th Alberto Mendelzon Inter-
    national Workshop on Foundations of Data Management and the Web, Montevideo,
    Uruguay, June 7-9, 2017. vol. 1912. Juan Reutter, Divesh Srivastava (2017)
 6. Ho↵art, J., Suchanek, F.M., Berberich, K., Weikum, G.: YAGO2: A Spatially and
    Temporally Enhanced Knowledge Base from Wikipedia. Artificial Intelligence 194,
    28–61 (2013)
 7. Hoppe, F., Tietz, T., Dessı, D., Meyer, N., Sprau, M., Alam, M., Sack, H.: The
    Challenges of German Archival Document Categorization on Insufficient Labeled
    Data. In: WHiSe 2020: Workshop on Humanities in the Semantic Web 2020-
    Proceedings of the Third Workshop on Humanities in the Semantic Web (WHiSe
    2020), co-located with 15th Extended Semantic Web Conference (ESWC 2020),
    Heraklion, Greece, June 2, 2020 (online). Ed.: A. Adamou (2020)
 8. Hurtado, C., Vaisman, A.: Reasoning with Temporal Constraints in RDF. In:
    International Workshop on Principles and Practice of Semantic Web Reasoning.
    pp. 164–178. Springer (2006)
 9. Nguyen, V., Bodenreider, O., Sheth, A.: Don’t like RDF reification? Making
    statements about statements using singleton property. In: Proceedings of the
    23rd international conference on World wide web. pp. 759–770 (2014)
10. Razum, M., Göller, S., Sack, H., Tietz, T., Vsesviatska, O., Weilandt, G.,
    Grellert, M., Scharm, T.: TOPORAZ: Ein digitales Raum-Zeit-Modell für ver-
    netzte Forschung am Beispiel Nürnberg. Information-Wissenschaft & Praxis 71(4),
    185–194 (2020)
11. Schueler, B., Sizov, S., Staab, S., Tran, D.T.: Querying for Meta Knowledge. In:
    Proceedings of the 17th international conference on World Wide Web. pp. 625–634
    (2008)
12. Tietz, T., Alam, M., Sack, H., van Erp, M.: Challenges of Knowledge Graph
    Evolution from an NLP Perspective. In: WHiSe 2020: Workshop on Humanities
    in the Semantic Web 2020-Proceedings of the Third Workshop on Humanities
    in the Semantic Web (WHiSe 2020), co-located with 15th Extended Semantic
    Web Conference (ESWC 2020), Heraklion, Greece, June 2, 2020 (online). Ed.: A.
    Adamou (2020)
13. Tietz, T., Bruns, O., Göller, S., Razum, M., Dessı̀, D., Sack, H.: Knowledge graph
    enabled Curation and Exploration of Nuremberg’s City Heritage. In: Proceedings of
    the Conference on Digital Curation Technologies (Qurator 2021): Berlin, Germany,
    February 8th to 12th, 2021. Ed.: A. Paschke (2021)
            Towards a Representation of Temporal Data in Archival Records

14. Vsesviatska, O., Tietz, T., Hoppe, F., Sprau, M., Meyer, N., Dessi, D., Sack, H.:
    ArDO: an Ontology to describe the Dynamics of Multimedia Archival Records.
    In: Proceedings of the 36th Annual ACM Symposium on Applied Computing. pp.
    1855–1863 (2021)
15. Wang, H.T., Tansel, A.U.: Temporal Extensions to RDF. Journal of Web Engineering
    18(1), 125–168 (2019)
16. Welty, C., Fikes, R., Makarios, S.: A Reusable Ontology for Fluents in OWL. In:
    FOIS. vol. 150, pp. 226–236 (2006)