Considerations about Uniqueness and Unalterability for the Encoding of Biographical Data in Ontologies

Thierry Declerck (1) and Rachele Sprugnoli (2)
(1) DFKI GmbH, Multilingual Technologies Lab, Stuhlsatzenhausweg 3, D-66123 Saarbrücken, Germany
(2) Fondazione Bruno Kessler, Digital Humanities Group, Via Sommarive, 18, I-38123 Povo, Italy
declerck@dfki.de, sprugnoli@fbk.eu

Abstract

This paper results from observations made while studying ontological and linked data-based approaches to the encoding of biographical data. Based on certain issues we discovered, which are described here, we call for collaborative work towards guidelines for modelling biographical data in the standard Semantic Web representation languages. The need for guidelines became even clearer after reading an article that describes various types of errors in biographical data encoding, generated by an unsuitable use of the owl:sameAs property when referring to the linked data-based description of the lives of two literary authors. In this context, there is also a need to agree on the core element of a biographical description. More specifically, we aim to determine the “biographical unit”, which should be modelled first and to which all related information should be linked using corresponding semantic properties. We also discuss the need to define and use synchronic versus diachronic properties associated with the modelled biographical unit. On this point, we conclude that, for the description of a biographical unit, there are probably no properties whose values remain unaltered over time. This is particularly true if provenance information is taken into account, since different sources can provide contrasting values, each of which might be correct from a particular point of view.
Keywords: Ontologies, biographical units, linked data

1 Introduction

The issue of encoding changes occurring in a life has received some attention in the context of the formal representation of biographical data. In one study on this topic, Krieger and Declerck (2015) present considerations on synchronicity and diachronicity and on how those aspects can be applied to defining properties in a formal ontology about biographical data. Building further on this study, one can eventually come to the conclusion that it is very difficult, if not impossible, to come up with unalterable property values that can be associated with an individual within a biographical description. This leads to the question of whether there are any properties of human beings which are in fact immutable and which can therefore serve as the fixed pillar on the basis of which we can describe all other changeable aspects and characteristics of human beings.

At the current stage of our study, this does not seem to be the case. Let us take as an example the case of the soldier Manning, introduced in Wikipedia as “Chelsea Elizabeth Manning (born Bradley Edward Manning, December 17, 1987)”.[1] In this case, the question arises whether this person remains the same after the change of sex, while we want to stick to the assumption that only one entry for this biographical unit should be kept.

[1] https://en.wikipedia.org/wiki/Chelsea_Manning.

We can also be confronted with uncertainty when looking at the birth date of a person, as this is information that can still be modified or corrected in the light of new data, and that also depends on the sources consulted. In addition, it is sometimes not even possible to state which source is the more reliable one, so that we have to encode biographical information together with its provenance, especially in cases where we do not have a unique value. In the end, we only have the certainty that, biologically speaking, a person was born only once, but that various birth dates can be associated with this event, depending on the perspective and provenance of the information.[2]

[2] The same remark certainly applies to the death date of a person, also biologically speaking, with the only difference that we can have biographies of living people, where a death date does not need to be specified until the passing away of the person is described in a biographical data set.

Our intuition is that a very carefully designed ontology can offer support when dealing with a “biographical unit”. This biographical unit might have no fixed characteristics, or properties, but on the basis of the large set of possibly divergent values of descriptors (classes and properties) and their organisation in one ontological space, it can be considered as one unique carrier of a life. This carrier should then be uniquely identified by a URI.[3]

[3] This could also be an IRI (Internationalized Resource Identifier), but we use the term URI here to stress the uniqueness of the resource identifier.

In this paper, we thus concentrate on biographical data as giving an account of a person’s life and achievements, not considering at this stage prosopography or what is sometimes referred to as “collective biography” (Davies and Gannon, 2006).

In the next sections we will first report on existing ontological modelling initiatives for biographical data, before briefly describing the Linked Open Data cloud and presenting the way biographical data is represented in this framework. This will be followed by a discussion of the paper by Brown and Simpson (2013), which describes how erroneous biographical data can be generated in the Linked Data framework due to the inappropriate use of the owl:sameAs property. Finally, we will present our ideas on how to overcome those issues, also calling for collaborative work in order to generate guidelines for describing biographical data within the Linked Open Data cloud.

2 Overview of Existing Models for Biographical Data

Several repositories of biographies are available in digital format, and specific schemata have been proposed to model life events in order to improve the analysis and understanding of these repositories.

BIO[4] describes a person’s life seen as a series of interlinked events. Its vocabulary, expressed in OWL, has four core classes: Person, Event, Relationship and Interval. As for the Event class, BIO proposes a framework of 37 event types: some of these types apply to all people (e.g., Birth, Death), others are more specific (e.g., Coronation, BarMitzvah). Each event is characterised by four properties: Date, Place, State (i.e., territory involved in an event), and Position (i.e., employment position or public office). Other properties are used to relate an event to an agent (e.g., Employer, Officiator) or to temporally order an event with respect to another event (e.g., Following Event, Preceding Event). An extension of BIO has been proposed within the Shoah Ontology, a domain ontology that formally describes concepts and relationships characterizing the life and persecution of Jews in Italy between 1943 and 1945 (Brazzo and Mazzini, 2015). Here, the ontology class called Persecution is used to represent all main events related to the persecution of the victims (arrest, detention, deportation to a Nazi camp, transfer to another camp, liberation, death in a massacre). This class is connected to the Person class, which is based on BIO and extended with additional anagraphic (civil-registry) and genealogical properties (e.g., niece_nephewOf).

[4] http://vocab.org/bio/

The aim of the Biography Light Ontology is two-fold (Ramos, 2009): i) to encode life events following the 4W model, thus answering questions about what, where, when, who; ii) to improve the interoperability among existing vocabularies such as LODE (Linking Open Descriptions of Events)[5] and BIO. Biography Light introduces the main class BioEvent with four subclasses that represent changes in the health of the biography’s subject, his/her relations with other people, changes in location such as migrations, and inventions or discoveries made by the subject. Event properties are borrowed from LODE (e.g., atPlace) and from the Event Ontology (e.g., isAgentIn). This ontology has been developed within and adopted by the Bringing Lives to Light: Biography in Context Project, an initiative of the Electronic Cultural Atlas Initiative (ECAI)[6] at the University of California. The goal of the project was to design, develop and evaluate tools that can improve the understanding of biographical texts by connecting life events to contextual information, including their location, time of occurrence and related archival materials (Buckland and Ramos, 2010). Different datasets and sources of information were taken into account during the project: namely, the digital texts provided in the on-line Biographical Directory of the United States Congress,[7] the manually compiled chronology of Emma Goldman’s lecture itinerary,[8] or the scanned page images of Irish texts.[9]

[5] http://linkedevents.org/ontology/
[6] http://ecai.org/.
[7] http://bioguide.congress.gov/
[8] See http://metadata.berkeley.edu/emma/ for a prototype.
[9] For more details see http://ecai.org/neh2007/.

Bio CRM is a domain-specific extension of CIDOC CRM (Doerr, 2003): it provides a general model for representing biographical datasets that can be extended to meet the requirements of specific projects (Tuominen, 2016). This ontology makes a clear distinction between unary roles of actors, binary relations between actors, and events in which actors participate having different roles. Events are described in terms of time, location, participants and other resources involved; moreover, they are organised in a hierarchy distinguishing, for example, ecclesiastical from educational events. Each event type has a corresponding class of permitted roles. Bio CRM has been developed by the Semantic Computing Research Group of Aalto University (Finland) within a set of experiments and projects focused on the linking, enrichment and visualisation of biographies, with the aim of improving the reading experience of biographies by providing the users with a rich reading context. A first experiment, called National Semantic Biography of Finland, takes the short biographies published in the Finnish National Biography[10] as input data and works on a single type of event (i.e., achievements in the career of a person). An event extractor is used to identify snippets of text containing words which express creation events, dates written in numbers, named entities of type location and a reference to the name of the subject person of the biography. Extracted information is then transformed into RDF following the Bio CRM model and linked to several external resources such as GeoNames and Wikipedia (Hyvönen et al., 2014). This approach has also been applied to the digitised historical register of the Finnish high school “Norssi”, which includes information about the student lives of more than 10,000 alumni (Hyvönen et al., 2017).

[10] https://kansallisbiografia.fi/english.

Biography.owl is a lightweight ontology designed to represent biographical facts (Krieger and Declerck, 2015): its main feature is the tripartite structure with which entities are modelled. More specifically, the most general class Entity has three subclasses, namely Abstract (describing concepts and roles), Object (describing physical things) and Happening. The latter includes both situations and events, the first being static and atomic, the second dynamic and decomposable. Happenings have properties related to their starting and ending date, the agents involved in them, and their location. Particular attention is devoted to pre- and post-conditions of a happening, by means of properties encoding causes and effects.

3 Discussing Existing Models

In this Section we discuss in more detail the approaches towards synchronic versus diachronic properties proposed by Krieger and Declerck (2015) and by the Biography Light Ontology, both briefly introduced in the preceding Section.

Krieger and Declerck (2015) study how to classify relations (or properties) associated with classes of an ontology as being either synchronic or diachronic. The assumption behind this approach is that a date of birth is something that will not change over time (“a person is born only once”), while the profession exercised by a person can vary over time. While this study was mainly concerned with formalisation aspects, one of the results was that it is in fact very difficult, if not impossible, to describe a property that will only ever have one unalterable value. We can assume that, biologically speaking, a person has indeed only one date of birth, but the statements about this event can be multiple, depending on the sources, or may be revised over time.

Furthermore, interesting statements about “changes” can be found in the Biography Light Ontology: “The Biography Light Ontology takes an event centric approach to the encoding of biographic texts. It is a lightweight framework for common biographic occurrences, such as changes in the health of a biographic subject, relationships between the subject and other people, social groups, or institutions, migration or the change of location of a subject, and biographic events pertaining to creations, inventions, or discoveries produced by the primary subject. The Biography Light model introduces the event type bl:BioEvent with four basic subclasses: bl:ChangeOfHealth, bl:ChangeOfRelation, bl:ChangeOfLocation, and bl:Origination” (Ramos, 2009). In other words, some event classes of the Biography Light Ontology carry “Change” in their name. Using this ontology, we can extract from a biography event factoids modelled as instances of biographical event classes, as in the example below, adapted from Ramos (2009):

Text: Robert George Collier Proctor (1868-1903), bibliographer, was born in Budleigh Salterton, Devon, on 13 May 1868. He was educated at a preparatory school in Reading and at Marlborough College, before joining Bath College in 1881.

Event factoids:

• ChangeOfHealth:
  – birth, 1868-05-13, Budleigh Salterton, Devon
• ChangeOfSocialRelation:
  – studied at Marlborough College, before 1881
  – studied at Bath College in 1881

We can however assume that not only the aspects singled out in the Biography Light Ontology can change, but that all the properties associated with a biographical entity are subject to change. The proposal would thus be to equip all properties with a time stamp (an instant or a duration). Technically, this can be done either by allowing n-tuple properties (Krieger, 2014) or by “reifying” a statement about a biographical unit. For example, hasHealthStatus can vary very often over time: we can thus “reify” the statement about the health status and encode it as a statement to be equipped with a time stamp in the object part of the resulting new triple. Wikidata[11] is using this method for marking the change of sex/gender of Bradley (later Chelsea) Manning,[12] but also, for example, for marking the number of inhabitants of a city (Hernández et al., 2015). In addition to that, the provenance of such information needs to be taken into account and to be encoded properly, so that the user can select between sources that seem to be more interesting or more reliable. At this point, we can take advantage of the work described by Ockeloen et al. (2013).[13]

[11] https://www.wikidata.org/
[12] At https://www.wikidata.org/wiki/Q298423 the entity “Q298423” is marked as being “male” until 22 August 2013 and “transgender female” starting from 22 August 2013. The change of given name of the “Q298423” entity is marked in a similar way.
[13] “Provenance” is also a W3C recommendation, see https://www.w3.org/TR/2013/REC-prov-dm-20130430/.

4 The Linked Open Data Cloud

One of our goals in porting a model for biographical data into a Semantic Web compliant formal representation is to be able to publish those data as a specialised subset of the Linked Open Data (LOD) cloud. Figure 1 shows the shape of this cloud, as of 2018-04-30.[14]

[14] http://lod-cloud.net/.

Figure 1: The shape of the Linked Open Data cloud, as of 2018-04-30.

Looking at this cloud in more detail, the reader can see the legend for the various colours used to mark the specialised subsets of the Linked Data infrastructure: Cross Domain, Geography, Government, Life Sciences, Linguistics, etc. Biographical data is also present in LOD, but not (yet) in a specialised subspace. For example, a lot of biographical data is encoded in the DBpedia node, which is classified as “Cross_domain”.

DBpedia[15] started as an effort to extract structured data from Wikipedia (mainly its “infoboxes”) and to encode this information in a Semantic Web compliant representation language. Nowadays, DBpedia is among the largest nodes in the LOD cloud. DBpedia organises its data on the basis of an ontology that was first developed starting from the Wikipedia category system, which can be found in the infoboxes and which evolved into a full ontology representing a directed acyclic graph. This ontology contains 4,233,000 instances (resources), among which 1,450,000 are about entities classified as “Person” and many about other topics that are inherently related to the description of a person, like places, organisations, work, etc.[16] The full ontology is browsable[17] and demonstrates that DBpedia is making use of a large set of (ontological) properties that can be used to describe a biography.

[15] See http://wiki.dbpedia.org/ for more details.
[16] For more details see http://wiki.dbpedia.org/services-resources/ontology.
[17] http://mappings.dbpedia.org/server/ontology/classes/.

Looking at the full ontology, one can also see the details of the information associated with a certain class, illustrating that, for example, 257 properties are introduced for the class Person.[18] The information about domain and range of such properties is given and we can also recognise the type of each property, being either a data-type or an object-type property. Looking at the data-type property birthDate,[19] we can see that the class “Person” is defined as its domain and the xsd:date type as its range.
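The domain and range declaration of birthDate discussed above can be written down in Turtle as a minimal sketch. It is an illustration under stated assumptions, not a copy of the DBpedia ontology file: `dbo:` is DBpedia's ontology namespace, while the resource `ex:someone` and its date value are hypothetical placeholders introduced only for this example.

```turtle
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix owl:  <http://www.w3.org/2002/07/owl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix dbo:  <http://dbpedia.org/ontology/> .
@prefix ex:   <http://example.org/> .

# Data-type property with the domain and range reported in the text.
dbo:birthDate
    a owl:DatatypeProperty ;
    rdfs:domain dbo:Person ;
    rdfs:range  xsd:date .

# Hypothetical instance data: ex:someone carries no explicit type.
ex:someone dbo:birthDate "1900-01-01"^^xsd:date .

# Under RDFS semantics a reasoner would entail the following triple
# instead of rejecting the untyped subject:
#   ex:someone a dbo:Person .
```

The entailment in the final comment is the standard RDFS reading of rdfs:domain: the axiom adds a type to the subject and does not validate it.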
This setting corresponds to our intuition that only a Person can have a birth date, but not, for example, a Group or even an Agent (a superclass of Person in the DBpedia ontology). However, it is important to note that the correct setting of the domain and range of properties only ensures the flow of information to be inherited by the sub-classes of the class bearing the properties; it is not a restriction on the instances of the classes that can be checked in order to avoid inconsistencies.

[18] http://mappings.dbpedia.org/server/ontology/classes/Person.
[19] http://mappings.dbpedia.org/index.php/OntologyProperty:BirthDate.

DBpedia links its data to other knowledge sources, using to this end OWL constructs such as owl:sameAs. This construct can cause problems and generate errors. An example of this issue is given by Brown and Simpson (2013), which will be described briefly in the following section.

5 Issues with “Michael Field”

Brown and Simpson (2013) describe problems with bibliographical entries in the Linked Data context. More concretely, the case involves Katharine Harris Bradley and her niece Edith Emma Cooper. Both are authors of poetry and verse drama and formed a duo, for which they used the pseudonym “Michael Field”. The use of pseudonyms is not uncommon and has many reasons: in this specific case the choice of a pseudonym could have been motivated by the fact that the authors had an intimate relationship. “Hiding” themselves behind a pseudonym with a masculine name might have been a strategy to avoid social reprobation.

In some knowledge sources the relation between each of the literary authors and the pseudonym “Michael Field” is stated in such a way that the pseudonym inherits a birth/death date, and in the end even two birth/death dates,[20] which at the same time means that each author is associated with two birth/death dates when the relation to the pseudonym is defined as a symmetric one.
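The double assignment of dates described above can be reproduced with a handful of triples. The sketch below is illustrative only and does not reproduce any actual data set: the `ex:` resources are hypothetical names chosen for this example, `dbo:birthDate` stands in for whatever birth date property a source uses, and the two dates are the ones given in the paper for the two authors.

```turtle
@prefix owl: <http://www.w3.org/2002/07/owl#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix dbo: <http://dbpedia.org/ontology/> .
@prefix ex:  <http://example.org/> .

# Asserted: each author has her own birth date.
ex:KatharineBradley dbo:birthDate "1846-10-27"^^xsd:date .
ex:EdithCooper      dbo:birthDate "1862-01-12"^^xsd:date .

# Asserted: the problematic identity links to the pseudonym.
ex:KatharineBradley owl:sameAs ex:MichaelField .
ex:EdithCooper      owl:sameAs ex:MichaelField .

# owl:sameAs is symmetric and transitive, and identical resources share
# all their property values, so an OWL reasoner entails:
#   ex:KatharineBradley owl:sameAs ex:EdithCooper .
#   ex:MichaelField     dbo:birthDate "1846-10-27"^^xsd:date ,
#                                     "1862-01-12"^^xsd:date .
#   ex:KatharineBradley dbo:birthDate "1862-01-12"^^xsd:date .
#   ex:EdithCooper      dbo:birthDate "1846-10-27"^^xsd:date .
```

Unless the birth date property were additionally declared functional, nothing in these triples is formally inconsistent, which suggests why such errors can propagate unnoticed.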
The very this case the un-reflected use of the owl:sameAs prop- negative results of applying such a property to the descrip- erty between one person and the pseudonym is enough for tion of entries in a biography in a linked data environment generating the wrong data, and associating the properties have been precisely and accurately documented in Brown birthDate and deathDate to the pseudonym. Never- and Simpson (2013), as we reported in Section 5. theless, defining a restriction of the ontology, stating that only instances of the class Person can bear the properties Modelling the “Michael Field” data birthDate and deathDate would suffice for avoiding We started our modelling experiment by encoding the bi- this kind of problem. ographical data described in Brown and Simpson (Brown Some data sets in the LOD, such as DBpedia, introduce and Simpson, 2013) in order to investigate how we could “Michael Field” as an author.21 All the explanation texts in avoid the problems described in that paper. In particular, the DBpedia page for “Michael Field” specify that the entry we developed OWL/RDF(s) code taking as a starting point is about a pseudonym but at the formal ontological level it the Biographical Ontology (Krieger and Declerck, 2015). is introduced as a Person, which is wrong, as it should be Figure 2 depicts the basic class hierarchy we are using for an instance of a class Pseudonym. The same error can be modelling the “Michael Field” data also used by Brown observed in the Yago data set (Suchanek et al., 2007). In the and Simpson (2013). In this figure the small number of Yago data, we can even see that the name of the pseudonym instances we have included are indicated in parentheses, is segmented in a Given Name and a Family Name and that which are basically the people named in (Brown and Simp- the pseudonym is bearing a gender property, with value son, 2013). “female”.22 We do think that this kind of information is not appropriate for a pseudonym. 
The modelling in Wikidata seems to be more accurate, as it introduces “Edith Emma Cooper” as an instance of the class human and establishes a part_of relation to the “Michael Field” instance of the class collective pseudonym.23 We also noticed that DBpedia is making use of its property dbo:wikiPageRedirects in order to get to the page http://dbpedia.org/page/Michael_Field\ _(author) when querying for “Edith_Emma_Cooper”. While the property dbo:wikiPageRedirects is an extremely useful feature helping to normalize vari- ants in names and then pointing them to the right DBpedia page, it is rather cumbersome in the case of “Edith_Emma_Cooper”, as it would be better to land on the page describing her and not on a page that deals with the pseudonym she is sharing with another author. 6 On the “Biographical Unit” As stated above, we did not find a property that can be con- sidered as having a stable value in order to characterise a Figure 2: Overview of the Class Hierarchy used for mod- core element of an entry in a biographical dataset. By now, elling the “Michael Field” data. our intuition is that we just have to declare a class Person, being a “life carrier” and having a temporal span, to which The code in Listing 1 displays the way we apply a re- all kind of relevant biographical properties can be assigned. striction to the class bio:Person, where we state that Instances of this class are uniquely addressed by URI. This at least one date of birth has to be given, while we also state results in a highly abstract model. that the property bio:dateOfDeath is defined for this 20 class.24 The associated properties bio:dateOfBirth The birth and death dates of Katharine Harris Bradley and and bio:dateOfDeath not listed here are defined for her niece Edith Emma Cooper are 27 October 1846/26 September domain bio:Person and range xsd:date, similar to 1914 and 12 January 1862/13 December 1913 respectively. 21 the related properties in DBpedia or Wikidata. 
http://dbpedia.org/page/Michael_Field_ (author). 22 https://gate.d5.mpi-inf.mpg.de/ 24 webyago3spotlxComp/SvgBrowser/. The definition of this class will for sure be updated to include 23 See https://www.wikidata.org/wiki/ information about provenance. We will also add a constraint stat- Q3719235 and https://www.wikidata.org/wiki/ ing that within a time period, to be counted from the birth date, a Q839369. death date has to be given. 80 Listing 1: The class bio:Person bio:hasFirstName "Katherine Harris" ; bio:Person bio:hasLastName "Bradley" ; rdf:type owl:Class ; bio:hasLover bio:Woman_3 ; rdfs:subClassOf bio:Agent ; bio:hasSister bio:Woman_2 ; rdfs:subClassOf [ bio:isMemberOf bio:MichaelField ; rdf:type owl:Restriction ; . owl:minCardinality "1"^^xsd:date ; owl:onProperty ; bio:Woman_3 ] ; rdf:type bio:Woman ; rdfs:subClassOf [ bio:dateOfBirth "1862-01-12"^^xsd:date ; rdf:type owl:Restriction ; bio:dateOfDeath "1913-12-13"^^xsd:date ; owl:onProperty bio:hasFather bio:Man_1 ; ; bio:hasFirstName "Edith Emma" ; ] ; bio:hasLastName "Cooper" ; owl:disjointWith bio:State ; bio:hasLover bio:Woman_1 ; . bio:hasMother bio:Woman_2 ; bio:isMemberOf bio:MichaelField ; The code in Listing 2 introduces “MichaelField” as an in- . stance of the Class bio:ArtisticGroup. This group consists of two instances of the class Person, which are With this draft encoding our aim was to show how to avoid described in the listings 4 and 5 below. It is important to the issues described by Brown and Simpson (2013) who note that it is the specific group, which is associated with stress the need to have both a generic ontological frame- the pseudonym bio:Pseudonym_1 (“Michael Field”). work for describing entities, but also a very specific encod- None of the authors alone should be associated with the ing scheme for accurately modelling all aspects and sub- pseudonym, as it was the case in certain data sets in the tleties of biographical data. 
LOD cloud.25 7 Towards a Sub-cloud of the LOD Listing 2: An instance of bio:ArtisticGroup Dedicated to Biographical Data bio:MichaelField rdf:type bio:ArtisticGroup ; Based on the observations we could make on the diverse bio:hasActivity bio:Writer ; efforts to encode biographies in a Semantic Web compliant bio:hasMember bio:Woman_1 ; format, which have been described in Section 2, Section 3 bio:hasMember bio:Woman_3 ; and Section 4, we see the need for reaching a wide con- bio:hasPseudonym bio:Pseudonym_1 ; sensus on this ontological design, exploring and possibly rdfs:label "\"Michael Field\""@en ; reusing existing biography vocabularies and ontologies. . In order to achieve this aim, we can build on the “Shared The code in Listing 3 introduces “Michael Field” as an in- Data Model” initiative (Fokkens and ter Braake, 2018),26 stance of the class bio:CollectivePseudonym. which was put in place at the DH Biographical Data Work- shop held at the Digital Humanities 2016 conference.27 We Listing 3: An instance of bio:CollectivePseudonym expect that generally accepted guidelines for the ontologi- bio:Pseudonym_1 cal encoding of biographical data can be derived from this rdf:type bio:CollectivePseudonym ; moderated collection of data models. bio:hasActivity bio:Writer ; bio:hasName "Michael Field" ; In addition, we are advocating for a collaborative effort . dedicated to establish a specialised sub-cloud of the LOD framework dedicated to data sets containing biographical The code in Listing 4 and Listing 5 below concerns the two data. In this way redundancies and inconsistencies in the authors involved in both the artistic duo with the associated modelling of biographical data could be avoided and the pseudonym, but also related to each other by both a familiar modelling of such data could also get a more salient posi- and an intimate relation. tion and an improved visibility in the LOD. 
Listing 4: An instance of bio:Person This community group could be organised in a similar man- bio:Woman_1 ner to the W3C Community Group for the representation rdf:type bio:Woman ; of language data in relation to ontologies and to the OKFN bio:dateOfBirth "1846-10-27"^^xsd:date ; bio:dateOfDeath "1914-09-26"^^xsd:date ; 26 This is a moderated collaborative effort for sharing data bio:hasActivity bio:Writer ; models in the field of biography, resulting in a “Repository for Biographical Data Models” (Fokkens and ter Braake, 2018), which can be accessed at https://github.com/cltl/ 25 It is also to be mentioned that each member of this group also BiographicalDataModels. 27 had an own pseudonym, which we do not display here, for reason http://www.biographynet.nl/ of space. dh-biographical-data-workshop/. 81 Working Group on Linguistics28 for building a domain spe- Bronwyn Davies and Susanne Gannon. 2006. Doing col- cific subset of the Linked Data cloud, in this case the LLOD lective biography: Investigating the production of sub- cloud.29 jectivity. McGraw-Hill Education (UK). Martin Doerr. 2003. The cidoc conceptual reference mod- 8 Conclusions ule: an ontological approach to semantic interoperability of metadata. AI magazine, 24(3):75. Based on our study of existing ontological models for bi- ographical data, we came to the conclusion that it seems Antske Fokkens and Serge ter Braake. 2018. Connecting impossible to find one property of a human being that can people across borders: a repository for biographical data remain stable within its lifespan. This has consequences models. In Proceedings of the 2nd conference on Biogra- on the modelling work, as we need to precisely define what phies in a Digital World. constitutes the uniqueness of an entry in a biographical data Daniel Hernández, Aidan Hogan, and Markus Krötzsch. set. We advocate for a solution, which consists in intro- 2015. Reifying RDF: what works well with Wikidata? 
ducing a URI for each entry, which needs to be equipped In Proceedings of the 11th International Workshop on fundamentally with two properties describing the dates of Scalable Semantic Web Knowledge Base Systems (SSWS birth and of death. All values to be given to those (and other 2015), volume 1457 of CEUR Workshop Proceedings. related) properties are mutable and can also vary in depen- CEUR-WS.org. dency of the provenance information, that also needs to be Eero Hyvönen, Miika Alonen, Esko Ikkala, and Eetu encoded in the biographical data set. Mäkelä. 2014. Life stories as event-based linked data: Furthermore, we came across reports that detail errors in case semantic national biography. In Proceedings of the the encoding of biographical data in the Linked Data cloud 2014 International Conference on Posters & Demonstra- and which were generated by the inappropriate use of onto- tions Track-Volume 1272, pages 1–4. CEUR-WS. org. logical properties and vocabularies. This situation calls for Eero Hyvönen, Petri Leskinen, Erkki Heino, Jouni Tuomi- the building of more collaborative work in the field of on- nen, and Laura Sirola. 2017. Reassembling and en- tological modelling of biographical data and possibly also riching the life stories in printed biographical registers: for a W3C Community Group dedicated to the creation of Norssi high school alumni on the semantic web. In Inter- a biography specific sub-cloud in the LOD framework. national Conference on Language, Data and Knowledge, pages 113–119. Springer. Acknowledgement Hans-Ulrich Krieger and Thierry Declerck. 2015. An owl ontology for biographical knowledge. representing The DFKI contribution to this paper was partly sup- time-dependent factual knowledge. In Proceedings of ported by the H2020 project QT21 with agreement num- the First Conference on Biographical Data in a Digital ber 645452. We thank the anonymous reviewers of the first World 2015. CEURS-WS.org, 7. 
Online-Proceedings: version of this paper for their very helpful comments. Our http://ceur-ws.org/Vol-1399/. thanks go also to Eileen Schnur for proofreading and im- Hans-Ulrich Krieger. 2014. A detailed comparison of proving our text. seven approaches for the annotation of time-dependent The paper is dedicated to the memory of Hans-Ulrich factual knowledge in rdf and owl. In Proceedings of the Krieger who unfortunately passed away in June 2017. He 10th Joint ACL-ISO Workshop on Interoperable Seman- was the initiator of our efforts in this field and published the tic Annotation (held in conjunction with LREC 2014). first version of the DFKI biography ontology. European Language Resources Association. Niels Ockeloen, Antske Fokkens, Serge Ter Braake, Piek 9 References Vossen, Victor De Boer, Guus Schreiber, and Susan Laura Brazzo and Silvia Mazzini. 2015. From the Holo- Legêne. 2013. Biographynet: Managing provenance at caust Victims Names to the Description of the Persecu- multiple levels and from different perspectives. In Pro- tion of the European Jews in Nazi Years: the Linked Data ceedings of the 3rd International Conference on Linked Approach and a New Domain Ontology. In Book of ab- Science - Volume 1116, LISC’13, pages 59–71, Aachen, stract of DH 2015. Germany, Germany. CEUR-WS.org. S. Brown and J. Simpson. 2013. The curious identity of Michele R. Ramos. 2009. Biography Light Ontology: An michael field and its implications for humanities research Open Vocabulary For Encoding Biographic Texts. Tech- with the semantic web. In 2013 IEEE International Con- nical report, Bringing Lives to Light: Biography in Con- ference on Big Data, pages 77–85, Oct. text Project. Michael Buckland and Michele Renee Ramos. 2010. Fabian M Suchanek, Gjergji Kasneci, and Gerhard Events as a structuring device in biographical mark-up Weikum. 2007. Yago: a core of semantic knowledge. and metadata. 
Bulletin of the Association for Informa- In Proceedings of the 16th international conference on tion Science and Technology, 36(2):26–29. World Wide Web, pages 697–706. ACM. Jouni Tuominen. 2016. Bio CRM: A Data Model for Rep- 28 See https://www.w3.org/2016/05/ontolex/ resenting Biographical Information for Prosopography. and https://blog.okfn.org/category/ Version 2016-08-19. Technical report, Bringing Lives to working-groups/wg-linguistics/. Light: Biography in Context Project. 29 http://linguistic-lod.org/llod-cloud for details on the Linguistic Linked Open Data cloud. 82