A Prosopographical Information System (APIS)

Matthias Schlögl, Katalin Lejtovicz
Austrian Centre for Digital Humanities
Sonnenfelsgasse 19, 1010 Vienna, Austria
matthias.schloegl@oeaw.ac.at, katalin.lejtovicz@oeaw.ac.at

Abstract
During recent years massive amounts of biographical datasets have been digitized and - at least some of them - made available open access. However, an easy-to-use system that allows non-experts to work with the data is still missing. The APIS system, designed within the framework of the APIS project at the Austrian Academy of Sciences, is a web-based, highly customizable virtual research environment that allows researchers to work alongside programs designed for processing natural language texts, so-called Natural Language Processing pipelines.

Keywords: biographical data, virtual research environment, natural language processing

1 Introduction

During recent years massive amounts of biographical datasets have been digitized and - at least some of them - made available open access (Reinert et al., 2015; Fokkens et al., 2014). Additionally, collaborative efforts such as Wikipedia/Wikidata (or resources such as Freebase that have been included in these endeavours) have created even more partly structured prosopographical and biographical datasets (Gergaud et al., 2016). Reference resources such as the Gemeinsame Normdatei (GND) - an authority file for persons, events, locations, works and institutions operated cooperatively by the German National Library, the German Union Catalogue of Serials and other institutions; the GND has recognized these developments and will open the system to actors outside traditional libraries (http://www.dnb.de/EN/Standardisierung/GND/gnd_node.html; Kett, 2017) - and the Virtual International Authority File (VIAF), an international authority file compiled by national libraries (https://viaf.org/), have also been utilized for prosopographical research (Andert et al., 2014). Since the first endeavours, researchers have worked on tools that allow for extracting structured data from these biographical texts. Various Natural Language Processing (NLP) techniques have been used for these objectives (local grammars, regular expressions, machine learning and deep learning based approaches etc.). However, the goal of the researchers was not limited to transforming full-text data into structured data, but also included the interpretation of textual resources by applying statistical and network research methodologies. In this sense, computer linguistic processing, statistical analysis and network visualization of biographies has been started at the ÖBL - the Austrian Biographical Dictionary - in the context of the APIS project. The results of the various analysis methods are later evaluated and interpreted by scholarly researchers. In this paper we describe the Virtual Research Environment (VRE) (Schlögl and Andorfer, 2018) - from now on referred to as APIS - that has been developed during the project, as well as the Natural Language Processing (NLP) techniques we use for (semi)automatically structuring the data. The APIS VRE is a Django based web application published under an open-source license (MIT) on GitHub: https://github.com/acdh-oeaw/apis.

2 APIS virtual research environment

The approaches for extracting structured information from biographical data sets have been brought forward by a relatively small scholarly community using locally run, tailor-made systems that almost never have a user interface. Compared to the conventional methods that researchers apply when evaluating textual data (e.g. taking notes in a Word document, filling out an Excel sheet manually), APIS allows for a semi-automatic exploration of the information in a large-scale data set. It enables researchers to find answers to their research questions more easily and much faster than with conventional methods.
APIS is a web-based, highly customizable VRE that allows traditional researchers to work alongside NLP pipelines. This hybrid approach (the possibility to manually annotate texts and edit entities/relations alongside automatic systems) allows researchers to "use the best of both worlds", and computer scientists to improve the tools directly on real-world data. The web application not only helps researchers to systematically and semi-automatically process large amounts of data, but also to analyze and visualize connections between entities detected in the documents. Visualization of the data allows the researchers to get an overall picture of the entities and relations encoded in the documents that would otherwise be hard to access. APIS provides the users with an easy and intuitive workflow to process large amounts of data.
It therefore tackles two main problems and will make the work with biographical data easier for historians as well as data scientists:
• It allows historians to annotate biographies with exactly the information they need for their research, easily link the annotations to the Linked Open Data cloud (LOD: data that is published so that it can easily be interlinked with other datasets, which allows for more refined, detailed queries of the content), and export it for further research.
• It allows data scientists to easily access annotated data via APIs, use it for (re)training models, store new annotations in the system and use the built-in evaluation system for retrieving precision, recall, F1 and other metrics.

2.1 General Idea

The design of APIS meets three basic criteria, based on experience from previous projects:
• a simple datamodel that can be serialized to other formats and datamodels later on
• use of a solid and widely used software stack to keep the development and maintenance effort as low as possible
• a hybrid approach that allows researchers as well as automatic tools/pipelines to work in parallel on the same dataset.
While this design has some advantages, it brings some downsides along. Most commonly used high-level ontologies, such as CIDOC CRM (a structure designed to describe concepts and relationships used in the cultural heritage domain), are based on an event-driven datamodel. Our internal datamodel (discussed in more detail below) is simpler and easier to use, but needs to be mapped to event-based models later on. Similarly, the use of well-proven technologies such as Django and SQL databases brings some obvious advantages in the development of the web application, but in a world of Linked Open Data at some point we will need to serialize our data into RDF triples (RDF is a framework for representing information on the Web; in RDF, statements about resources are expressed in the form of subject-predicate-object, known as triples) and publish them to include the data in the Linked Open Data cloud. However, during the project our design decisions have proven to be successful. Due to the simple datamodel and the easy and fast development of the web application we were able to (manually) annotate much more data than we anticipated.
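To illustrate the kind of mapping this implies, the following is a minimal sketch - not part of the APIS codebase - of how a single person-place relation from a relational model could be expressed as RDF triples with the Python library rdflib. The namespace, predicate names and entity URIs are invented for the example and are not the project's actual vocabulary.

```python
from rdflib import Graph, Literal, Namespace, URIRef
from rdflib.namespace import RDF, RDFS

# Hypothetical namespace for an APIS-like dataset; not the project's real vocabulary.
EX = Namespace("https://example.org/apis/")

g = Graph()
g.bind("ex", EX)

person = URIRef(EX["person/1"])
place = URIRef(EX["place/1"])

# A relation such as "educated in" becomes a predicate between two entity URIs.
g.add((person, RDF.type, EX.Person))
g.add((person, RDFS.label, Literal("Example Person")))
g.add((place, RDF.type, EX.Place))
g.add((place, RDFS.label, Literal("Vienna")))
g.add((person, EX.educatedIn, place))

print(g.serialize(format="turtle"))
```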
2.2 Datamodel

The APIS datamodel is a hybrid between an event-based and a relation-based model. Figure 1 shows a simplified version of the APIS datamodel. It consists of 5 entities (person, place, institution, event and work) that are all interrelated. Relations can be added between persons and places, persons and institutions, institutions and works, persons and persons and so on. All entities share a set of basic attributes (name, start and end dates etc.) and some have additional ones (e.g. place has longitude and latitude). Every entity can be related to several URIs (if they do not share the same top-level domain) and grouped in so-called collections. Relations, on the other hand, have a fixed set of attributes (start and end date, kind, notes, references). Relations are a kind of mini-event: a relation can only be connected to two entities and has a limited set of attributes, but nonetheless the relation of two entities has some additional data attached; we therefore call our model a hybrid between relation-based and event-based. Every entity can have as many full texts as needed. These full texts in turn can have offset annotations grouped in so-called annotation projects and - if useful - linked to other entities or relations in the database (entities and/or relations that are annotated in the full text can be automatically added to the database). All entities and relations are typed with Simple Knowledge Organisation System (SKOS) vocabularies (SKOS defines standards for working with knowledge systems such as thesauri, taxonomies and classification schemes). Additionally, the system features a very fine-grained user permission system that allows to set permissions on a collection basis.

Figure 1: APIS datamodel (simplified version)
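As a rough illustration of how such a hybrid entity/relation model maps onto Django, the sketch below defines two entity types and one relation type. The class and field names are simplified stand-ins for this paper and do not reproduce the actual apis-core models.

```python
from django.db import models

class Collection(models.Model):
    name = models.CharField(max_length=255)

class Person(models.Model):
    # Basic attributes shared by all entity types.
    name = models.CharField(max_length=255)
    start_date = models.DateField(blank=True, null=True)
    end_date = models.DateField(blank=True, null=True)
    collections = models.ManyToManyField(Collection, blank=True)

class Place(models.Model):
    name = models.CharField(max_length=255)
    start_date = models.DateField(blank=True, null=True)
    end_date = models.DateField(blank=True, null=True)
    # Entity-specific attributes.
    lng = models.FloatField(blank=True, null=True)
    lat = models.FloatField(blank=True, null=True)
    collections = models.ManyToManyField(Collection, blank=True)

class PersonPlaceRelation(models.Model):
    # A relation as a "mini-event": two entities plus a fixed set of attributes.
    person = models.ForeignKey(Person, on_delete=models.CASCADE)
    place = models.ForeignKey(Place, on_delete=models.CASCADE)
    kind = models.CharField(max_length=255)   # e.g. a SKOS concept such as "educated in"
    start_date = models.DateField(blank=True, null=True)
    end_date = models.DateField(blank=True, null=True)
    notes = models.TextField(blank=True)
    references = models.TextField(blank=True)
```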
2.3 The web frontend

The APIS web frontend allows to search the data, work on it and analyze it. The list views can be used to search the data (the search fields and functions can be defined in the main settings file of the application), sort and export it, and to access the edit views. Figure 2 shows the edit view of a person. The view consists of two panes: in the left pane one can work on the entity's metadata, in the right pane the entity can be related to other entities. The forms feature autocompletes wherever possible/useful, which make the editing process more convenient and less error-prone for the researcher.

Figure 2: APIS edit view of person

2.4 Full-text annotation

APIS also allows for the annotation of biographical full texts. Instead of just adding a relation between two entities to the database, this relation can be annotated directly in the text. When highlighting a part of the text a context menu opens that allows to select the relation type (the context menu is defined in a system-wide settings file accessible via the admin backend). After selecting the relation type (e.g. Person-Place), another form is loaded that allows for selecting the related entity and the kind of relation. As explained above, annotations in APIS are stored as offsets and related to the user and to something we call an annotation project. This allows to view the biography from different angles. A simple form allows to filter for the annotations one wants to look at (annotation project, user, type of annotation). Additionally, the visualization allows for overlapping annotations. As Figure 3 shows, when clicking on overlapping annotations - visualized with a yellow background color - a context window opens and shows a copy of the text snippet for every existing overlapping annotation.

Figure 3: Image of overlapping annotations
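A plausible shape for such an offset annotation record is sketched below as a Python dataclass. The field names are hypothetical and only illustrate the information described in the text (character offsets, user, annotation project, relation type, optional link to an entity or relation); they are not the actual APIS schema.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class TextAnnotation:
    """One offset annotation on a biography full text (illustrative only)."""
    text_id: int             # the full text the annotation belongs to
    start: int               # character offset where the highlighted span begins
    end: int                 # character offset where the span ends
    user: str                # human annotator or automatic tool account
    annotation_project: str  # grouping of annotations, e.g. one NLP run
    relation_type: str       # e.g. "Person-Place"
    linked_entity_id: Optional[int] = None  # entity or relation created from the annotation

# Example: a span annotated as a Person-Place relation by an NLP pipeline account.
ann = TextAnnotation(text_id=42, start=118, end=124, user="nlp-pipeline",
                     annotation_project="stanbol-run-1", relation_type="Person-Place",
                     linked_entity_id=7)
print(ann)
```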
2.4.1 Automatic import of LOD entities

The APIS web application allows the use of external resources - such as Linked Open Data resources - in the autocomplete search. Whenever a researcher searches for an entity in the autocomplete, not only local entries are searched, but also external resources integrated into the APIS system (we use a local Apache Stanbol instance for fast access to GeoNames and GND, but have also implemented bridges to SPARQL endpoints - SPARQL being the query language for RDF data - for less frequently used sources). When a researcher selects an entity that is not yet present in the database, the system retrieves the original entity and parses it into the database. The parser can be defined in an instance-wide settings file.

2.5 Inter annotator agreement

In section 3 we will elaborate on the Natural Language Processing (NLP) techniques we used to (semi)automatically enrich the ÖBL biographies. One of the prerequisites of automatic text processing is a gold standard of annotations and a high inter-annotator agreement (most of the time the latter is needed to produce the former). Getting towards a gold standard and a high agreement among the annotators is a time-consuming and tedious process. We try to foster this process by visualizing overlapping annotations in the frontend and providing ready-made metrics to compute the agreement over large collections of texts and/or annotators. As described above, the APIS application does not distinguish between human researchers and automatic tools: tools communicate with the database via a REST API, researchers via the GUI, and both have a user account that allows APIS to version the edits.
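One simple agreement measure over such offset annotations is pairwise F1 between two annotators, counting spans as matching when their offsets and type are identical. The sketch below is a minimal, assumed implementation of that idea and not the metric actually shipped with APIS.

```python
def pairwise_f1(anns_a, anns_b):
    """Agreement between two annotators; each annotation is (start, end, relation_type)."""
    set_a, set_b = set(anns_a), set(anns_b)
    if not set_a or not set_b:
        return 0.0
    matches = len(set_a & set_b)          # spans both annotators marked identically
    precision = matches / len(set_b)
    recall = matches / len(set_a)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Example: two annotators agree on one of two spans in the same biography.
annotator_1 = [(118, 124, "Person-Place"), (300, 310, "Person-Institution")]
annotator_2 = [(118, 124, "Person-Place")]
print(pairwise_f1(annotator_1, annotator_2))  # 0.666...
```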
2.6 Versioning

One important aspect of (historic) research is provenance. Ideally, every step in the data generation and data analysis process is logged and reproducible. To allow for full provenance information in the APIS process, we implemented a system that serializes every edit of a data point and adds a timestamp and a user ID to the serialization. The revision can be accessed in the GUI and used for recreating any former state of the database. We are currently working on building a REST API endpoint for providing machine-readable access to this versioning system.
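Conceptually, every edit is stored as a serialized snapshot plus timestamp and user ID. A minimal sketch of such a revision record follows; the structure and function names are assumed for illustration only and do not reproduce the APIS implementation.

```python
import json
from datetime import datetime, timezone

def make_revision(instance_dict, user_id):
    """Serialize the current state of a data point together with provenance info."""
    return {
        "data": json.dumps(instance_dict, ensure_ascii=False),  # full snapshot of the edit
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "user_id": user_id,
    }

revisions = []
# Every save appends a snapshot, so any former state can be restored later.
revisions.append(make_revision({"name": "Example Person", "start_date": "1881-03-01"}, user_id=3))
revisions.append(make_revision({"name": "Example Person", "start_date": "1881-03-11"}, user_id=5))

# Restoring an earlier state is just deserializing an older revision.
print(json.loads(revisions[0]["data"]))
```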
2.7 Visualization

The APIS system also includes a rudimentary visualization module. Several projects have shown that social network analysis (SNA) is a very useful visualization and analysis method (Armitage, 2016; Warren et al., 2016). The APIS network visualization allows for the iterative creation of networks by specifying the source node (it is also possible to select whole collections of nodes), the relation type and/or kind, and/or the target node. The form supports the researcher in creating the network with autocompletes that show existing entries in the database. Nodes can be extended - by 'extending' we mean adding all relations for the node to the visualization - by accessing the context menu of the nodes. Figure 4 shows a network that was created by adding person-place relations with the target node set to 'München', 'Berlin' and 'Graz'. After creating the network it can be downloaded either as JSON (a format that allows for easy data interchange between applications, see https://www.json.org/) or GraphML (an XML-based format for storing graphs, see http://graphml.graphdrawing.org/ for details). The downloaded file includes all the attributes - such as longitudes and latitudes for places - that exist in the database.

Figure 4: Network visualization

The APIS project also cooperates with external partners to explore the potential of other, more experimental visualization methods. One of these methods is the space-time cube developed by colleagues from the University of Krems (Windhager et al., 2017; please also see Windhager et al. in these proceedings for details).
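The kind of export described above can be reproduced with standard tooling; the sketch below builds a small person-place network with the networkx library and writes it to GraphML, attaching coordinates as node attributes. The data is invented and the code is not the APIS export module itself.

```python
import networkx as nx

G = nx.Graph()

# Place nodes carry the attributes stored in the database, e.g. coordinates.
G.add_node("München", type="place", lat=48.137, lng=11.575)
G.add_node("Berlin", type="place", lat=52.520, lng=13.405)
G.add_node("Example Person", type="person")

# Edges represent person-place relations with their kind as an attribute.
G.add_edge("Example Person", "München", kind="educated in")
G.add_edge("Example Person", "Berlin", kind="worked in")

# GraphML keeps all node and edge attributes; a JSON export is equally simple.
nx.write_graphml(G, "person_place_network.graphml")
print(nx.node_link_data(G))  # JSON-serializable dict of the same network
```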
3 Information extraction

One of the goals of the APIS project is to offer automated text processing to facilitate the work of researchers. The processing and interpretation of the texts were carried out using computer linguistic methods, which include the identification of entities (individuals, places, institutions, etc.), automatically linking them to Linked Open Data Cloud resources, and disambiguating and manually curating the results. In the following section we outline the above described steps in more detail.

3.1 Entity Linking

Although the biographies are available in XML format, they do not contain all relevant information about a person's life in structured format, except for some key events such as birth and death. One of the main goals of the project is to reveal information encoded in natural language text (e.g. names of persons, places, institutions, events, etc.) and to automatically detect relationships between them and the person depicted in the biography. In order to tackle this problem efficiently, we combined automated and manual information retrieval techniques. The information extraction in APIS consists of three main steps: Named Entity Recognition, Entity Linking, and Disambiguation/Curation. For the automatic information extraction we use the open source software Apache Stanbol (https://stanbol.apache.org/index.html, last accessed: 26.02.2018), which detects entities in natural language texts and connects them to ontologies and knowledge databases such as the GND, GeoNames (http://www.geonames.org/, last accessed: 26.02.2018), or DBpedia (http://wiki.dbpedia.org/, last accessed: 26.02.2018). The connections that are created between entities and biographies not only allow for the enrichment of the biographies with semantic information, but also for the automatic correction of missing or erroneous data. The advantage of using Apache Stanbol for Entity Identification and Linking is that it provides a straightforward mechanism for how entities are identified and how any ontology in RDF/XML format can be converted into a semantic reference resource, which is later used for the semantic enrichment of the documents. To perform the semantic annotation, we produce so-called Referenced Sites from the data available in RDF/XML format (i.e. from GeoNames, GND). In the Referenced Sites the indexed data is stored in a Solr index (Solr is an open source search platform which allows for full-text search, faceted search and hit highlighting, amongst other features).

3.1.1 Abbreviations

The information extraction process created in APIS consists of two steps. First, we resolve abbreviations of person names, institution names, academic titles, place names, and common verbs. We developed two versions to resolve abbreviations: a Java program based on regular expressions, and a Python based script that uses regular expressions, a dictionary of German words and a large German-language corpus (AMC) (Ďurčo et al., 2014) to resolve ambiguous abbreviations and choose the correct variant. The program queries the abbreviation and its context in the AMC corpus, and the resolution with the most hits is chosen.
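A highly simplified sketch of this corpus-based disambiguation idea follows: candidate expansions for an ambiguous abbreviation are taken from a dictionary and ranked by how often they occur together with the surrounding context. The dictionary, the frequency function and the example sentence are invented stand-ins for the AMC-based implementation.

```python
import re

# Hypothetical expansion dictionary; the real script uses a German dictionary and the AMC corpus.
EXPANSIONS = {
    "Prof.": ["Professor"],
    "Wr.": ["Wiener"],
    "verst.": ["verstorben", "verstaatlicht"],
}

def corpus_frequency(phrase):
    """Stand-in for querying the AMC corpus; returns how often the phrase occurs."""
    fake_counts = {"1901 in Wien verstorben": 42, "1901 in Wien verstaatlicht": 1}
    return fake_counts.get(phrase, 0)

def resolve(text):
    """Replace each known abbreviation by the expansion with the most corpus hits in context."""
    for abbr, candidates in EXPANSIONS.items():
        for match in re.finditer(re.escape(abbr), text):
            context_left = text[max(0, match.start() - 15):match.start()]
            best = max(candidates, key=lambda c: corpus_frequency(context_left.strip() + " " + c))
            text = text[:match.start()] + best + text[match.end():]
            break  # offsets shift after a replacement; a real implementation would re-scan
    return text

print(resolve("1901 in Wien verst."))  # -> "1901 in Wien verstorben"
```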
3.1.2 Creating an index

The second step in the semantic annotation process is to create Solr indices from ontologies. During Entity Linking, Apache Stanbol searches for the entities (persons, places, institution names, etc.) in the indexed ontologies. In the APIS project we created indexes from GeoNames and GND to link the place names, personal names and institution names in the text to the Linked Open Data Cloud. The indexes were created as follows: we downloaded the RDF/XML dumps of the aforementioned resources, which were cut into smaller files in order to get manageably sized data and to make it easy to create separate indexes for the different entity types. After this we created the Apache Solr indexes from the above mentioned files using Apache Stanbol's Java package for indexing.

3.1.3 The NLP pipeline

After creating and installing the Solr index, the Entity Linking component is configured. Stanbol allows various configuration options to achieve an accurate and efficient Entity Linking process. For example, one can narrow down the search to proper nouns only. In this case the NLP algorithm of Stanbol identifies proper nouns and queries only them in the Solr index, which yields more accurate Entity Linking and a better runtime. Another configuration option is to use the types of entities in the matching process. If this setting is turned on and the index contains information regarding the type of the entities, the user gets the results categorized into different types such as "Person", "Location", "Event", etc. (depending on what types are available in the index).
Following the configuration of the Entity Linking component, the Natural Language Processing component is constructed, which defines what NLP steps have to be carried out. In APIS we use the open source software Apache OpenNLP (https://opennlp.apache.org/, last accessed: 27.02.2018) for the computer linguistic analysis of the biographies. Our pipeline consists of the following steps: determine the language of the input text (langdetect); divide the text into sentences (opennlp-sentence); tokenize the sentences (opennlp-token); determine the part-of-speech tags of the words (opennlp-pos); search for noun phrases (opennlp-chunker); perform Entity Linking (custom Referenced Site).
In the last step, the nouns and noun phrases are compared with the Solr index (Entity Linking). If a term matches an entry in the index, the entry from the Solr index is returned by the application in the requested output format (e.g. JSON, RDF/XML, Turtle, N3, JSON-LD). If there are multiple results, a score between 0 and 1 indicates which is the most likely result. The advantage of the Apache Stanbol Entity Linking software is that it can effectively index any ontology available in RDF/XML format and allows the user to select the data resource for semantic annotation.
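Stanbol exposes its enhancement chains over a RESTful interface, so a configured pipeline can be called with plain HTTP. The sketch below posts a sentence to a local instance and prints the returned enhancements; the host, chain name and response handling are assumptions for illustration and not the exact APIS configuration.

```python
import requests

# Assumed local Stanbol instance and chain name; adjust to the actual deployment.
STANBOL_URL = "http://localhost:8080/enhancer/chain/default"

text = "Gustav Klimt studierte an der Kunstgewerbeschule in Wien."

response = requests.post(
    STANBOL_URL,
    data=text.encode("utf-8"),
    headers={
        "Content-Type": "text/plain; charset=utf-8",
        "Accept": "application/json",  # ask for the enhancement results as JSON-LD
    },
    timeout=30,
)
response.raise_for_status()

# The enhancement structure contains entity annotations with confidence scores
# and links into the configured Referenced Sites (e.g. GeoNames, GND).
print(response.json())
```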
3.2 Relation Extraction

Entity Linking is the first step in automatically interpreting the meaning of a natural language document. Through Entity Linking, strings in the documents can be replaced by URIs (Uniform Resource Identifiers). The concepts in the LOD resources are not only clearly identifiable and referenceable by their URIs, but they can also be shared between applications, unstructured texts can be enriched with the information attached to them, or inconsistencies in the data can be detected and corrected.
The second step is to determine the relationships and the types of the relationships that hold between the entities, also known as automatic Relation Extraction. During Relation Extraction the NLP module looks for semantic relationships such as 'parent-child', 'traveled to a place', 'learned somewhere' or 'participated in an event' between the people, places, and events detected in the text. We have tried three different methods for automatic relationship recognition, which will be tested, and the best solution will be permanently integrated into the APIS system.
The first version is a rule-based algorithm implemented using the GATE framework (GATE is an open source software designed to automatically process natural language documents, see https://gate.ac.uk/). The implementation uses the JAPE regular expression language of GATE to automatically extract semantic links from the text. In a first step, the output of the Entity Linking module is converted to XML format, where each Named Entity is an element in the XML. These XML files were then uploaded to GATE and processed by the ANNIE NLP module (ANNIE is a system within the GATE framework which was designed to automatically process and extract information from textual data). The Entity Linking results as well as the output of the NLP pipeline are stored as annotations in GATE. The JAPE regular expressions work with these annotations and search for linguistic patterns in the documents that can express a relationship. If the application finds a text snippet that corresponds to the pattern that is specific to a relationship, it automatically provides a new annotation, which defines the type of the relationship. The output of the relation extraction was exported to XML - widely used in NLP applications - and imported back into the APIS system.
The second solution we tested was IEPY (Information Extraction in Python, https://github.com/machinalis/iepy), an open source software implemented in Python which realizes relation extraction. IEPY performs machine learning based relationship recognition. On the web interface of the application, the user annotates occurrences of predefined relationships (e.g. 'traveled somewhere', 'married somebody', etc.), from which the software learns a model that can be used to identify relations in documents that have not been seen before by the system. In the case of the ÖBL, IEPY has not proven to be a suitable software, because it requires the selection of both members of a relationship (e.g. in the case of 'learned somewhere' both the person and the place). However, in the ÖBL, to avoid the repetition of the person the biography was written about, his/her name is usually only mentioned once, at the beginning of the biography.
The third approach we have examined is the recognition of relations from the tree structure obtained from the syntactic parsing of the sentences with Deep Learning. We use a standard NLP pipeline (https://spacy.io) to process the text. When the module finds a named entity, it climbs up the parse tree and extracts predefined classes - in the sense of POS tags - of words (e.g. verbs). The extracted list of words is converted into a vector which is used for classification. This method makes use of the inherent advantages a biography brings along: in many cases a biography talks about the portrayed person, therefore we skipped the search for the subject and just assumed that the portrayed person is the subject. First tests with a model trained on roughly 4000 and evaluated on 1000 examples of person-place relations show the potential of the method (please see https://apis.acdh.oeaw.ac.at/presentation_innsbruck17/ for a more detailed presentation and a live version of the model), but also the problems automatic tools have with the very specific language in the ÖBL.
The training data set was annotated during a small research project dealing with members of the 'Künstlerhaus'. The fact that this data was not specifically produced for training purposes is important: it is very unevenly distributed (about 2/3 of all annotations bear only two labels out of eight), and the annotations were done by only one annotator and are therefore not very consistent over the whole corpus. Given the rather difficult training data, the (for modern NLP tools) problematic language of the ÖBL, and the relation types to extract (relation types were only chosen based on the research question and not for how easy they are to find by automatic tools), the model performed rather well, even though obviously not precise enough for historians to rely on the extracted data alone. The evaluation on 30 randomly chosen artist biographies (all members of the 'Künstlerhaus' had been annotated and used for training, we therefore used other artists for evaluation) showed a recall of 0.79 and a precision of 0.44 (F-beta 0.56). The combination of high recall and low precision is due to the named entity recognizer annotating places where a human annotator wouldn't do so (e.g. 'Vienna' in 'University of Vienna'). We believe that the precision of the method can be significantly raised by improving the named entity recognizer. We will do so by retraining the model and by implementing some simple rules, such as: when the name of an institution contains a place name, the system will annotate the expression as an institution, but not as a place.
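The parse-tree feature extraction used in the third approach can be pictured with spaCy roughly as follows: for each named entity, the governing verbs on the path to the sentence root are collected and turned into a feature vector for a downstream classifier. The model name, the vectorizer and the example sentence are placeholders; this sketch is not the project's actual training code.

```python
import spacy
from sklearn.feature_extraction.text import CountVectorizer

# A German model is assumed to be installed, e.g. via `python -m spacy download de_core_news_sm`.
nlp = spacy.load("de_core_news_sm")

def verb_features(sentence):
    """For every named entity, climb the dependency tree and collect governing verbs."""
    doc = nlp(sentence)
    features = []
    for ent in doc.ents:
        verbs = []
        token = ent.root
        while token.head is not token:           # walk up to the sentence root
            token = token.head
            if token.pos_ in ("VERB", "AUX"):    # keep only predefined word classes
                verbs.append(token.lemma_)
        features.append((ent.text, " ".join(verbs)))
    return features

samples = verb_features("Er studierte in Wien und arbeitete später in Berlin.")
print(samples)

# The collected verb strings can then be vectorized for a relation-type classifier.
vectorizer = CountVectorizer()
X = vectorizer.fit_transform([verbs for _, verbs in samples])
print(X.toarray())
```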
There have not been many attempts to automatically extract information from biographical articles so far, and no one - to our best knowledge - has tried to train models on relations annotated by researchers. However, Fokkens et al. (2014), for example, extract metadata on the portrayed person from full text. While this is not (exactly) the same as extracting relations to other entities, it is comparable (e.g. metadata on education vs relations to schools and universities). Fokkens et al. (2014) had much higher precision, but significantly lower recall; the overall system performed similarly to our deep learning approach. Dib et al. (2015) used a somewhat similar approach to extract professions from Wikipedia articles. While they also used the parse tree (and especially the verbs) to find the connection between an actor (in our case the portrayed person) and a circumstance (in our case a Named Entity), they did not use a machine learning algorithm to predict the kind of relation, but used a (more or less) fixed set of words that describe the professions. Even if they evaluated it only on a limited number of well-suited articles, the overall performance of their system was much higher than ours (recall: 74.1%, precision: 95.2% and F1: 83.3%). However, as it is focused on extracting professions only, the system is not really comparable to ours. Bonch-Osmolovskaya and Kolbasov (2015) also used rules to extract facts from a digital edition of Tolstoy's letters. While their system had a very good performance (comparable to Dib et al. (2015)) for professions, it had an F1 of 0.43 for family facts.
We are currently working on annotating 300 biographies specifically for training the relation extraction tools. While our training material so far focused on certain professions and on a specific research question, the model trained on these annotations should provide us with a baseline. Additionally, we are working on a gold standard for evaluating this baseline model. We are also working on evaluating the rule based approach for relation extraction discussed above.

4 Conclusion

APIS provides an integrated system that allows researchers to annotate biographies and link the annotations to LOD resources (and therefore reuse data that already exists). In a second step it allows for basic visualizations, filtering and export of the data. On the other hand, the system provides easy access to the database backend for data scientists and therefore allows for the use of annotations for training models and out-of-the-box evaluation.
The NLP pipelines have some problems with the non-standard language used in biographic dictionaries such as the ÖBL. However, we found that the rule based approach as well as the trained models show some possibilities: the former - as others have shown before (Dib et al., 2015; Bonch-Osmolovskaya and Kolbasov, 2015) - especially for extracting data of well-defined realms such as professions; the latter - even if precision and recall are not high enough yet - to provide historians at least with a useful baseline annotation that they can use as a starting point. This tool will - other than the rule based approach - allow historians to train it with whatever they are interested in and get a first, even if not very accurate, annotation of the whole dataset.

5 Copyrights

These proceedings are published by CEUR. Copyright of the individual submissions remains entirely with the authors. Copyright of the proceedings falls to the editors. For a detailed explanation see: http://ceur-ws.org/

6 Acknowledgements

The APIS project is funded by a research grant (project number ÖAW0405) of the Nationalstiftung für Forschung, Technologie und Entwicklung (Programm "Digital Humanities - Langzeitprojekte zum kulturellen Erbe").

7 References

Martin Andert, Frank Berger, Paul Molitor, and Jörg Ritter. 2014. An optimized platform for capturing metadata of historical correspondence. 30(4):471-480.
Neil Armitage. 2016. The Biographical Network Method. Sociological Research Online, 21(2):16.
Anastasia Bonch-Osmolovskaya and Matvey Kolbasov. 2015. Tolstoy Digital: Mining Biographical Data in Literary Heritage Editions. In Proceedings of the First Conference on Biographical Data in a Digital World 2015, page 5, Amsterdam.
Firas Dib, Simon Lindberg, and Pierre Nugues. 2015. Extraction of Career Profiles from Wikipedia. In Proceedings of the First Conference on Biographical Data in a Digital World 2015, page 6, Amsterdam.
Antske Fokkens, Serge ter Braake, Niels Ockeloen, Piek Vossen, Susan Legêne, and Guus Schreiber. 2014. BiographyNet: Methodological Issues when NLP supports historical research. pages 3728-3735.
Olivier Gergaud, Morgane Laouenan, and Etienne Wasmer. 2016. A Brief History of Human Time: Exploring a database of 'notable people'. Sciences Po Economics Discussion Paper 2016-03, Sciences Po Departement of Economics, February.
Jürgen Kett. 2017. GND-Entwicklungsprogramm 2017-2021.
M. Reinert, M. Schrott, and B. Ebneth. 2015. From Biographies to Data Curation - The Making of www.deutsche-biographie.de. BD.
Matthias Schlögl and Peter Andorfer. 2018. acdh-oeaw/apis-core: Apis-core, May.
Matej Ďurčo, Karlheinz Mörth, Hannes Pirker, and Jutta Ransmayer. 2014. Austrian Media Corpus 2.0.
C. N. Warren, D. Shore, J. Otis, and L. Wang. 2016. Six Degrees of Francis Bacon: A Statistical Method for Reconstructing Large Historical Social Networks. DHQ, 10(3).
Florian Windhager, Paolo Federico, Saminu Salisu, Matthias Schlögl, and Eva Mayr. 2017. A Synoptic Visualization Framework for the Multi-Perspective Study of Biography and Prosopography Data. October.