Interpersonal Relations in Biographical Dictionaries. A Case Study.

Sophia Stotz∗, Valentina Stuß∗, Matthias Reinert‡, Maximilian Schrott‡

∗ University of Paderborn
stotz,stuss@upb.de

‡ Historische Kommission München
reinert,schrott@hk.badw.de

Abstract
Adopting the concept of “Local Grammars” (M. Gross), which (Geierhos, 2010) successfully applied to biographical
information extraction in English, our project aims to detect, encode, and finally visualize relations between persons.
Our corpus consists of the digitised biographical lexicon “Neue Deutsche Biographie (NDB)”, roughly 21,000 biographies
in 25 volumes in print since 1953. We developed local grammars and suitable dictionaries to describe interpersonal
relations and applied them to the corpus with Unitex 3.1. The local grammars were designed to integrate existing TEI-XML
structures in the corpus. Using the ability of local grammars in Unitex to act as transducers, we were able to produce
XML tags and encode semantic information. Based on grammars for personal names and places, we described interpersonal
relations such as to study, predecessors and successors, as well as friends and circles. Afterwards we identified persons
(as given in the authority file or index). Finally we displayed the relations on our website in an interactive and dynamic
way. Using the JavaScript library D3.js, we represented named relations between identified individuals as ego-centred
network graphs.
Keywords: Local Grammar, Relation Extraction, Visualisation


1. Introduction

Biographical dictionaries comprise accounts of lives in a condensed, often abbreviated form. They list
the most important events in an individual’s life, as well as achievements and contacts with others.
Events are expressed in predicates or sometimes idioms. Both carry one or more arguments, at least one
of them representing an individual. We call this a predicate-argument structure (Geierhos, 2010, 7f.).
Other statements about the influence of publications, innovations or intellectual impact brought about
by the subject of the biography are not taken into account.

A subset of these predicate-argument structures contains relational expressions: a second argument
representing another person and the predicate - possibly accompanied by temporal or modal modifiers -
representing the relation.

We consider academic teachers, friends and colleagues as direct interpersonal relations, and relations
constituted by peer groups attending the same school and university or sharing the same profession and
professional institution as indirect relations. Another dimension is hierarchy (patrons, teachers) vs
equality (friends, colleagues) expressed in direct relations and heredity (family background) vs
transcendence (intellectual influence, schools of thought) in indirect relations. Obviously these
relations are manifold and occur in modified forms; therefore we have to normalise them. In this paper
we will demonstrate the extraction of relations expressed by the verb to study.

In order to visualize relations between individuals we need to identify their names. We achieved this
by applying simple matching techniques using indexes and scores, and we undertook tests using topic
similarities.

Finally we show the potential of relation extraction between identified individuals by visualizing them
online using common force-directed graph libraries.

1.1. Method

Within the broad field of information extraction we work on named entity recognition, named entity
disambiguation and relation extraction, but we restricted our efforts to detecting personal names and a
limited set of relations. Interesting relations are accompanied by predicates containing further
nameable entities as arguments. Our disambiguation aims primarily to align personal names with a
knowledge base, namely an index of people, already qualified with profession, dates of birth and death
and references to pages where they occur in the printed volumes.

In order to extract relations we applied methods described by Gross (1997), an approach called local
grammars. Gross promoted the idea that idioms tend to predominate over syntactic rules in language and
called for the examination of large corpora in order to extract typical phrases. It is a combined
dictionary and graph approach, whereby graphs describe linguistic structures on a sub-sentence level.
Linguistic structures or predicate-argument structures are considered as verbal or noun phrases
comprising entities carrying information. This reflects the influence of (Harris, 1974), who put the
focus on argument structures.

Recent research into this approach has been undertaken on organization names in English by (Mallchok,
2005), on descriptors for humans in German by (Geierhos, 2007), on toponyms in German by (Nagel, 2008),
on biographical facts in English by (Geierhos, 2010) and on biographical facts in French by (Maurel et
al., 2011) and (Maurel and Friburger, 2013).

Just like these studies we rely on the Unitex corpus processor (Paumier, 2013). Unitex adopts the early
efforts of W. A. Woods on applying graphs to linguistic phenomena (Woods, 1970). Already in 1980 he
proposed to draft and apply subsequent graphs step by step (Woods, 1980). Among others, those ideas and
the ability to call sub-graphs and morphological filters have been
implemented in Unitex.

We constructed local grammars in two steps. First we drafted preliminary graphs to describe and detect
the specific vocabulary around interesting phrases. This helped to set up auxiliary dictionaries. Like
the electronic dictionaries distributed with Unitex, we use the DELA syntax (Dictionnaires
Electroniques du LADL [Laboratoire d’Automatique Documentaire et Linguistique] (Paumier, 2013, 29)).

Figure 1: Example of a simple bootstrap graph detecting place names

Secondly we had to cope with TEI-XML markup already present in the corpus. We decided not to strip this
information because abbreviations had already been tagged, which facilitated the detection of sentence
boundaries. This was achieved by applying local grammar graphs in succession, a “cascade” mode
available in Unitex and described by (Maurel and Friburger, 2013).

1.2. Dictionaries

Dictionaries are crucial for the adoption of local grammars. We used the general dictionary CISLEX for
German developed at the Center for Information and Language Processing (Centrum für Informations- und
Sprachverarbeitung - CIS) Munich (Guenthner and Maier, 1994). CISLEX contains syntactic information
about 150,000 entries encoded in DELA format (Paumier, 2013, 47ff).

In addition we extracted dictionaries of denominators for named entities from indices (list of names,
professions) and an authority file (Gemeinsame Normdatei¹). The Gemeinsame Normdatei (GND)² provided
personal names and name parts, names for places, regions and organisations. We could derive
dictionaries with roughly 1.9 million surnames, 1.5 million forenames and 9.3 million full names for
individuals as well as 1.36 million entries for organisational names.

Describing simple local grammars in a bootstrap manner (Gross, 1999) we could extract lists of entities
for fields of study, institutions and place names (see fig. 2). These bootstrapped dictionaries are
specific to the given corpus and linguistically simply structured. They contain almost no syntactic
information or declined forms but carry semantic information. We put together another 32,000
descriptors for occupation, 2,000 of them in declined form; 15,000 geographical names; and 3,500
institutional names, mostly multi-word chunks. A special vocabulary (1,000 entries) covered disciplines
and adjectives accompanying them; another covered individual school names, which would otherwise
interfere with the relation to study.

    Wetzlar,.EN+Topon+ORTSTUD
    Wismar,.EN+Topon+ORTSTUD
    Witzenhausen,.EN+Topon+ORTSTUD
    Włocławek,.EN+Topon+ORTSTUD
    Worpswede,.EN+Topon+ORTSTUD
    Zerbst,.EN+Topon+ORTSTUD

Figure 2: Example of simple dictionary entries denoting place names with lexical category EN (named
entity) and semantic categories Topon and ORTSTUD

Bootstrapping dictionaries from the corpus gives the opportunity to revise and optimize the
dictionaries.

   ¹ http://www.dnb.de/gnd
   ² http://www.dnb.de/lds GND as Linked Data Service in March 2013.

1.3. The corpus Neue Deutsche Biographie

Our corpus is provided online at www.deutsche-biographie.de. The website hosts the digitised
biographical dictionary “Neue Deutsche Biographie” (New German Biography, NDB). The dictionary recently
reached the letter T (Tecklenborg) and has published 25 volumes in print since 1953. Available online
are about 21,000 articles of the first 24 volumes (A-Stader). These biographical articles have been
selected in a peer review process by the editorial team under guidance of the editor in chief. They are
composed of a headline, a short genealogy, the account of life and further technical paragraphs on
awards, works, secondary literature and depictions. All articles are signed by an author. Articles are
written in modern German (pre 2006 style) in full sentences but show many abbreviations of frequent
words (adjectives, nouns) and of the lemma itself (surname or personal name of the subject of the
biography).

In addition to the NDB, its precursor “Allgemeine Deutsche Biographie” (ADB), completed in 1912 in 55
volumes plus an index volume, increases the number of articles available on the website by 27,000.
These older articles are written in an outdated orthography and style and have not been taken into
account.

We heavily used auxiliary databases listing the individuals
mentioned in the text along with profession or position in life, their birth and death dates and
references to the printed volumes. All in all the core database consists of 92,000 individuals and
several hundred families. Almost every entry has been aligned with or added to the bibliographic
authority file Gemeinsame Normdatei (GND).

The articles were digitised and typographically tagged by an external firm and afterwards structurally
tagged in XML according to the TEI guidelines (Text Encoding Initiative, 2009) in the project. For
reasons like human readability, easier proof-reading, and tagging of pre-existing XML, we decided
neither to follow the stand-off mark-up approach nor the habit of computational linguistics of working
on plain text, but to keep the whole tagging, re-use it on occasion and add further tags in line.

2. A local grammar for the verb to study

In German, there are several ways to express that someone has studied. The verb studieren as well as
ein Studium beginnen, aufnehmen, absolvieren, beenden or (sich) an der Universität einschreiben /
Vorlesungen (an der Universität) belegen, besuchen, jemanden hören each set a certain focus on the
activity and determine possible arguments. We restrict our grammar to the verb to study and its forms.
Our analysis of the corpus resulted in the following structure:

The predicate-argument structure of to study is accompanied by several types of entities, like
institution, university (Universität Wien, Akademie der bildenden Künste), place, discipline (Physik,
Kulturwissenschaften), teacher (bei Virchow und Naunyn, <persName>...</persName>), time (1813, 4
Semester, ab Juli 1876) and student colleagues. Several adverbs and modifiers occur in the phrases as
well as uncertainty markers and negative phrases (studierte wahrscheinlich).

The position of arguments/entities in the sentence is not fixed: they may occur after the predicate,
before it or on both sides. One position, usually expressed with a pronoun or an abbreviated lemma,
denotes the subject who has studied. As the corpus contains biographies of individuals we assume that
the subject of to study and of the biography are the same.

2.1. Masking pre-tagged text and entities

The corpus comprises lots of abbreviations. We masked them using a special grammar. The masking started
a short sequence of grammars (cascade). Almost all grammars were acting as transducers - they wrote
output back into the recognized chunks of text. In this way new XML tags were introduced to mark
extracted entities in each step.

There is a {multi word expression,.lexical type|mask(+lexical type|mask)∗ }–notation processed by the
Unitex system (Paumier, 2013, 44-46). As shown in fig. 3, Unitex recognizes this kind of meta-syntax in
order to treat multi-word expressions on the one hand and assign lexico-semantic types (e.g. CHOICE+UA
in fig. 3) to text units on the other hand (Geierhos et al., 2011, 49).

Figure 3: Masking pre-tagged text and entities, using TOKEN-loops with ![ ]-negative context

The mask applies to abbreviations already identified and tagged; certain abbreviations are tagged with
semantic types. This also applies to personal names, which were similarly identified and tagged with
local grammars.

The schema of Cassys allows applying a list of graphs and running through each graph once or until no
further match is detected. By default each graph is applied as a transducer; its output can be given in
replace- or merge-mode (Paumier, 2013, 84).

2.2. Recognizing Entities and Relations

By using dictionaries (see 1.2.) and masking graphs we created a sequence of graphs for our target
relation. The schema of the cascade starts by masking pre-existing XML tags and goes on detecting and
encoding composed entities. The local grammar for the verb to study is split up by positional
differences.

    1. masking pre-existing tags
    2. masking interfering statements on education
    3. to study with arguments in pre- and post-position
    4. to study with arguments in pre-position
    5. to study with arguments in post-position (common case)
    6. deal with the noun study.

Figure 5: Schema of the cascade (Paumier, 2013, 243ff). Each graph is applied repeatedly until no new
match is found and in merge-mode, i.e. merging outputs with the detected sequences in the corpus

The main graph (see fig. 4) is composed of paths and subgraphs (Paumier, 2013, 99). Each path describes
a linguistic possibility and for certain arguments the graph descends into subgraphs describing the
structure of the argument in more detail. Obviously the arguments are governed by prepositions; in is
followed by place names, bei precedes teachers. The only object argument - the discipline(s) or
field(s) of study - directly governed by studieren/to study is rare in a university context.

Figure 4: A local grammar describing the post positioned arguments of studieren/to study, boxes after
prepositions branch into subgraphs

    Part of Corpus                      model     unseen
    Nstudier...                          3378       5245
    Nstudier... in sample                 148        261
    Total nr. of entities in sample       580       1028
    Nr. of entities found by LG           427        601
    Errors                                  4         17
    Recall                             73.62%     58.46%
    Precision                          99.31%     98.35%
    F-Score                             84.56      73.33

Table 2: Results of the relation extraction on the model set and the unseen test set

The graph (see fig. 4) is applied as a transducer (Paumier, 2013, 243ff). In the figure outputs are
displayed in boldface letters, each attached to a box matching possible types, strings or ε at a
certain position in the input string. They produce well-formed XML which can be processed
afterwards.
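
As a rough illustration of what applying a grammar as a transducer in merge-mode amounts to, the
following Python sketch wraps the arguments of studierte in XML tags while leaving the input text in
place. It is not the Unitex graph of fig. 4: the regular expression, the three-entry place list and the
tag names term and placeName are assumptions made for this example only (persName is the tag already
mentioned in section 2).

```python
import re

# Hypothetical mini-dictionary standing in for the ORTSTUD place-name entries.
PLACES = {"Wetzlar", "Wismar", "Zerbst"}

# "studierte", an optional discipline, then arguments governed by
# the prepositions "in" (place) and "bei" (teacher), in post-position.
PATTERN = re.compile(
    r"(?P<verb>studierte)"
    r"(?: (?P<discipline>[A-ZÄÖÜ]\w+))?"
    r"(?: in (?P<place>[A-ZÄÖÜ]\w+))?"
    r"(?: bei (?P<teacher>[A-ZÄÖÜ]\w+))?"
)


def rewrite(match: re.Match) -> str:
    """Merge output tags with the recognized sequence."""
    parts = [match.group("verb")]
    discipline = match.group("discipline")
    if discipline:
        parts.append(f'<term type="discipline">{discipline}</term>')
    place = match.group("place")
    if place:
        # Only names found in the dictionary are tagged as places.
        tagged = f"<placeName>{place}</placeName>" if place in PLACES else place
        parts.append(f"in {tagged}")
    teacher = match.group("teacher")
    if teacher:
        parts.append(f"bei <persName>{teacher}</persName>")
    return " ".join(parts)


def transduce(sentence: str) -> str:
    """Apply the pattern to a sentence and return the tagged sentence."""
    return PATTERN.sub(rewrite, sentence)


print(transduce("Er studierte Physik in Wismar bei Virchow."))
```

A real local grammar differs in that each argument is described by its own subgraph and dictionary
lookup, and that several such graphs are applied one after another in a cascade.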

2.3. Results of Relation Extraction

The local grammars (LGs) were modelled on a subset of the whole corpus (vols. 2–4, 12–14, 22–24) which
covers a wide range of years. Hence the results have been measured twice: once on the model set and
again on the test set comprising all other volumes (1, 5–11, 15–21).

In order to test the results we extracted lines containing the string studier, which represents the
infinitive and present stem (studier[en]) and the past and perfect stems (studiert), but not related
nouns and compounds (Studie, Studium).

The matches and errors were counted as follows:

    entities of     found                        not found        false named
    to study        true positive                fault (Recall)   fault (Precision)
    not to study    false positive (Precision)   true negative    false (Precision)

Table 1: Assertion of errors to precision and recall

We calculated the common F-measure:

    F-Measure = (2 × precision × recall) / (precision + recall)

We achieved a high rate of precision as intended. The small number of errors resulted from an erroneous
path in the grammar, which could be deactivated. Another remedy for these errors would be another graph
applied within the cascade or on top of the result in replacement mode, as (Nagel, 2008, 233, see
“Antigrammatiken”) has shown.

The recall can be increased by additional grammars which can be applied on top of the result. Missing
entities are due to graphs exiting early, which is generally the consequence of missing entries in the
dictionary.

3. Disambiguating Personal Names

Detecting relations in predicate-argument structures resulted in named entities as typed sets of
strings (literals). The relation extraction already differentiated between personal names, university
names, place names and disciplines. One of the next steps was to disambiguate the identity of personal
names by aligning them with knowledge bases. We identified “literals” as individuals in our registry of
names and the authority file.

To illustrate the problem, the single word “Goethe” could refer to the famous writer and public servant
Johann Wolfgang von Goethe († 1832), but possibly to 5 other articles on persons named “Goethe” in NDB
and ADB. The authority file GND provides 129 hits for a person called “Goethe”.

The first approach matched features from index entries (given name, surname, year of birth, year of
death, page and region [headline, biography or genealogy]) and occurrences of names. We simply added up
points for each matching criterion and related the sum to the number of criteria. Matching years scored
double, matching initials scored
half points. This resulted in about 55,000 matches in a distribution given in fig. 6.

Figure 6: Distribution of numbers of score-baseline pairs, x represents the baseline of features in the
string (name parts, dates, pages), y represents the scores achieved by matching and c(x,y) represents
the count of matches for the given score and baseline drawn as a circle

We examined a sample for each pair in order to detect a threshold of certainty. Names without dates
were generally under-determined and have been dropped. In genealogies we assumed everyone shared the
surname of the subject of the biography. But plain given names sometimes do refer to another family and
the implicit assumption of a common surname led to failures. In headlines and in the biographical
description the matching of names bearing at least 3 correct features (e.g. a name and the page and a
date, 2 dates and a part of the name, 2 name parts and a date) yielded reasonably correct results.

3.1. Results of Disambiguation

The simple scoring approach allowed us to match most of the articles and a substantial number of
persons in the biographical descriptions and to a smaller degree in genealogies. Named entities for
personal names without dates, very frequent in the early volumes and the preceding ADB, could not be
processed. We tested topic modelling and topic similarity measures (cosine similarity) but were not
successful due to the lack of biographies for all potentially interesting individuals. Some biographies
were not elaborate enough to provide a decent vector of topics.

4. Visualising Relations Online

The visualization of the extracted relations data was realized with D3.js.³ This JavaScript library is
all about transforming data into graphics, as its name “Data-Driven Documents” implies. We decided to
use this library because of some key advantages. It is a modern technology, which creates its graphics
on the client side without the need for any plugin except JavaScript. It draws into an HTML
“div”-container and integrates the different elements of the visualization into the Document Object
Model of the website, making them styleable with CSS and debuggable with standard in-browser developer
tools. D3 is also quite flexible: it can process a variety of data formats, as long as the data is
structured like an array, and then transform it into any kind of visualization, either simple or
complex. It should be noted, though, that D3, while potent, can be quite hard to work with, due to poor
documentation and some unintuitive behaviours.

   ³ http://d3js.org

4.1. Designing the graph

When looking for a way to visualize the interpersonal relations we experimented with displaying the
persons on the outside of the perimeter of a circle, with the edges between them running through the
inside of the circle itself. We hoped that this would provide a good way to display large numbers of
persons and their relations within a delimited space. In the end however we found this approach lacking
in comprehensibility and difficult to implement. Instead we took inspiration from the Social Networks
and Archival Context (SNAC) project at the Institute for Advanced Technology in the Humanities at the
University of Virginia.⁴ Their prototype visualization displays the relations of a person in a classic
network graph, in which the persons are nodes with edges between them representing their relations. But
while the SNAC visualization arranges the nodes along concentric circles, we decided to use a
force-directed graph.

   ⁴ http://socialarchive.iath.virginia.edu/

4.2. Force-directed graphs

In a force-directed graph, the layout is determined automatically and dynamically by an algorithm that
calculates simulated forces between the nodes. This algorithm is provided by the D3 library. Normally
nodes repel each other and would just spread out evenly across the canvas. Edges, which have a certain
length and flexibility, similar to a real-life bungee cord, counteract this repulsion and tie the
connected nodes together. These two forces should ideally arrange the graph in a clearly laid out way.
Unrelated nodes are kept at a distance from each other, while related ones group closely together,
forming clusters that indicate their high level of interconnectedness at first glance.

4.3. The ego-centred network graph

Our graph is centered around one person - the root. When the visualisation is started, only the
immediate relations of the root are grouped radially around it. But the graph can be expanded further,
like in the visualisation of SNAC. By clicking on the node of a person the user can append their
relations to the graph (if they have any within our database). This not only works with the nodes that
are directly linked to the root, but with any node in the graph. This way the user can jump from
relation to relation, go deep into the graph and discover extensive interpersonal networks.

The nodes can also be collapsed again by clicking on them a second time. This removes all nodes and
edges from the graph that are connected to the root only through the clicked node. And by clicking on
the root node the graph can always be brought back into its original state, with only the root itself
and its immediate relations visible. The deletion
             Figure 7: The relations of Philipp Jonkheer von Siebold and three relatives, manually expanded.5


of links and nodes has to be done recursively to account for deep trees of relations that may grow
from a single node. Before deleting a node, the program checks whether it has “children” of its
own. If so, these “children” are in turn checked for deletion or further recursion.
For the recursion to work, the edges in the graph have to be directed. Even though this is not
visible in the visualisation, every link has a source and a target node. To prevent cycles from
forming within the graph, which could lead to unwanted behaviour during the recursion, links
pointing back towards the root have to be avoided. For this reason, edges that connect two already
linked nodes in the opposite direction are silently dropped. Likewise, other links that would
close a cycle are flipped by the program so that they point away from the root. While this
manipulation and discarding of data is not ideal, we do not consider it very problematic and
simply present all relations to the user as mutual.

   5 Access to test version
     http://data.deutsche-biographie.de/beta/lib4/Projects/dtBio/relations/?id=sfz80197&version=ndb
     on request.

4.4. Typed Relations
A new feature currently being tested in a closed beta is the typing of relations. At present we
mainly distinguish three types of links. The distinction is based on the part of the article from
which the relation was extracted. If it comes from the genealogy, the link is classed as “Familie”
(family). “Leben” (life), on the other hand, means that the relation was found in the biography
itself. Finally, “Literatur” (literature) links come from the bibliographical appendix to the
article. The edges in the graph are colour-coded according to their type and can be removed from
or added to the graph by the user. The next step is to add the relations extracted with the more
sophisticated computational-linguistic method described earlier in this paper. These link types
are based on the actual nature of the relationship rather than on its position in our text. We
have already added the type “Lehrer/Schüler” (teacher/student) to our beta version and plan to add
further types once they can be extracted with enough confidence. Right now, relations like
“Lehrer/Schüler” exist separately from the three other types.
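
The edge handling and the recursive collapse described for the graph can be illustrated with a minimal sketch. It is not the project's code: it uses plain objects instead of D3 selections, the names createGraph, addRelation and collapse are hypothetical, and "pointing away from the root" is interpreted here as "from the node that entered the graph closer to the root to the one further away", which is an assumption.

```javascript
// Sketch under assumptions: plain data structures, hypothetical names.

// A graph starts with the root person only; depth counts steps from the root.
function createGraph(rootId) {
  return { rootId, depth: new Map([[rootId, 0]]), links: [] };
}

// Register a relation found while expanding node `from`.
// Returns false when the relation is dropped.
function addRelation(graph, from, to) {
  if (!graph.depth.has(from)) throw new Error("unknown node: " + from);
  if (from === to) return false; // ignore self-relations
  const alreadyLinked = graph.links.some(
    (l) => (l.source === from && l.target === to) ||
           (l.source === to && l.target === from)
  );
  if (alreadyLinked) return false; // quietly drop duplicates and back links
  if (!graph.depth.has(to)) graph.depth.set(to, graph.depth.get(from) + 1);
  const dFrom = graph.depth.get(from);
  const dTo = graph.depth.get(to);
  // Orient the edge away from the root; ties are broken by id, so the
  // directed graph can never contain a cycle.
  const keep = dFrom < dTo || (dFrom === dTo && from < to);
  graph.links.push(keep ? { source: from, target: to }
                        : { source: to, target: from });
  return true;
}

// Collapse a node: remove its outgoing links and, recursively, every
// child that is no longer the target of any remaining link.
function collapse(graph, nodeId) {
  const outgoing = graph.links.filter((l) => l.source === nodeId);
  for (const link of outgoing) {
    const index = graph.links.indexOf(link);
    if (index !== -1) graph.links.splice(index, 1);
    const child = link.target;
    const hasOtherParent = graph.links.some((l) => l.target === child);
    if (!hasOtherParent) {
      collapse(graph, child);    // handle the child's own children first
      graph.depth.delete(child); // then remove the child itself
    }
  }
}

// Usage: "b" hangs on "a" only, so collapsing "a" removes "b" but keeps "a".
const demo = createGraph("root");
addRelation(demo, "root", "a");
addRelation(demo, "a", "b");
collapse(demo, "a");
```

Because every edge runs from a lower to a higher position in a fixed ordering, the recursion in collapse always terminates, and a node that is still reachable through another visible neighbour is left in place.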

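As an illustration of the typed links of Section 4.4, the following sketch shows one way to colour edges by type and to let the user switch types on and off. The three type names come from the text; the colour values, the fallback colour and the function names are assumptions made for this example.

```javascript
// Sketch only: colours and function names are hypothetical.
const TYPE_COLOURS = new Map([
  ["Familie", "#d95f02"],   // relation found in the genealogy
  ["Leben", "#1b9e77"],     // relation found in the biography itself
  ["Literatur", "#7570b3"], // relation found in the bibliographical appendix
]);
const FALLBACK_COLOUR = "#999999";

// Colour for a link, based on its single type.
function colourOf(link) {
  return TYPE_COLOURS.get(link.type) || FALLBACK_COLOUR;
}

// Links that should currently be drawn, given the types the user has enabled.
function visibleLinks(links, enabledTypes) {
  return links.filter((link) => enabledTypes.has(link.type));
}

// Return a new set of enabled types with `type` switched on or off.
function toggleType(enabledTypes, type) {
  const next = new Set(enabledTypes);
  if (next.has(type)) {
    next.delete(type);
  } else {
    next.add(type);
  }
  return next;
}
```

In this sketch each link carries exactly one type, which mirrors the data model as it is described for the beta version.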
But as relations like “Lehrer/Schüler” model a different kind of relationship, we plan to revise
the data model so that a link can have multiple types.

4.5. Further Plans
We also plan to migrate the relation data to a graph database. Right now the data for the
ego-graph is produced from the same Apache Solr search index as the rest of our website. While
this works sufficiently well for our current implementation, we want to expand the functionality
of our visualisation. With the built-in graph support of databases like Neo4j, we hope to enable
new functions such as the automatic computation of the shortest chain of relations between any two
persons, while at the same time reducing the problems with cycles and backlinks.

          5. Outcomes and Discussion
The laborious description of predicate-argument structures finally paid off. We were able to
retrieve structured information as typed named entities and to adapt our grammars to similar
unseen corpora with fair results. Our approach to disambiguation supports individual mentions
comprising names and dates. Names without dates and other named entities bearing fewer features
could not be identified.

              6. Acknowledgements
This work is funded by the Deutsche Forschungsgemeinschaft (DFG) to establish an online
biographical information system (2012-15).


                    7.   References
Michaela Geierhos, Jean-Leon Bouraoui, and Patrick Watrin. 2011. Towards multilingual biographical
  event extraction - initial thoughts on the design of a new annotation scheme. In Hanna Hedeland,
  Thomas Schmidt, and Kai Wörner, editors, Multilingual Resources, Multilingual Applications,
  page 4.
Michaela Geierhos. 2007. Grammatik der Menschenbezeichner in biographischen Kontexten. Arbeiten
  zur Informations- und Sprachverarbeitung, volume 2.
Michaela Geierhos. 2010. BiographIE - Klassifikation und Extraktion karrierespezifischer
  Informationen. Linguistic Resources for Natural Language Processing 05. Lincom.
Maurice Gross. 1997. The construction of local grammars. In E. Roche and Y. Schabès, editors,
  Finite-State Language Processing, pages 329–354.
Maurice Gross. 1999. A bootstrap method for constructing local grammars. In Neda Bokan, editor,
  Proceedings of the Symposium on Contemporary Mathematics, pages 229–250.
Franz Guenthner and Petra Maier. 1994. Das CISLEX Wörterbuchsystem.
Zellig S. Harris. 1974. Lecture Notes on English Transformational Grammar. Université de Paris
  VIII (translated 1976 by Maurice Gross as Notes du cours de syntaxe, Paris: Éditions du Seuil).
Friederike Mallchok. 2005. Automatic Recognition of Organization Names in English Business News.
  Studien zur Informations- und Sprachverarbeitung, volume 9 (also doctoral dissertation, 2004).
Denis Maurel and Nathalie Friburger. 2013. Utilisation avancée des cascades de graphes sous Unitex
  (CasSys). In 2nd Unitex/GramLab Workshop, 10-11 October 2013, Université Paris
  Est-Marne-la-Vallée.
Denis Maurel, Nathalie Friburger, J.-Y. Antoine, I. Eshkol-Taravella, and D. Nouvel. 2011.
  Cascades autour de la reconnaissance des entités nommées. In TAL, pages 69–96.
Sebastian Nagel. 2008. Lokale Grammatiken zur Beschreibung von lokativen Sätzen und ihre Anwendung
  im Information Retrieval.
Sébastien Paumier. 2013. Unitex 3.1 (Beta). User Manual.
Text Encoding Initiative, editor. 2009. TEI: P5 Guidelines, version 1.5.
William A. Woods. 1970. Transition network grammars for natural language analysis. Communications
  of the ACM, 13(10):591–606.
William A. Woods. 1980. Cascaded ATN grammars. American Journal of Computational Linguistics,
  6:1–12.