Mapping Biographical events to ODPs through
         Lexico-Semantic Patterns?

                    Marco Antonio Stranisci[0000−0001−9337−7250]
                        Valerio Basile[0000−0001−8110−6832]
                      Rossana Damiano[0000−0001−9866−2843]
                        Viviana Patti[0000−0001−5991−370X]

       Dipartimento di Informatica, University of Turin, C.so Svizzera 185, Italy
                         marcoantonio.stranisci@unito.it
                              valerio.basile@unito.it
                             rossana.damiano@unito.it
                               viviana.patti@unito.it


        Abstract. In this paper we present a collection of semantically-encoded
        biographies of authors who were born in former colony countries from
        1945. The data set relies on an ontology that represents the life of an
        author through the two key concepts of migration from birth place and
        legal status in a country, both modeled on two Ontology Design Pat-
        terns: Time Indexed Person Status and Basic Execution Plan. Together
        with the resource, we describe a pipeline to convert the textual biogra-
        phies of the authors gathered from Wikipedia into the roles experienced
        by them in migrations. The pipeline includes modules for linguistic pre-
        processing and named entity recognition, and an entity linking step re-
        lying on Wikipedia and Wikidata APIs to link places and organizations
        to their respective countries. A set of lexico-semantic patterns based on
        verb classes from the Unified Verb Index has been developed in order to
        extract migration-related knowledge from unseen text biographies.

        Keywords: Biography · Immigration · Pattern-based information ex-
        traction · ODP.


1     Introduction
Under-representation of non-Western people is an open issue with a long tradi-
tion [24]. Ethnic minorities suffer this condition in crucial sectors of society, such
as schools [13] and media players [14]. Even collaborative projects seem to be
affected by cultural [28] and gender biases. For instance, [28] observed that most
Wikipedia contributors are European and male, and this may have an influence
on the creation of contents on this platform [26].
    Our work addresses this topic by providing structured knowledge about writ-
ers who suffer a lack of representation on Wikipedia due to their ethnic ori-
gin [25]. In this paper, we present a pipeline for the automatic extraction of
?
    Copyright © 2021 for this paper by its authors. Use permitted under Creative
    Commons License Attribution 4.0 International (CC BY 4.0).
2       M. Stranisci et al.

biographical events from Wikipedia through the adoption of Lexico-Semantic
Patterns [4]; biographical events are semantically described by referring to the
Ontology Design Patterns (ODP) framework [5]. The development of a mapping
from raw-text biographies to semantic categories is a preliminary step to linking
the literary production of under-represented writers to their lives.
   The paper is structured as follows. In Section 2 we discuss the Linked Data
projects that inspired our work, and review the state-of-the-art approaches to
event extraction and encoding. In Section 3 we present the Ontology of Under-
Represented Writers, describing how we encoded their biographies through recur-
rent semantic patterns, and how we modeled the interplay between the authors
and their places of birth. In Section 4, we present the pipeline for the automatic
extraction of biographical events through Lexico-Semantic patterns. Finally, in
Section 5, we analyze, and evaluate results. A discussion about open issues and
future work concludes the paper.


2   Related Work

In recent years, thanks to the availability of sources in a digital form, a new
interest in the study of biographies has arisen is literary, cultural and historical
studies. In particular, three existing Knowledge Graphs share many similarities
with ours: the Orlando project1 , Enslaved2 , and WeChangEd3 . The Orlando
Project is a collection of biographies of 1, 300 British women writers; Enslaved is
a data set of 509, 783 people of the historical slave trade developed from 8 preex-
isting archives; WeChangeEd is a collection of 1, 800 female editors born between
1710 and 1920, aligned with Wikidata. All these data sets rely on Semantic Web
technologies [22,21,27], which are used to represent socio-demographic informa-
tion about the individuals, such as ethnicity, family relationships, and social
status.
    The URW project has a similar perspective to these projects in terms of
aiming to represent a group of persons sharing a specific condition. However,
the concept of “being under-represented” is challenging to model, because it
has blurred boundaries and it can be very subjective. Our project intentionally
does not rely on a taxonomy of ethnicities, choosing instead to fully describe the
interplay between a person and the places where they lived their life, in order to
avoid a Western representation of non-Western writers biographies.
   Several approaches aimed at encoding and annotating events have been pro-
posed in the last years. Despite the common representational goal, these ap-
proaches vary significantly, since events can be formalized at different levels of
granularity. The Biography Ontology [8], part of an ontology network within the
TrendMiner project [7], models biographical events as time-dependent knowledge
by directly adding temporal arguments to the materialised triples.
1
  http://www.artsrn.ualberta.ca/orlando/
2
  https://enslaved.org/
3
  https://www.wechanged.ugent.be/
                                        The Under-Represented Writers KG           3

    Other works analyze events at a word level. The ACE/ERE projects [2][23]
rely on the identification of the events through the use of a lexical ‘Trigger’. The
TimeML annotation scheme [17] has been specifically designed for identifying
all the temporal expressions in a text, and annotating the chronological relation
between them. The Richer Event Description (RED) framework [15] simplifies
the taxonomy of events proposed in TimeML, but adds information about the
causal relations over them.
    Biographical event extraction from raw text is the subject of works relying
on Wikipedia as a source of knowledge. The Pantheon 1.0 data set [28] is a
collection of 11,341 biographies available in more than 25 languages in Wikipedia.
Individuals in the data set have been categorized according their occupation by
using a controlled vocabulary relying on Freebase. Information about the number
of page views for each biography is provided as a way to measure its popularity.
    Other projects have attempted to extract time and geographical information
from biographical texts. Russo et al. [18] collected 782 biographies of people de-
ported to Nazi concentration camps, extracting relevant dates and places of their
lives. Then, all information has been arranged into a structured representation
by using the TimeML framework [17]. The RAMBLE ON application [12] takes
as input a biographical raw text, and automatically detects Motion frames [1]
together with the georeferencing of each place mentioned in frames.
    Our proposal aims at extracting geographical knowledge and life events jointly,
to provide a semantic model for representing biographies. Unlike existing ap-
proaches, which are focused on detecting the lexical entries triggering an event [17],
our work provides a mapping between the textual and the semantic level. Bi-
ographical patterns, encoded by adopting the ODP framework, are populated
extracting semantic knowledge from raw text biographies.


3   A Semantic Model for Under-Represented Writers

The semantic model is designed with the purpose of providing a formal and
objective description of authors who are potentially under-represented due to
the context where they were born. In particular, it encodes biographical events
and situations in which a dul:Person is: (i) a writer and (i) has experienced
the condition of being under-represented. In this way, a correlation between
biographical events and literary production of under-represented authors can be
drawn, and employed to gain insight on the motivations and themes reflected in
their narratives. The main components of this semantic model are: the condition
of being under-represented and the identification of objective criteria to classify
countries which correlate with this condition.
Biographical patterns. According to our formalization, a writer who is under-
represented is a person who published one or more literary works, and may have
experienced the process of migrating, intended as the movement from a country
to another, and the condition of living in a given country after leaving one’s
place of birth. Within the latter situation, the author’s legal or professional
4        M. Stranisci et al.


    Fig. 1. A graphical representation of the urw:TimeIndexedPersonStatus pattern


status may be expressed. Our solution to encode these situations draws from
the ODP framework, which provides foundationally sound, re-usable building
blocks for representing common patterns across ontologies. More specifically,
the urw:Migration pattern refers to the BasicPlanExecution ODP [5], since a
migration represents the execution of a intentionally devised line of action. The
legal status of a person, urw:TimeIndexedPersonStatus (TIPS), relies on
the TimeIndexedPersonRole ODP [16], since this condition is typically subject to
change and can be modelled as time-bounded role. As can be seen in Figure 1 and
2, both Migration and TIPS describe situations that are the setting for an entity
of the type dul:Person, which refers to person according to the commonsense
intuition, with a dul:Role. A role in a TIPS is a urw:ConditionRole, defined
by one or more urw:Conditions, such as being a foreign student, a worker, a
refugee. Since multiple conditions could co-occur in defining a role, each of them
has setting in a separate dul:Classification situation. The urw:Migration
Role in the Migration pattern is defined by a urw:MigrationReason, namely
the reason of the plan of migrating (e.g.: fleeing war, seeking for a job). Both
situations are time-indexed, and take place in one or more specific urw:Place.

Integration with Existing Resources. In addition to the Migration and
TIPS patterns, existing resources have been integrated in the semantic model:
geographical resources for identifying the countries correlated with the lack of
representation, and linguistic resources for mapping raw text biographical facts
to the ontology. In fact, the TIPS and Migration patterns do not provide them-
selves a criterion to identify the under-representation, since they only portray
the condition of living outside one’s country. However, an author such as Italo
Calvino, who was born in Italy and moved to France during his life should not
be considered as under-represented, since his birthplace was a wealthy European
                                        The Under-Represented Writers KG           5


         Fig. 2. A graphical representation of the urw:Migration pattern.


country. Hence, three indicators have been encoded in the ontology to identify
a country as under-represented:
 – the country’s colonial past;
 – its Human Development Index (HDI)4 , a measure of the the global develop-
   ment of countries provided by the United Nations;
 – its mobility score5 , namely the number of countries where a person could
   travel with the passport of the country.
In our formalization, an under-represented country must be a former colony, it
must have a medium or lower HDI (below 0.8), and it must fall within the second
half of the ranking of countries by mobility score. The Named Authority List of
countries maintained by the European Union6 , an authoritative, comprehensive,
and multilingual reference for country names, has been used to standardize and
index all these sources of geographical knowledge.
    Concerning the linguistic resources, we rely on the Ontolex-Lemon model,
which [11] plays the function of mapping the morphological and syntactic prop-
erties of lexical entries to the semantic categories expressed by OWL classes. The
use of this models facilitates the process of converting the raw text of the authors’
biographies into RDF triples by maintaining the lexico-semantic information in
the final representation, as described in Section 4.
    Finally, the PROV-O Ontology [9] is a standard to express the provenance in-
formation of a work. In the context of our research, this model is used to identify
4
  http://hdr.undp.org/en/content/human-development-index-hdi
5
  https://www.passportindex.org/
6
  https://op.europa.eu/en/web/eu-vocabularies/dataset/-/resource?uri=http:
  //publications.europa.eu/\resource/dataset/country
6      M. Stranisci et al.

the LSPs as prov:SoftwareAgent, and the textual Wikipedia biographies as
the source of knowledge from which biographical patterns have been derived.


4   From Ontology Patterns to Lexico-Semantic Patterns

Before collecting the biographies from Wikipedia, under-represented writers have
been identified through the occupation Wikidata property (WDT:P106). Each
person who worked as a writer, novelist, or poet has been collected and classified
by retrieving the country of origin associated to her/his birthplace (WDT:P19).
For each author, the biography in English language, if present, has been retrieved
from Wikipedia. The total amount of collected person entities is 114, 675. Writers
who were born from 1945 on, in any Asian or African under-represented country
(see Section 3) have been chosen to highlight only on biographies of people who
experienced or born after the Decolonization process.
    Starting from this initial corpus, a pipeline to convert raw texts biographies
in TIPS, and Migration classes based on Lexico-Semantic Patterns (LSP) [6,4]
has been developed. LSPs are rules composed of semantic and syntactic elements
related to classes and properties of an ontology. When a rule matches a string of
text, the ontology is automatically populated with one or more RDF triples. An
example of a Lexico-Semantic Pattern, created to extract geographical informa-
tion from text, is the following [19]:

    The rule $subject : Concept COMP RB? IN? $object : Concept matches
    the phrase Administrative territory of Prague is divided into localities
    retrieving a mereological relation between Prague, and localities to be
    stored in an ontology.

  Our pipeline is based on three steps: text parsing, LSP development, Infor-
mation Extraction.


    Fig. 3. The diagram representing the information extraction process in URW
                                       The Under-Represented Writers KG        7

Text parsing. Using the SpaCy library7 , each biography has been split in sen-
tences, and only the ones containing at least one entity of the type Organization
(ORG), or Geopolitical Entity (GPE) have been stored in JSON format, together
with the name of the author, and her/his country and year of birth. Below, there
is an example of an item, in JSON format, referring to the Nigerian writer, and
radio presenter Dotun Adebayo:

      {
          author: Dotun Adebayo,
          birthPlace: Nigeria,
          birthYear: 1960,
          places: [(Stationers’ Company’s Comprehensive School,
          ORG),(Stockholm University)],
          sentence: He then went on to Stationers’ Company’s Com-
          prehensive School in Hornsey, North London, followed by
          Stockholm University, where he studied Literature.
      }


In parallel, each ORG and GPE has been linked with the respective country.
All the strings identified as geopolitical entities or organizations by the SpaCy
Named Entity Recognition module have been used as an input for search through
the Wikipedia API. The first 10 results of the search have been subsequently an-
alyzed, and, among them, the first candidate that holds the Wikidata property
‘country’ (WDT:P17) has been selected, if any. Only the 25, 554 sentences con-
taining an ORG or GPE belonging to different countries than the birth country
of an author have been selected for the next step.
LSP development. After the sentences have been collected, a random subset of
them has been analyzed in order to define LSP rules for encoding the biographic
facts contained in the raw text into the two main patterns of the URW ontology:
urw:TimeIndexedPersonCondition (TIPS), and urw:Migration.
    Given the structure of these patterns, three key elements have been identi-
fied as necessary in an input sentence to make it a candidate trigger: a verb
expressing a change of place or a condition (e.g.: fleeing a country, obtaining
a graduation), a preposition, and an entity of the type Organization (ORG)
or Geo Political Entity (GPE) belonging to a different country from the place
of birth. For instance (see Figure 4), the elements in bold face in the sentence
“He [Dotun Adebayo] then went on to Stationers’ Company’s Compre-
hensive School in Hornsey, North London.” match the pattern (escape-51.1-
1)(to—at—in)(GPE—ORG). So, from this rule, the following RDF triples are
extracted:
[ a urw:Migration;
    dul:isSettingFor [
    a urw:MigrationRole;
    dul:isDefinedIn Study_Abroad
    dul:isRoleOf Dotun_Adebayo.
7
    https://spacy.io/
8        M. Stranisci et al.


Fig. 4. A diagram representing the extraction of the Migration pattern related to
Dotun Adebayo’s biography.


      ]
      dul:isSettingFor England;
      dul:isSetting For Dotun_Adebayo;
      prov:wasDerivedFrom Wikidata;
      prov:wasAttributedTo [
      a LexicoSemanticPattern
      urw:hasPattern (escape-51.1-1)(to|at|in)(GPE|ORG)
      ]
]

The subsequent step in the definition of the LSPs has been the clustering of
verbs through a mapping to general verb types, aimed at reducing the number
of patterns and increasing their recall. To do so, we employed the Unified Verb
Index8 , a repository resulting from the mapping of several lexical resources that
provides syntactic and semantic frames of English verbs. In particular, we linked
the verbs in our data to the VerbNet classes in Unified Verb Index (UVI) [20].
In the previous example, the relevant class for mapping movement verbs onto
the Migration ontology patters is escape-51.1-1-1, which includes the following
lemmas: depart, disembark, escape, exit, flee, leave, vacate.
    As anticipated in Section 3, the mapping between LSPs and VerbNet classes
is expressed in the ontology through the Ontolex-Lemon specification [10,11].
According to this model, each verb is an ontolex:LexicalEntry with a corre-
sponding set of ontolex:LexicalSenses (WordNet offsets [3]), which represent
the lexicalized sense of the ontolex:LexicalConcept, namely the VerbNet
class. The ontolex:LexicalConcept is the bridge between the lexical entries
and the ontology classes. For instance, the ontolex:LexicalEntry lex leave
8
    https://uvi.colorado.edu/
                                            The Under-Represented Writers KG           9

has a corresponding ontolex:LexicalSense which is the v#2009433 Word-
Net offset. The latter is one of the possible lexicalization of the escape-51.1-1-1
VerbNet class, which is the ontolex:Concept.
Information Extraction. After the creation and refinement of the LSPs, 53
rules of the form:

      VerbNet class $preposition GPE—ORG

have been formulated and applied to the annotated sentences.
The following is an example of how the same LSP matches sentences with dif-
ferent verbs and preposition, and encodes them as urw:TimeIndexedPerson
Status:

      LSP: obtain-13.5.2 from|for|at|by|in|as GPE|ORG

      Ajunwa9 received her BA at University of California, Davis in
      2003.

      He held a master’s degree in Theatrical Directing which he obtained
      from the University of Sofia


5      Analysis and evaluation of the results

From the life events encoding pipeline (Section 4) 12, 147 sentences containing
an instance of TIPS, Migration, or both have been obtained.10
Some preliminary statistics can help to assess the relevance of the data stored
in the KG. The resulting Knowledge Graph includes 2, 618 different authors’
biographies, place of birth, and year of birth. 1, 638 of these authors were born
in Asia, 980 in Africa. In total, 39, 167 RDF triples have been stored in the data
set.
   In order to test the precision of the LSP (Lexico-Syntactic Patterns), we
manually evaluated a random sample of 2, 555 sentences, which correspond to the
10% of sentences containing at least one GPE or ORG different from the author
country of birth. Each sentence was labelled as expressing a ‘TIPS—Migration’
(48.5%) or ‘None’ (51.5%), then compared with the patterns.
   Table 1 shows that LSPs performed with a precision of 0.68. The manual
analysis of prediction errors revealed they had several causes. In some cases,
the subject was not the author but another person (e.g., ‘His father left India
9
     https://en.wikipedia.org/wiki/Ifeoma Ajunwa
10
     The first version of the data set is publicly available on the GitHub
     repository of the Under-Represented Writers project https://w3id.org/
     UnderRepresentedWritersOntology/. 10, 569 sentences expressing at least one
     TIPS were detected, 3, 549 with Migration patterns. In 1, 971 cases both are present
     in the same sentences.
10     M. Stranisci et al.

            Table 1. Results of the evaluation of biographical patterns

                                   Pattern        Precision
                                   TIPS             0.665
                                 Migration          0.805
                             TIPS and Migration     0.68


in early 1963 to study at Oxford University’). Another source of error is the
presence of reported speech of the writer (e.g., ‘Members of her African audi-
ence have asserted that Thiam does not understand why women may support
FGM’). Finally, both the NER and the entity linking pipeline seem to introduce
false positives (e.g., in the phrase ‘Shatrughan Sinha, has also spoken in Kumar’s
favour on Twitter’, Twitter is marked as an organization in the United States).
It is important to mention an imbalance in the performance of the two bio-
graphical patterns: Migration situations are retrieved with a precision of 0.805,
in line with recent findings from the literature [19], while precision for TIPS is
0.665. This difference is probably due to the nature of the latter pattern, which
is highly heterogeneous and needs a deeper analysis to specialize it into specific
patterns for different status types. In order to investigate the low performance of


            Table 2. Categorization of status types under TIPS pattern

                             Status type Occurrences (%)
                              Occupation          39.2
                             Publications         17.8
                              Education            12
                               Awards              8.5
                             Social causes          8
                                Other             14.5


the TIPS LSP, we conducted a closer analysis of the situations encompassed by
the TIPS pattern. The results (Table 2) show that the type of status described
in the sentences that matched this pattern is varied: it can refer to occupation
(39.2% of the manually evaluated cases), publications (17.8%), education (12%),
awards (8.5%), or involvement in social causes (8%). Since these situation types
are highly consistent with the URW domain, this preliminary categorization
suggests that more specific rules are needed to encode this information together
with a deeper specification of TIPS within the ontology, and that this ability to
discriminate will improve the performance.
                                         The Under-Represented Writers KG           11

6    Conclusions and Future Work
In this paper we presented a pipeline to extract life events of writers born in an
Asian or African Former Colony Countries from 1945 onwards from Wikipedia
biographies through Lexico-Semantic Patterns. At the present stage, the data
set includes 12, 147 biographical events about 2, 618 authors.
    A manual evaluation of a sample of the results showed a good precision of
Lexico-Semantic Patterns. However, some rules need to be further specialized in
order to extract a taxonomy of TIPS-related conditions. Despite these limita-
tion, it is important to underline that a pipeline based on a small set of rules has
produced a relatively large corpus, from which holistic knowledge about life’s
narratives can be extracted, and generalized to other types of biographies. Fu-
ture works must take into account the chronological arrangement of Migration
and TIPS patterns within a whole biography, and generalize Lexico-Semantic
Patterns to other categories of under-represented people – ethnic minorities and
second generation migrants, people with other occupations – which can be col-
lected in the URW Knowledge Graph.


References
 1. Baker, C.F., Fillmore, C.J., Lowe, J.B.: The Berkeley Framenet project. In: 36th
    Annual Meeting of the ACL and 17th Int. Conf. on Computational Linguistics,
    Volume 1. pp. 86–90 (1998)
 2. Doddington, G., Mitchell, A., Przybocki, M., Ramshaw, L., Strassel, S.,
    Weischedel, R.: The Automatic Content Extraction (ACE) Program – Tasks, Data,
    and Evaluation. In: Proceedings of the 4th Int. Conf. on Language Resources and
    Evaluation (LREC’04). ELRA, Lisbon, Portugal (2004)
 3. Fellbaum, C. (ed.): WordNet: An Electronic Lexical Database. Language, Speech,
    and Communication, MIT Press, Cambridge, MA (1998)
 4. Frasincar, F., Borsje, J., Levering, L.: A semantic web-based approach for building
    personalized news services. International Journal of E-Business Research (IJEBR)
    5(3), 35–53 (2009)
 5. Gangemi, A., Presutti, V.: Ontology design patterns. In: Handbook on ontologies,
    pp. 221–243. Springer (2009)
 6. IJntema, W., Sangers, J., Hogenboom, F., Frasincar, F.: A lexico-semantic pattern
    language for learning ontology instances from text. Journal of Web Semantics 15,
    37–50 (2012)
 7. Krieger, H.U., Declerck, T.: Tmo—the Federated Ontology of the TrendMiner
    Project. In: LREC. pp. 4164–4171 (2014)
 8. Krieger, H.U., Declerck, T.: An OWL ontology for biographical knowledge. repre-
    senting time-dependent factual knowledge. In: BD. pp. 101–110 (2015)
 9. Lebo, T., Sahoo, S., McGuinness, D., Belhajjame, K., Cheney, J., Corsar, D.,
    Garijo, D., Soiland-Reyes, S., Zednik, S., Zhao, J.: Prov-o: The PROV ontology.
    Tech. rep., World Wide Web Consortium (2013), https://www.w3.org/TR/prov-o/
10. McCrae, J., Montiel-Ponsoda, E., Cimiano, P.: Integrating WordNet and Wik-
    tionary with Lemon. In: Linked Data in Linguistics, pp. 25–34. Springer (2012)
11. McCrae, J.P., Bosque-Gil, J., Gracia, J., Buitelaar, P., Cimiano, P.: The Ontolex-
    Lemon model: development and applications. In: eLex 2017. pp. 19–21 (2017)
12      M. Stranisci et al.

12. Menini, S., Sprugnoli, R., Moretti, G., Bignotti, E., Tonelli, S., Lepri, B.: RAMBLE
    ON: Tracing movements of popular historical figures. In: Software Demonstrations
    of the 15th Conf. of EACL. pp. 77–80 (2017)
13. Mikander, P., et al.: Westerners and others in Finnish school textbooks. University
    of Helsinki, Institute of Behavioural Sciences, Studies in Education (2016)
14. Nishikawa, K.A., Towner, T.L., Clawson, R.A., Waltenburg, E.N.: Interviewing the
    interviewers: Journalistic norms and racial diversity in the newsroom. The Howard
    Journal of Communications 20(3), 242–259 (2009)
15. O’Gorman, T., Wright-Bettner, K., Palmer, M.: Richer Event Description: Inte-
    grating event coreference with temporal, causal and bridging annotation. In: Proc.
    of the 2nd Workshop on Computing News Storylines (CNS 2016). pp. 47–56 (2016)
16. Presutti, V., Gangemi, A.: Content ontology design patterns as practical building
    blocks for web ontologies. In: Int. Conference on Conceptual Modeling. pp. 128–
    141. Springer (2008)
17. Pustejovsky, J., Castano, J.M., Ingria, R., Sauri, R., Gaizauskas, R.J., Setzer,
    A., Katz, G., Radev, D.R.: TimeML: Robust specification of event and temporal
    expressions in text. New directions in question answering 3, 28–34 (2003)
18. Russo, I., Caselli, T., Monachini, M.: Extracting and Visualising Biographical
    Events from Wikipedia. In: BD. pp. 111–115 (2015)
19. Saeeda, L., Med, M., Ledvinka, M., Blaško, M., Křemen, P.: Entity linking
    and lexico-semantic patterns for ontology learning. In: Harth, A., Kirrane, S.,
    Ngonga Ngomo, A.C., Paulheim, H., Rula, A., Gentile, A.L., Haase, P., Cochez,
    M. (eds.) The Semantic Web. pp. 138–153. Springer, Cham (2020)
20. Schuler, K.K.: VerbNet: A Broad-Coverage, Comprehensive Verb Lexicon. Ph.D.
    thesis, University of Pennsylvania (2006)
21. Shimizu, C., Hitzler, P., Hirt, Q., Rehberger, D., Estrecha, S.G., Foley, C., Sheill,
    A.M., Hawthorne, W., Mixter, J., Watrall, E., et al.: The Enslaved ontology: Peo-
    ples of the historic slave trade. Journal of Web Semantics 63, 100567 (2020)
22. Simpson, J., Brown, S.: From XML to RDF in the Orlando Project. In: 2013 Int.
    Conf. on Culture and Computing. pp. 194–195. IEEE (2013)
23. Song, Z., Bies, A., Strassel, S., Riese, T., Mott, J., Ellis, J., Wright, J., Kulick,
    S., Ryant, N., Ma, X.: From light to rich ERE: Annotation of Entities, Relations,
    and Events. In: Proc. of the the 3rd Workshop on EVENTS: Definition, Detection,
    Coreference, and Representation. pp. 89–98 (2015)
24. Spivak, G.C.: Can the subaltern speak? Die Philosophin 14(27), 42–58 (2003)
25. Stranisci, M.A., Patti, V., Damiano, R.: Representing the Under-Represented: a
    Dataset of Post-Colonial, and Migrant Writers. In: 3rd Conference on Language,
    Data and Knowledge (LDK 2021). Schloss Dagstuhl-Leibniz-Zentrum für Infor-
    matik (2021)
26. Sun, J., Peng, N.: Men are elected, women are married: Events gender bias on
    Wikipedia. In: Proc. of the 59th Annual Meeting of the ACL and the 11th Inter-
    national Joint Conference on Natural Language Processing (Vol. 2)). ACL (2021)
27. Van Remoortel, M., Birkholz, J.M., Alesina, M., Bezari, C., D’Eer, C., Forestier,
    E.: Women editors in europe. Journal of European Periodical Studies 6(1), 1–6
    (2021)
28. Yu, A.Z., Ronen, S., Hu, K., Lu, T., Hidalgo, C.A.: Pantheon 1.0, a manually
    verified dataset of globally famous biographies. Scientific data 3(1), 1–16 (2016)