Thesaurus Enhanced Extraction of Hohfeld’s Relations
             from Spanish Labour Law

                         Patricia Martı́n-Chozas1[0000−0001−5416−6370]
                             Artem Revenko2[0000−0001−6681−3328]
       1
            Ontology Engineering Group, Universidad Politécnica de Madrid, Madrid, Spain
                                   pmchozas@fi.upm.es
                           2
                              Semantic Web Company, Vienna, Austria
                           artem.revenko@semantic-web.com


           Abstract. In this paper we describe the design of an experiment to extract Ho-
           hfeld’s deontic relations from legal texts. Our approach intends to minimise the
           manual effort in the annotation process by expanding a set of initial annotations
           with the legal domain knowledge contained in thesauri represented in Semantic
           Web formats. With such annotations, we perform a set of iterations to train a deep
           learning relation extraction model. After analysing the results, we will adapt the
           process to work on the extraction of Hohfeld’s potestative relations. We also plan
           to use that model to recognise relations in unseen legal sub-domains.

           Keywords: Relation Extraction · Thesaurus · Terminology · Semantic Web


1     Introduction
New legal documentation is being generated daily, which implies new regulations and
laws that need to be processed and, most importantly, understood. Several works have
already tackled the difficulties in legal information processing, such as [5], which identi-
fies five major aggravating factors: multijurisdictionality, volume, accessibility, updates
and consolidation and vagueness of legal document classification.
    Natural language processing tools help solving such challenges, and they can reach
great performance on many language understanding tasks [25]. Yet, these models re-
quire significantly large annotated datasets and language resources to train. We found,
however, that legal language resources are scarce, mostly monolingual, and sometimes
published in close and proprietary formats. This may be one of the reasons why most In-
formation Extraction systems, and Relation Extraction tools specifically, do not handle
legal texts properly and, if they do, they tend to return very general results (see Sec-
tion 2). Therefore, with the aim of making legal information understandable and easier
accessible, in this paper we describe the design of an experiment to extract relations
amongst terms in legal texts. We further represent them as part of rich domain-specific
multi-lingual resources, that can be ultimately exploited for different use cases.
    This work is framed within Lynx3 project, an Innovation Action funded by the Eu-
ropean Union’s Horizon 2020, whose goal is to create a Knowledge Graph of legal and
 3
     http://lynx-project.eu/
2        P. Martı́n-Chozas et al.

regulatory data to ease the access to information from different jurisdictions, languages
and domains. Such a Legal Knowledge Graph (LKG) could be of a great help to comply
with current regulations, specially for non-legal-expert users.
     Amongst all legal relations the Hohfeld’s fundamental legal relations are the most
general ones [10]. The Hohfeld’s relations, being the highest abstraction of all possi-
ble legal relations, may serve the basis for more detailed domain-specific legal rela-
tions. In other words, the legal relations appearing in legal sub-domains may be seen
as sub-relation of Hohfeld’s relations. They are divided in two sets of relations: deon-
tic relations (Right, Duty, No-Right and Priviledge) and potestative relations (Power,
Liability, Disability and Immunity). The term “deontic” refers to a branch of the logic
that is responsible for studying the inferential relationships between normative formulas
that include the operators of permission (P), obligation (O) and prohibition (F), amongst
others [24]. While deontic relations (Figure 1) are those that modify (ordinary) actions,
potestative relations modify deontic relations. In this preliminary experiment we will
put the focus on the deontic relations, leaving potestative relations for future work.


                                                  correlative
                                    Right                         Duty


                                       opposite                      opposite


                                No-right          correlative
                                                                Privilege


                             Fig. 1: Hohfeld’s Deontic Relations


    Taking into account the nature of the deontic relations, we decided that a good start-
ing point would be to analyze the subdomain of labour law, that deals with rights and
duties of employers and employees. We have selected one of the most representative
texts of Spanish labour law, the Spanish Workers’ Statute4 . In the next steps of this
approach we aim at generalizing the models to recognize Hohfeld’s relations in dif-
ferent legal areas and in multiple languages. Since in every subarea of law, we will
find different instances of Hohfeld’s relations, we suggest anyone aiming at Informa-
tion Extraction from legal texts to start with our general models and fine-tune them
for the specific use case. As fine-tuning general requires much less training data, the
fine-tuning datasets could be created by the legal experts with little effort and would,
therefore, enable the tuning of the model to the specific task at hand.


 4
     https://www.boe.es/eli/es/rdlg/2015/10/23/2/con
       Thesaurus Enhanced Extraction of Hohfeld’s Relations from Spanish Labour Law        3

2      Related Work
Since the scope of this approach is still very open, the related work revised is equally
wide. We refer the readers to [17] that exposes in detail the latest advances several
information extraction techniques, including works in Relation Extraction.
    Throughout the literature, we can find many relation extraction experiments based
on very different technologies. Some of them are based on Knowledge Bases, such as
[28], that is based on Freebase (currently deprecated) [6], and is aimed at inferring
answers to questions in natural language. A similar work, [23], employs two differ-
ent KBs, PATTY [18] to identify DBpedia [3] predicates that allow translating natural
language questions into SPARQL queries to reason over entities. Other works employ
linguistic approaches, such as [1] that applies deep linguistic patterns to infer relations
over the English Wikipedia; and [22], that presents Falcon, a tool that identifies entities
in short texts and create relations based on KBs and linguistic patterns.
    Recent advances in deep learning methodologies [12, 11] have significantly im-
proved the state of the art results on well-established relation extraction benchmarks
such as TACRED [29] or SemEval 2010 Task 8 [9]. These models use the contextualised
pre-trained representation of word-pieces to obtain high quality semantic information
about different words in context. Best performing models, for example, SpanBERT [13]
and REDN [15], use not only the individual embeddings of tokens, but also spans of
entities, their lengths, and aggregated embeddings of contexts to get better performance.
    Based on the analysis of works cited in the previous paragraphs, we claim that rela-
tions between terms can be of different nature, going beyond hypernymy or synonymy.
We therefore intend to discover domain-specific relations amongst them, adding extra
information to each element, such as the superclass of the terms involved (subject and
object) and the kind of Hohfeld’s relation expressed by the predicate.

3      Envisioned approach
3.1     Corpus
As mentioned in Section 1, our study is based on the Spanish Workers’ Statute, that
is published in the Official State Gazette website5 . This corpus is divided into three
main sections named as “titles”. The first title covers individual labour relations; the
second title covers the rights of collective representation and workers’ assemblies inside
companies, and the third title covers collective bargaining and collective agreements. In
total, the three sections gather 92 articles, containing approximately 50.000 tokens.
     With the current state of analysis we estimate the density of relations in the Spanish
labour law to be 3.65 relations per article. This number is considered a lower boundary,
since the estimation is calculated over explicit relations, i. e. those relations that can be
attributed to a particular verb in the sentence, but we also expect to retrieve suggestions
of implicit relations predicted by the model.
     To get an idea of the number of entities contained in the corpus, we performed sta-
tistical terminology extraction with TBXTools6 , which applies its own algorithm based
 5
     https://www.boe.es/
 6
     https://sourceforge.net/projects/tbxtools/files/
4       P. Martı́n-Chozas et al.

on the calculation of n-grams (the combination of n words appearing in the corpus) and
on the normalisation of terms [20] [19]. The list of ranked extracted terms, including
multi-word expressions, is revised manually to remove noisy results. After this analy-
sis, we can count with a total of 614 terms, that are considered the arguments of our
relations. These terms do not include Named Entities, so we also consider it as a lower
boundary. Both the corpus and the entity list are publicly available7 – the results of the
experiments will also be progressively uploaded.

3.2   Methodology
In the first step a small excerpt for the legal corpus is manually annotated. We use the
well-established legal thesauri89 , generated within the frame of the Lynx project, and
the manually verified terminology to produce candidates relations. Since every type
of relation of our interest has domain and range restrictions defined manually, we can
filter candidate relations by applying the restrictions. Hence, we can efficiently generate
candidate entity pairs for each relation, for instance, amongst employee and contract in
Example 1. The total size of acquired manually verified relations at this stage is in the
order of 100 samples. These annotations include both entity and relation annotations
(see Example 1), that enable the specification of the relations of interest, including
domain and range restrictions of all relation types.
Example 1. Context El trabajador podrá rescindir el acuerdo y recuperar su libertad
    de trabajo en otro empleo (The worker may rescind the agreement and regain his
    freedom to work in another job).
Entities trabajador (worker): LegalEntity, acuerdo (agreement): LegalDocument.
Relation Type Right.
Context El empresario deberá informar por escrito al trabajador sobre las condiciones
    de trabajo (The employer must inform the worker by written notification about the
    working conditions).
Entities empresario (employer): LegalEntity, trabajador (worker): LegalEntity.
Relation Type Duty.
Context La duración del contrato no podrá ser inferior a seis meses (The duration of
    the contract must not be less than six months).
Entities duración del contrato (duration of the contract) : LegalEntity, seis meses
    (six months): Duration.
Relation Type No-right.
Context Asimismo, el Gobierno podrá otorgar subvenciones, desgravaciones y otras
    medidas (Likewise, the Government may grant subsidies, tax breaks and other mea-
    sures).
Entities Gobierno (Government): LegalEntity, subvenciones (subsidies):
    LegalConcept.
Relation Type Privilege.
 7
   https://github.com/pmchozas/term_relex
 8
   https://zenodo.org/record/3843561
 9
   http://lkg.lynx-project.eu/kos
      Thesaurus Enhanced Extraction of Hohfeld’s Relations from Spanish Labour Law      5

    At this point, we go with second step of our methodology, that is the initial training
dataset to train the Relation Extraction model – modelV0.1. For the training, we use
R-BERT [27] model. This models takes into account the aggregated entity spans as
well as the embeddings of the whole context to classify the relations. Though the model
is not reaching the best scores, it is quite competitive, robust and easy to use. Several
implementations are openly available10 .
    Once the model is trained, we reach the final step, where we can use the model
to predict new relations. As the training set is still small, we expect the model to pro-
duce many incorrect predictions. These predictions are verified manually to expend the
training set and re-train the model (see Figure 2).


                                             terminology extraction


                                legal                                    legal
                               corpus                                 thesaurus


               2) manual                1) relation candidate generation
               validation
                                                  3) training


                              annotated                                    relex
                                data                                       model


                                             4) manual validation


Fig. 2: Our envisioned methodology is composed of four steps: 1) relation candidate
generation, 2) manual validation, 3) training 4) manual validation, and then again 3)
(re-)training. The whole process can be iterated as many times as needed.


    As mentioned in the introduction, the idea is to include these Hohfeld’s relations
into the knowledge graph represented in Semantic Web formats. We find several works
that tackle the representation of Hohfeld’s relations in Semantic Web formats. One of
the most well-known legal ontologies including such concepts is LegalRuleML [2], a
markup language able to represent the particularities of the legal normative rules. On
the other hand, we can find the Provision Model [4] that was extended in [7], to cover
Hohfeld’s relations. Both of them include properties to represent deontic relations (see
Table 1) and can be of a great help to represent those found in this experiment.


10
     for example, https://github.com/monologg/R-BERT
6       P. Martı́n-Chozas et al.


Table 1: Properties representing Deontic operators as per LegalRuleML and Provision
Model ontologies.
         Hohfeld’s Deontic
                              LegalRuleML             Provision Model
         Relations
         Right                lrml:Right              prv:Right
         Duty                 lrml:Obligation prv:Duty
         No-right             lrml:Prohibition prv:Prohibition
         Privilege            lrml:Permission prv:Permission


3.3   Evaluation
For the evaluation of the performance of our model we will use well established met-
rics such as precision (P ), recall (R) and F1 score. Let the gold standard be the correct
manually annotated data. Let the true positives (TP) be all the correctly predicted rela-
tions; false positives (FP) – incorrectly predicted relations; false negatives (FN) – those
cases when a relation is not predicted, though it does exist in the gold standard; true
negatives (TN) – the relation is not predicted and it does not exist in the gold standard.
                                                     P ∗R
Then P = T PT+F P              TP
                   P , R = T P +F N and F1 = 2 ∗ P +R . These measures are well estab-
lished and widely used for evaluation of different classification models, for example,
on the aforementioned benchmarks TACRED [29] and SemEval 2010 Task 8 [9]. The
best models on these datasets currently reach the scores of 74.8% F1 on TACRED11
and above 91% F1 on SemEval12 .

3.4   Envisioned use case
The use case that we propose for this experiment is based on one of the pilots of the
aforementioned Lynx project. Lynx Pilot 213 , supported by Cuatrecasas14 , a globally
well-known Spanish law firm, describes a platform that helps lawyers effectively iden-
tify relevant documents related to the cases they are handling. This platform is built
on top of the Legal Knowledge Graph, which connects legal sources from different le-
gal orders, countries or languages in the field of labour law, enabling the retrieval of
complex information with a single query.
    Based on this pilot, we propose a use case that delves a little deeper into the extrac-
tion of information: instead of identifying documents, we propose to directly identify
what are the rights and the duties of a certain employee or employer under certain work-
ing conditions. We envision an interface, similar to OpenIE15 , where the user only needs
to add a few parameters, such as the type of relation (duty, right...) and the type of agent
11
   https://paperswithcode.com/sota/relation-extraction-on-tacred
   accessed on April 19, 2021
12
   https://paperswithcode.com/sota/relation-extraction-on-semeval-2010-task-8
   accessed on April 19, 2021
13
   https://lynx-project.eu/project/pilot2
14
   https://www.cuatrecasas.com/
15
   https://openie.allenai.org/
    Thesaurus Enhanced Extraction of Hohfeld’s Relations from Spanish Labour Law         7

(employer, employee...). First, we propose this solution at the national level, but as part
of future work it is to explore whether this technique allows us to extract this type of
fine grained information between jurisdictions and languages. The ultimate aim is to
provide non legal experts with easily understandable pieces of information, avoiding
the time-consuming task of browsing through heterogeneous legal documentation. A
preliminary diagram of the user interface and architecture shown in Figure 3.


                            User Interface: Visual + Querying


                                            Docs        Triple
                                                        Triple
                         Docs
                                            Store       Store


                                relex
                                model                     Triples


                   Fig. 3: Envisioned architecture and user interface.


4   Conclusions and future work

In this experiment, we train a model to extract instances of Hohfeld’s deontic relations
from Spanish labour law. Our methodology involves the usage of legal thesauri to per-
form entity annotation in an automatic way, therefore saving manual effort. The initial
training set of relations has to be annotated manually, however we use the (inaccurate)
predictions from the preliminary versions of the trained model to prepare samples for
manual checking and, therefore, bootstrapping the training dataset. This way we effi-
ciently use manual effort to quickly improve the model in a few iterations.
    In the next steps of our experiment we aim at using transfer learning techniques
[21, 16] and in particular cross-lingual transfer learning [8] to generalize the model and
learn representations of Hohfeld’s relations in different legal domains and in different
languages. We aim at comparing the performance of multi-lingual [26] vs monolingual
models for the specified task. Another interesting direction is to explore the usage of
modern Language Models tuned on specific legal corpora, for example, the PatentBert
[14]. These models might show better performance due to its learnt understanding of
legal expressions.
    Finally, we will do an experiment of deducing the general deontic relations to do-
main specific entities. We will use the most general trained deontic multilingual models
to recognize relations in unseen domains, for example, for contract analysis and com-
8        P. Martı́n-Chozas et al.

pliance checking. Afterwards, we will proceed to explore the automatic extraction of
potestative relations, covering the two sets of Hohfeldian legal concepts.


Acknowledgements
This work has received funding from the EU’s Horizon 2020 Research and Innovation
programme through the contracts Lynx (grant agreement No. 780602) and Prêt-à-LLOD
(grant agreement No. 825182), and from the Spanish Ministry of Economy, Industry
and Competitiveness through the Datos4.0 contract (TIN2016-78011-C4-4-R).


References
 1. Akbik, A., Broß, J.: Wanderlust: Extracting semantic relations from natural language text
    using dependency grammar patterns. In: www workshop. vol. 48 (2009)
 2. Athan, T., Governatori, G., Palmirani, M., Paschke, A., Wyner, A.: Legalruleml: Design
    principles and foundations. In: Reasoning Web International Summer School. pp. 151–188.
    Springer (2015)
 3. Auer, S., Bizer, C., Kobilarov, G., Lehmann, J., Cyganiak, R., Ives, Z.: Dbpedia: A nucleus
    for a web of open data. In: The semantic web, pp. 722–735. Springer (2007)
 4. Biagioli, C.: Law making environment: model based system for the formulation, research
    and diagnosis of legislation. Artificial Intelligence and Law (1996)
 5. Boella, G., Humphreys, L., Martin, M., Rossi, P., van der Torre, L., Violato, A.: Eunomos,
    a legal document and knowledge management system for regulatory compliance. In: Infor-
    mation systems: crossroads for organization, management, accounting and engineering, pp.
    571–578. Springer (2012)
 6. Bollacker, K., Evans, C., Paritosh, P., Sturge, T., Taylor, J.: Freebase: a collaboratively cre-
    ated graph database for structuring human knowledge. In: Proceedings of the 2008 ACM
    SIGMOD international conference on Management of data. pp. 1247–1250 (2008)
 7. Francesconi, E.: Semantic model for legal resources: Annotation and reasoning over norma-
    tive provisions. Semantic Web 7(3), 255–265 (2016)
 8. Gracia, J., Fäth, C., Hartung, M., Ionov, M., Bosque-Gil, J., Verı́ssimo, S., Chiarcos, C., Or-
    likowski, M.: Leveraging linguistic linked data for cross-lingual model transfer in the phar-
    maceutical domain. In: Pan, J.Z., Tamma, V., d’Amato, C., Janowicz, K., Fu, B., Polleres, A.,
    Seneviratne, O., Kagal, L. (eds.) The Semantic Web – ISWC 2020. pp. 499–514. Springer
    International Publishing, Cham (2020)
 9. Hendrickx, I., Kim, S.N., Kozareva, Z., Nakov, P., Séaghdha, D.Ó., Padó, S., Pennacchiotti,
    M., Romano, L., Szpakowicz, S.: Semeval-2010 task 8: Multi-way classification of semantic
    relations between pairs of nominals. In: Proceedings of the 5th International Workshop on
    Semantic Evaluation. pp. 33–38 (2010)
10. Hohfeld, W.N.: Some fundamental legal conceptions as applied in judicial reasoning. Yale
    Lj 23, 16 (1913)
11. Hu, R., Singh, A.: Transformer is all you need: Multimodal multitask learning with a unified
    transformer (2021)
12. Huang, Y.Y., Wang, W.Y.: Deep residual learning for weakly-supervised relation extraction
    (2017)
13. Joshi, M., Chen, D., Liu, Y., Weld, D.S., Zettlemoyer, L., Levy, O.: Spanbert: Improving
    pre-training by representing and predicting spans. Transactions of the Association for Com-
    putational Linguistics 8, 64–77 (2020)
     Thesaurus Enhanced Extraction of Hohfeld’s Relations from Spanish Labour Law                 9

14. Lee, J.S., Hsiang, J.: Patentbert: Patent classification with fine-tuning a pre-trained bert
    model. arXiv preprint arXiv:1906.02124 (2019)
15. Li, C., Tian, Y.: Downstream model design of pre-trained language model for relation ex-
    traction task. arXiv preprint arXiv:2004.03786 (2020)
16. Ma, J., Cheng, J.C., Lin, C., Tan, Y., Zhang, J.: Improving air quality prediction accuracy
    at larger temporal resolutions using deep learning and transfer learning techniques. Atmo-
    spheric Environment 214, 116885 (2019)
17. Martinez-Rodriguez, J.L., Hogan, A., Lopez-Arevalo, I.: Information extraction meets the
    semantic web: a survey. Semantic Web (Preprint), 1–81 (2020)
18. Nakashole, N., Weikum, G., Suchanek, F.: Patty: A taxonomy of relational patterns with se-
    mantic types. In: Proceedings of the 2012 Joint Conference on Empirical Methods in Natural
    Language Processing and Computational Natural Language Learning. pp. 1135–1145 (2012)
19. Oliver, A., Vàzquez, M.: Tbxtools: a free, fast and flexible tool for automatic terminology
    extraction. In: Proceedings of the International Conference Recent Advances in Natural Lan-
    guage Processing (2015)
20. Oliver, T., Vàzquez, M.: A free terminology extraction suite. In: Proceedings of the Twenty-
    ninth International Conference on Translating and the Computer (2007)
21. Pan, S.J., Yang, Q.: A survey on transfer learning. IEEE Transactions on knowledge and data
    engineering 22(10), 1345–1359 (2009)
22. Sakor, A., Mulang, I.O., Singh, K., Shekarpour, S., Vidal, M.E., Lehmann, J., Auer, S.: Old is
    gold: linguistic driven approach for entity and relation linking of short text. In: Proceedings
    of the 2019 Conference of the North American Chapter of the Association for Computational
    Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). pp. 2336–
    2346 (2019)
23. Singh, K., Mulang’, I.O., Lytra, I., Jaradeh, M.Y., Sakor, A., Vidal, M.E., Lange, C., Auer, S.:
    Capturing knowledge in semantically-typed relational patterns to enhance relation linking.
    In: Proceedings of the Knowledge Capture Conference. pp. 1–8 (2017)
24. Von Wright, G.H.: Deontic logic. Mind (1951)
25. Wang, A., Singh, A., Michael, J., Hill, F., Levy, O., Bowman, S.R.: Glue: A multi-task bench-
    mark and analysis platform for natural language understanding (2019)
26. Wang, Z., Mayhew, S., Roth, D., et al.: Cross-lingual ability of multilingual bert: An empir-
    ical study. arXiv preprint arXiv:1912.07840 (2019)
27. Wu, S., He, Y.: Enriching pre-trained language model with entity information for relation
    classification (2019)
28. Xu, K., Reddy, S., Feng, Y., Huang, S., Zhao, D.: Question answering on freebase via relation
    extraction and textual evidence. arXiv preprint arXiv:1603.00957 (2016)
29. Zhang, Y., Zhong, V., Chen, D., Angeli, G., Manning, C.D.: Position-aware attention and
    supervised data improve slot filling. In: Proceedings of the 2017 Conference on Empirical
    Methods in Natural Language Processing. pp. 35–45 (2017)