An Indian Court Decision Annotated Corpus and
Knowledge Graph
Pariskhit Kamat1 , Shubham Kalson1 , Suraj S1 , Pooja Harde1,* ,
Nandana Mihindukulasooriya2 and Sarika Jain1,*
1 National Institute of Technology Kurukshetra, India
2 IBM Research, Dublin, Ireland


Abstract
The volume of documents in the legal domain is growing rapidly, which calls for automated
methods to analyze the data and curate information from it. Legal stakeholders face many
challenges in extracting information about the main concepts, topics, and named entities from
lengthy, unstructured court judgment documents. It has therefore become essential to automate
the information extraction process and to store the documents in a properly structured format,
together with the named legal entities they contain, so that information can be retrieved easily.
In this paper, we introduce an annotated Indian Court Decision Document Corpus with 10
coarse-grained classes and 30 fine-grained classes as a benchmark data set for constructing the
knowledge graph. We also construct a knowledge graph of Indian court case documents using a
rule-based approach for Named Entity Recognition (NER) and Relation Extraction (RE). The
results are evaluated against the proposed benchmark using precision, recall, and F1 score, and
also qualitatively using SPARQL queries. The proposed approach gives a good F1 measure,
although further work is required to improve the recall.

Keywords
Entity Extraction, Relation Extraction, Knowledge Graph, Legal Domain


1. Introduction
India’s vast and complex legal system routinely creates and processes large volumes of legal
documents. The general public's limited knowledge of the law, together with the complex
language and legal terminology, makes it difficult for them to understand the ideas and
information conveyed by legal documents. Even legal professionals who compile court
judgment documents [1, 2, 3] find it cumbersome to go through long documents, understand
them, and form opinions [4]. The existing portals retrieve court decision documents either as
PDF or in an unstructured format using basic keywords. To overcome the constraints of
keyword-based search, the semantic web and semantic search can be utilized. The semantic
web [5] is a network in which information is stored as knowledge graphs rather than as a
collection of documents linked using hyperlinks. Through semantic search, we can focus on the
intent and contextual meaning of the keywords used rather than relying entirely on the
keywords themselves for information retrieval.

Joint Proceedings of ISWC2022 Workshops: the International Workshop on Artificial Intelligence Technologies for Legal
Documents (AI4LEGAL) and the International Workshop on Knowledge Graph Summarization (KGSum) (2022)
* Corresponding author.
pmharde29@gmail.com (P. Harde); nandana@ibm.com (N. Mihindukulasooriya); jasarika@nitkkr.ac.in (S. Jain)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073

   One of the major objectives of this work is to create an annotated data set for Indian
Court Decision Documents so that machines can extract as much information as possible from
the case documents and represent it in a uniform structured format with the help of Knowledge
Graphs [6]. The work also explores a rule-based approach to extract and annotate the legal
entities in order to construct a Knowledge Graph. The resulting knowledge base will be a large
collection of interlinked case documents that can benefit legal stakeholders. The primary
step towards constructing the Knowledge Graph is Information Extraction (IE), in which the
various legal entities are identified with the help of Named Entity Recognition (NER) and
the relations among these entities are extracted through Relation Extraction (RE).
   Existing legal ontologies such as JudO [7], LKIF [8], and LRI-Core [9] are available for
creating legal-domain knowledge bases and supporting legal reasoning, but they address
legal entities and common-sense entities only at a very abstract level. They therefore do not
serve the purpose of extracting information from court judgments, such as the court, the
jurisdiction under which the court can hear the case, the type of evidence presented at the
hearing, the origin of the case, the case background, and so on. Besides ontologies, there is a
widely used XML schema, Akoma Ntoso (footnote 1), for structuring legal documents. The
limitation of XML is that it does not support the integration of heterogeneous data on the web
and does not provide the inference over data that RDFS and OWL provide. We therefore move
forward with an ontology, which can provide inference over legal data for legal reasoning and
semantics. NyOn [10] (Nyaya Ontology) is designed primarily to extract the relevant
information from court judgment documents and derive the relationships within this
information, making it available for different use cases such as legal reasoning, question
answering, and legal analysis. The JCO ontology for Indian court judgments was created
in [11], but it is neither reusable nor published. The NyOn ontology [10] has therefore been
chosen to serve as the metadata for Information Extraction. The second step is to store the
extracted data in a graph database in the form of triples so that the database can be queried
for information retrieval.
   The prominent contributions put forth by this paper are as follows:
       • Creation of Indian Court Decision Documents Annotated Corpus guided by NyOn [10].
       • Identification of the legal entities and relations in the court decision documents using
         a rule-based approach.
       • Construction of Knowledge Graph from the extracted data.
       • Evaluation based upon quantitative and qualitative approaches.
  The remaining portion of the paper is compiled as follows. Section 2 examines the related
works carried out in data set construction and the rule-based approach for information extraction.
Section 3 focuses on Dataset Construction which includes the methodologies followed in creating
and validating the data set. Section 4 discusses the construction of a Knowledge Graph using
NER and RE through a rule-based approach. Section 5 sheds light on the evaluation results and
Section 6 interprets the conclusions drawn from this work and provides insights on areas of
improvement and ideas for future work.
1 http://www.akomantoso.org/

2. Related Work
Though the legal domain has benefited from semantic web and ontology technologies in
recent years, dedicated work on the Indian legal domain is still limited. We have analyzed
several works on ontology-based information retrieval, which are discussed below.
   Aboaoga et al. [12] and Alfred et al. [13] proposed rule-based approaches for recognizing
named entities (person names) in Arabic and Malay articles, respectively.
   Andrew [14] developed a system that helps journalists recognize legal entities such as
names of people, organizations, roles, and functions. The author used two methods:
Conditional Random Fields as a statistical method, and a rule-based technique that generates
language-specific regular expressions.
   Luz de Araujo et al. [15] present a named entity recognition dataset for Brazilian legal
documents. Along with open-domain tags such as persons, locations, time entities, and
organizations, the dataset contains tags specific to laws and legal cases.
   Based on an annotated corpus, Kalamkar et al. [16] created baseline models for
automatically predicting rhetorical roles in court documents. They also demonstrated the use of
rhetorical roles to improve performance on summarization and legal judgment prediction tasks.
   Vladislav Korablinov and Pavel Braslavski [17] provided the first Russian knowledge base
question answering (KBQA) dataset known as RuBQ. The high-quality dataset included 1,500
Russian questions of varied difficulty, their English machine translations, SPARQL queries
to Wikidata, reference responses, and a Wikidata sample of triples comprising entities with
Russian labels.
   Leitner et al. [18] developed a data set for Named Entity Recognition in German legal
documents. They manually annotated approximately 67,000 sentences with 2 million tokens.
   Riaz [19] discussed the differences between Hindi and Urdu NER and concluded
that NER computational models for Hindi cannot be applied to Urdu. The work also
presents a rule-based NER algorithm for Urdu that outperforms models based on statistical
learning.
   Thomas et al. [20] investigated natural language texts in domains that lack labelled data
sets for generic named entities. They created a hybrid NER system that combines rule-based,
deep learning, and clustering-based techniques to enable the extraction of generic entities.
   Crotti Junior et al. [21] discussed the difficulties they faced and the progress they achieved
when creating a knowledge graph-based search engine for Wolters Kluwer Germany’s collection
of German court case data.
   Filtz [22] highlights the data representation and search problems in legal domain
data. The work suggests a method for representing Austrian legal information (legal standards
and court rulings) and demonstrates how to use such information to create a legal knowledge
graph.
   Breukers et al. [9] presented two legal core ontologies for law. The first, called FOLaw, was
the outcome of Valente’s Ph.D. thesis [23]. The second core ontology, known as LRI-Core, is
made up of five main sections (or "worlds"): roles, occurrences, physical, mental, and abstract
classes.
   Jain et al. [24] present a similar approach for NER and RE over legal documents. The
ontology presented in [10] is used for identifying the named entities. The authors used JAPE
rules for extracting the legal entities; the legal documents were processed using the GATE
tool and later exported as inline XML for RE.
   Hoekstra et al. [8] describe a generic architecture for legal knowledge systems that
includes a legal core ontology (LKIF) enabling knowledge exchange between existing legal
knowledge systems. LKIF has two primary roles: 1) translation of legal knowledge
bases expressed in various representation formats and formalisms, and 2) a knowledge
representation formalism that is a component of a wider architecture for creating legal
knowledge systems.
   Ceci et al. [7] introduce an OWL 2 ontology library of legal knowledge, known as JudO, that
relies on the metadata contained in judicial documents. The ontology addresses meaningful
legal semantics while retaining a strong connection to source documents (i.e.,
fragments of legal texts).
   Thomas et al. [11] present a legal case ontology named Judicial Case Ontology (JCO) that
incorporates the concepts and relations existing in legal domain cases, including the related
terms from a set of real-life judicial decisions. The ontology supports the extraction of taxonomic
and non-taxonomic domain-specific relationships from e-judgments.


3. Dataset Construction
3.1. Dataset Description
To create an Indian legal corpus, the legal documents were collected from the Indian Kanoon
(footnote 2) website, an online search engine for Indian legal documents. The Python script used
for scraping the dataset is given in the GitHub repository. For ease of processing, the collected
PDF documents were converted to text format. Pre-processing steps such as sentence splitting,
tokenization, and POS tagging were performed on these text files using spaCy (footnote 3). To
restrict the scope of the data, we made use of a list of competency questions such as:
      1. List all the cases of month X.
      2. List all the cases filed in the year X.
      3. What is the total number of cases filed under case type ’criminal’?
      4. List all the cases with X as a judge.
      5. What is the count of cases with ’Appeal is accepted’ as the judgment?
      6. What is the date of judgment for the case X?
      7. List all the cases filed under ’Appellant Jurisdiction’.
      8. What is the petitioner name for the case with CASE NO. X?
      9. List all the cases involving X as one of the parties.
     10. What is the count of appeals ’rejected’ by the judge X?
To address the scope of the data, the required legal terms were taken from the NyOn ontology [10],
a modular ontology for describing court judgments that has been published adhering to
Semantic Web best practices and FAIR principles. Two semantic classes are defined to support
domain-specific tags: a coarse-grained class and a fine-grained class, consisting of 10
and 30 attributes respectively. The coarse-grained class is a more general legal semantic class
that includes the Court, Party, CourtDecision, Document, Jurisdiction, Location, CaseType,
Author, CourtOfficial, and DateOfJudgment classes.

2 https://indiankanoon.org/
3 https://spacy.io/

   The created dataset is a gold-standard dataset in which Named Legal Entities were manually
identified among the tokens and tagged with domain-specific tags using the CoNLL-2003 format.
CoNLL-2003 data files contain four columns separated by a single space. Each word of a
sentence is placed on its own line, and each sentence is followed by an empty line. The first
item on each line is the word, followed by a part-of-speech (POS) tag, a syntactic chunk tag,
and a named entity tag. The dataset is provided in three different encodings in CoNLL-2003
format: BILOU (B-Beginning, I-Inside, L-Last, O-Outside, U-Unit), IOB (I-Inside, O-Outside,
B-Begin), and IOBES (I-Inside, O-Outside, B-Begin, E-End, S-Single). Note that the syntactic
chunk tags were not considered in the preparation of the data set. While a named entity is
typically a proper noun or noun phrase that refers to the name of a person, place, etc., legal
entities are the legal terms in the legal documents, which might be names of the parties
involved, document numbers, the bench, the title of the legal document, etc. A total of four
annotators participated in the construction of the corpus. The manually developed dataset
consists of 50 legal documents with 80,733 rows of tokenized words and their corresponding
annotated legal tags. Table 1 shows the count of each legal tag in the whole dataset
(represented by #) for the corresponding coarse-grained and fine-grained classes.
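
The column layout and the relationship between the tag encodings described above can be illustrated with a minimal Python sketch. It is not the script used to build the corpus: the sample rows, the placeholder value in the chunk column, and the function names `read_conll` and `iob_to_bilou` are assumptions made for illustration, and the IOB input is assumed to mark every entity start with a B- tag.

```python
def read_conll(lines):
    """Parse CoNLL-2003 style lines into sentences of (word, pos, chunk, ner)."""
    sentences, current = [], []
    for line in lines:
        line = line.strip()
        if not line:
            # An empty line closes the current sentence.
            if current:
                sentences.append(current)
                current = []
            continue
        word, pos, chunk, ner = line.split(" ")
        current.append((word, pos, chunk, ner))
    if current:
        sentences.append(current)
    return sentences


def iob_to_bilou(tags):
    """Convert IOB tags (B-X, I-X, O) of one sentence to BILOU tags."""
    bilou = []
    for i, tag in enumerate(tags):
        if tag == "O":
            bilou.append(tag)
            continue
        # Split only on the first hyphen so labels such as JUR-OGNL survive.
        prefix, label = tag.split("-", 1)
        next_tag = tags[i + 1] if i + 1 < len(tags) else "O"
        continues = next_tag == "I-" + label
        if prefix == "B":
            bilou.append(("B-" if continues else "U-") + label)
        else:
            bilou.append(("I-" if continues else "L-") + label)
    return bilou


# Hypothetical rows; the chunk column holds a placeholder "O" here.
sample = [
    "KEWAL NNP O B-PETR",
    "KRISHAN NNP O I-PETR",
    "VS. NNP O O",
    "STATE NNP O B-RES",
    "OF IN O I-RES",
    "PUNJAB NNP O I-RES",
    "",
]
sentence = read_conll(sample)[0]
bilou_tags = iob_to_bilou([ner for _, _, _, ner in sentence])
```

Here `bilou_tags` marks KRISHAN and PUNJAB as the last tokens (L-) of their entities, while a single-token entity would receive a U- tag.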
   Some example attributes of the coarse-grained class and fine-grained class are as follows:

Party The coarse-grained class Party (PT) contains the fine-grained classes Respondent (RES),
     Appellant (APLT), Plaintiff (PLNF), and Petitioner (PETR).
      Ex. PETITIONER [PT] : B. SHANKRANAND [PETR] Vs.
            RESPONDENT [PT] : COMMON CAUSE & ORS. [RES] DATE OF JUDGMENT:
           11/03/1996 BENCH: RAMASWAMY, K. BENCH: RAMASWAMY, K. G.B. PATTANAIK (J).

CourtOfficial The coarse-grained class CourtOfficial (CRTOF) contains the fine-grained classes
     Investigator (INVG), Solicitor (SOL), and Judge (JD).
      Ex. DATE OF JUDGMENT: 29/04/1991 BENCH: KANIA, M. H. BENCH:
           KANIA, M. H. [JD] VERMA, JAGDISH SARAN. [JD] (J) [CRTOF]
           RAMASWAMI, V. [JD] (J) [CRTOF]



3.2. Dataset Validation and Publication
The constructed dataset was validated by an expert, Mr. Vaibhav Vats, Advocate, Punjab and
Haryana High Court, Chandigarh. The expert reported an annotation accuracy of 92%. It was
observed that the tag SPECIAL LEAVE PETITION was wrongly annotated as PETITION. The
expert also noted further entities that should have been included, among them:
Party (Complainant, Defendant, Prosecution, and Accused for criminal cases); Jurisdiction
(Regional, Pecuniary, Writ, Special Leave Petition); and Case Type (Matrimonial, Consumer, IPR).
The list of document types can likewise be made far more exhaustive. The annotation of all
these is left as future work.
   The data set is published on FigShare (footnote 4) under the CC BY 4.0 license with the DOI:
https://doi.org/10.6084/m9.figshare.19719088.v4

Table 1
Coarse-grained Classes (#: count of the tags in corpus)
      SNo. Coarse-grained               Fine-grained                                        #
      1    Court CRT                    Supreme Court SC, High Court HC, Metropolitan       155
                                        Court MTPC, District Court DC, Tribunal TRBL
      2    Party PT                     Respondent RES, Petitioner PETR, Appellant          100
                                        APLT, Plaintiff PLNF
      3    CourtDecision CD             Judgment JDG, Order ORD                             50
      4    Document DOC                 Petition PTN, Appeal APPL, CourtJudgement           51
                                        CRTJD, FIR FIR, Other OTR
      5    Jurisdiction JURD            Original JUR-OGNL, Appellant JUR-APLT,              17
                                        Advisory JUR-ADVSY, Review JUR-REVW
      6    Location LOC                 Country CTY, State STE, District DST, Taluka        37
                                        TLKA, Place PLC
      7    CaseType CTYP                Civil CIVL, Criminal CRNL                           42
      8    Author AUTH                  Judge who delivers judgment                         26
      9    CourtOfficial CRTOF          Investigator INVG, Solicitor SOL, Judge JD          148
      10   DateOfJudgment DOJUD         Date of Judgment                                    50


4. Knowledge Graph Construction
Knowledge graphs are network representations of real-world entities consisting of nodes, edges,
and labels. Representing a copious collection of unstructured data using knowledge graphs will
ease the process of abridging the facts and information from extensive documents. Though
there are multiple approaches for constructing knowledge graphs from unstructured data, we
have used the rule-based approach as it can closely simulate human intelligence and offers
the flexibility to incorporate cognitive processes into machines. For extracting named entities
and their relations, the rule-based approach uses regular expressions to identify various lexical
patterns and trigger words.

4.1. Named Entity Recognition
The entity extraction process is carried out by referring to the NyOn ontology [10]. A
total of 10 named legal entities, namely Party, Court, Date of Judgment, Court Official, Author,
Location, Case Type, Court Decision, Jurisdiction, and Documents, were identified, as given in
Table 1.

4 https://figshare.com/

Table 2
Sample Outputs from NER
   NER SAMPLE OUTPUT
   KEWAL KRISHAN VS. STATE OF PUNJAB on           -               CASE_NAME
   06/03/1962
   KEWAL KRISHAN                                  -               PETITIONER
   STATE OF PUNJAB                                -               RESPONDENT
   06/03/1962                                     -               DATE_OF_JUDGMENT

Table 3
Sample Outputs from RE
   RE SAMPLE OUTPUT
   CASE                           hasCaseId           196203KS1SC
   196203KS1SC                    hasCaseName         KEWAL KRISHAN VS. STATE OF PUNJAB on
                                                      06/03/1962
   196203KS1SC                    hasParty            PETITIONER
   PETITIONER                     hasName             KEWAL KRISHAN
   196203KS1SC                    hasParty            RESPONDENT
   RESPONDENT                     hasName             STATE OF PUNJAB
   196203KS1SC                    hasDate             06/03/1962


The data scraped from Indian Kanoon (footnote 2) is passed through Python rules containing
regular expressions that are triggered by target words. The main difficulty in coding the Python
rules was the irregular structure of the legal documents, which made it hard to write regular
expressions that fit the entire corpus. Despite the irregular structure and format, we were
able to come up with reasonable rules that fit a sizeable portion of the corpus. Each case is
mapped to a central entity "CASE" in the knowledge graph. An entity "CASE_NAME" is formed
from three other identified entities, namely "Petitioner"/"Appellant", "Respondent", and
"Date Of Judgment". If none of these three entities is identified, "CASE_NAME" is assigned the
value of "CASE_NO" or "APPEAL_NO", subject to their identification in the document.
   The output of the NER phase is stored in a single text file containing each extracted token
and its corresponding identified entity; this file is passed to the Relation Extraction phase to
obtain the relations between the entities. The code and the output files are provided in the
GitHub repository.
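
As an illustration of the kind of rule described above, the following Python sketch applies a few regular expressions to a judgment header and assembles "CASE_NAME". It is a simplified example under stated assumptions rather than the rules from the repository: the patterns, the header layout of the sample text, and the name `extract_entities` are hypothetical, and the fallback to "CASE_NO" is applied whenever the three header entities are not all found.

```python
import re

# Illustrative trigger-word patterns (assumptions); real judgments vary in layout.
PATTERNS = {
    "PETITIONER": re.compile(r"PETITIONER\s*:\s*(.+?)\s+Vs\.", re.I | re.S),
    "RESPONDENT": re.compile(
        r"RESPONDENT\s*:\s*(.+?)\s+DATE OF JUDGMENT", re.I | re.S
    ),
    "DATE_OF_JUDGMENT": re.compile(
        r"DATE OF JUDGMENT\s*:\s*(\d{2}/\d{2}/\d{4})", re.I
    ),
    "CASE_NO": re.compile(r"CASE NO\.?\s*:\s*([^\n]+)", re.I),
}


def extract_entities(text):
    """Return {entity_type: matched_text} for every pattern that fires."""
    entities = {}
    for label, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match:
            # Collapse line wraps and repeated spaces inside the match.
            entities[label] = " ".join(match.group(1).split())
    header = ("PETITIONER", "RESPONDENT", "DATE_OF_JUDGMENT")
    if all(key in entities for key in header):
        entities["CASE_NAME"] = "{} VS. {} on {}".format(
            entities["PETITIONER"],
            entities["RESPONDENT"],
            entities["DATE_OF_JUDGMENT"],
        )
    elif "CASE_NO" in entities:
        # Simplified fallback when the header entities are unavailable.
        entities["CASE_NAME"] = entities["CASE_NO"]
    return entities


# Hypothetical header text modelled on the case shown in Table 2.
SAMPLE_HEADER = (
    "PETITIONER:\nKEWAL KRISHAN\n\tVs.\n\n"
    "RESPONDENT:\nSTATE OF PUNJAB\n\n"
    "DATE OF JUDGMENT:\n06/03/1962\n"
)
ner_output = extract_entities(SAMPLE_HEADER)
```

For this sample header, `ner_output` contains the same four entities as Table 2. Patterns of this kind only fire when the trigger words are present, which is one reason a rule set may miss entities in documents with a different layout.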


4.2. Relation Extraction
The relations between the entities extracted in the NER phase are identified in the relation
extraction phase using a small Python script. The NyOn ontology [10] is used as the reference
for identifying the relationships between the extracted entities. A total of 14
relations, including hasCaseName, hasParty, hasDate, hasYear, hasMonth, hasAppealNo,
hasCaseType, hasAuthor, hasCourtOfficial, hasJurisdiction, hasCourt, hasLocation, and
hasCourtDecision, are identified. To identify each case uniquely, a new entity "CASE_ID" is
generated by concatenating the year and month of judgment, an abbreviation of our system
name (KS for Kanoon Sarathi), the serial number of the case in the current month, and the
abbreviation of the court to which the case belongs (SC for Supreme Court, HC for High Court,
and DC for District Court). A new relation hasCaseId is also derived from the new entity
"CASE_ID".

Table 4
Named Entity Recognition Evaluation Metrics
 S.No.   Type                       Total       Identified   Correct    Precision   Recall    F1
                                    Entities    Entities     Entities
 1       PARTY                      100         100          100        1.00        1.00      1.00
 2       COURT                      155         43           42         0.98        0.27      0.42
 3       DATE OF JUDGEMENT          50          50           50         1.00        1.00      1.00
 4       COURT OFFICIALS            148         114          114        0.77        0.77      0.77
 5       AUTHOR                     26          25           25         0.96        0.96      0.96
 6       LOCATION                   37          28           26         0.92        0.70      0.79
 7       CASE TYPE                  42          42           39         0.92        0.92      0.92
 8       COURT DECISION             50          49           49         0.98        0.98      0.98
 9       JURISDICTION               17          15           15         0.88        0.88      0.88
 10      DOCUMENTS                  51          40           31         0.77        0.60      0.67

  Since the output of the NER stage does not contain sentences, we use ’if’ statements to
annotate the relationships between the extracted entities. Sample outputs containing a few
entities and relations from the NER phase and the RE phase for the case KEWAL
KRISHAN VS. STATE OF PUNJAB on 06/03/1962 are given in Table 2 and Table 3, respectively.
The code used for relation extraction, along with the output file containing the identified
relations (predicates) and the corresponding entities (subjects and objects), is provided in the
GitHub repository.
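
The "CASE_ID" construction and the ’if’-based relation annotation of Section 4.2 can be sketched in Python as follows. This is a minimal illustration, not the repository code: the function names `make_case_id` and `build_triples`, the dictionary-shaped NER output, and the DD/MM/YYYY reading of the date are assumptions, chosen so that the result matches the sample in Table 3.

```python
def make_case_id(date_of_judgment, serial_no, court_abbr, system_abbr="KS"):
    """Concatenate year, month, system abbreviation, serial number, and court."""
    # The date is assumed to be in DD/MM/YYYY form, e.g. 06/03/1962.
    _day, month, year = date_of_judgment.split("/")
    return "{}{}{}{}{}".format(year, month, system_abbr, serial_no, court_abbr)


def build_triples(entities, case_id):
    """Turn a flat {entity_type: text} mapping into (subject, predicate, object)."""
    triples = [("CASE", "hasCaseId", case_id)]
    if "CASE_NAME" in entities:
        triples.append((case_id, "hasCaseName", entities["CASE_NAME"]))
    for role in ("PETITIONER", "RESPONDENT"):
        if role in entities:
            # The party role is linked to the case, and the name to the role.
            triples.append((case_id, "hasParty", role))
            triples.append((role, "hasName", entities[role]))
    if "DATE_OF_JUDGMENT" in entities:
        triples.append((case_id, "hasDate", entities["DATE_OF_JUDGMENT"]))
    return triples


# Entities taken from the sample in Table 2.
sample_entities = {
    "CASE_NAME": "KEWAL KRISHAN VS. STATE OF PUNJAB on 06/03/1962",
    "PETITIONER": "KEWAL KRISHAN",
    "RESPONDENT": "STATE OF PUNJAB",
    "DATE_OF_JUDGMENT": "06/03/1962",
}
sample_case_id = make_case_id("06/03/1962", 1, "SC")
sample_triples = build_triples(sample_entities, sample_case_id)
```

With serial number 1 and court abbreviation SC, `sample_case_id` is `196203KS1SC`, and `sample_triples` lists the seven rows of Table 3 in order.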


5. Evaluation
5.1. Quantitative Evaluation
For the quantitative evaluation of the rule-based Named Entity Recognition (NER) and Relation
Extraction (RE) with respect to our data set, we use the metrics Precision (the fraction of
identified entity tags that are correct), Recall (the fraction of the entity tags actually present
in the data set that are correctly identified), and F1 score. Table 4 presents the evaluation
metrics for Named Entity Recognition and Table 5 presents the evaluation metrics for Relation
Extraction.
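
For reference, the three metrics can be computed from the count columns of Tables 4 and 5 with a short Python sketch. The function name `precision_recall_f1` is hypothetical; the formulas are the standard definitions, with precision as correct over identified, recall as correct over total, and F1 as their harmonic mean.

```python
def precision_recall_f1(total, identified, correct):
    """Compute precision, recall, and F1 from raw counts."""
    precision = correct / identified if identified else 0.0
    recall = correct / total if total else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Counts of the COURT row in Table 4: 155 total, 43 identified, 42 correct.
court_p, court_r, court_f1 = precision_recall_f1(155, 43, 42)
```

Rounded to two decimals, the COURT counts give 0.98, 0.27, and 0.42, which agrees with the corresponding row of Table 4.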

5.2. Qualitative Evaluation
For the qualitative evaluation of the Knowledge Graph, 10 competency questions were
formulated and the knowledge graph was queried using SPARQL to retrieve the relevant
information. Sample queries based on the competency questions, as run on the knowledge
graph, are shown in Figure 1. The list of competency questions with the corresponding queries
and their outputs can be found in the GitHub repository.
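
To make the idea of answering a competency question concrete, the sketch below matches triple patterns over a small in-memory list in Python, with a roughly equivalent SPARQL query kept as a string. Everything specific in it is an assumption for illustration: the case identifiers and literal values are made up, the `nyon:` prefix IRI is a placeholder, and the actual queries are those in the repository.

```python
# Made-up triples in the (subject, predicate, object) shape of Table 3.
TRIPLES = [
    ("199603KS1SC", "hasYear", "1996"),
    ("199603KS1SC", "hasCaseType", "CRIMINAL"),
    ("199104KS1SC", "hasYear", "1991"),
    ("199104KS1SC", "hasCaseType", "CIVIL"),
]

# Placeholder namespace; the IRIs in the published graph may differ.
SPARQL_CASES_IN_YEAR = """
PREFIX nyon: <http://example.org/nyon#>
SELECT ?case WHERE { ?case nyon:hasYear "1996" . }
"""


def match(triples, s=None, p=None, o=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        t for t in triples
        if (s is None or t[0] == s)
        and (p is None or t[1] == p)
        and (o is None or t[2] == o)
    ]


def cases_in_year(triples, year):
    """Competency question: list all the cases from a given year."""
    return sorted({s for s, _, _ in match(triples, p="hasYear", o=year)})


def count_cases_of_type(triples, case_type):
    """Competency question: count the cases of a given case type."""
    return len({s for s, _, _ in match(triples, p="hasCaseType", o=case_type)})


cases_1996 = cases_in_year(TRIPLES, "1996")
criminal_count = count_cases_of_type(TRIPLES, "CRIMINAL")
```

A SPARQL endpoint performs the same kind of pattern matching over the stored graph, which is why each competency question maps to a small set of triple patterns.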


Table 5
Relation Extraction Evaluation Metrics
 S.No. Type                       Total         Identified   Correct     Precision Recall   F1
                                  Relations     Relations    Relations
 1       hasCaseName              50            50           50          1.00      1.00     1.00
 2       hasParty                 100           100          100         1.00      1.00     1.00
 3       hasName                  248           202          188         0.93      0.75     0.83
 4       hasCaseNo                50            44           41          0.93      0.82     0.87
 5       hasDate                  50            50           50          1.00      1.00     1.00
 6       hasYear                  50            50           50          1.00      1.00     1.00
 7       hasMonth                 50            50           50          1.00      1.00     1.00
 8       hasAuthor                26            25           24          0.96      0.92     0.93
 9       hasJurisdiction          15            15           15          1.00      1.00     1.00
 10      hasCourt                 50            43           42          0.97      0.84     0.90
 11      hasLocation              37            28           28          1.00      0.75     0.85
 12      hasCaseType              42            42           42          1.00      1.00     1.00
 13      hasCourtDecision         49            49           49          1.00      1.00     1.00
 14      hasCourtOfficial         50            50           50          1.00      1.00     1.00


6. Conclusion and Future Scope
In this paper, we have presented a dataset for Knowledge Base construction in the Indian legal
domain. We have also discussed the procedure for constructing a Knowledge Graph from
the Indian Court Decision corpus through a rule-based approach. The NyOn ontology [10] was
used as the reference for entity extraction and relation extraction, with the help of which the
triples were annotated. After triple generation, the triples are converted to RDF using a
Python script and stored in Apache Jena Fuseki.
   The results are good for most entity and relation types and, in our assessment, compare
favorably with existing works that use a rule-based approach, although we have identified
several shortcomings that can be improved. In terms of future work, we plan to extend the
dataset in two dimensions: first, to add more documents to increase the size of the dataset,
which will provide a good sample for applying machine learning algorithms to extract named
entities; and second, to add more entities for annotating legal norms, solicitors, evidence, and
so on.

Acknowledgements
This work is supported by the IHUB-ANUBHUTI-IIITD FOUNDATION set up under the NM-
ICPS scheme of the Department of Science and Technology, India. We thank Mr. Vaibhav Vats,
Advocate, Punjab and Haryana High Court, Chandigarh for providing his valuable reviews of
the data set.

Supplemental Material Availability: The code and the data set are available on the GitHub
Repository with link: https://github.com/semintelligence/KING.


Figure 1: SPARQL queries representing different competency questions. (a) List all the cases
from the year 1996. (b) Count of all the criminal cases. (c) List all the cases with Union of
India as the party. (d) List all the appeals rejected by the judge V. BOSE.


References
 [1] A. Elnaggar, C. Gebendorfer, I. Glaser, F. Matthes, Multi-task deep learning for legal
     document translation, summarization and multi-label classification, in: Proceedings of the
     2018 Artificial Intelligence and Cloud Computing Conference, 2018, pp. 9–15.
 [2] O.-M. Sulea, M. Zampieri, S. Malmasi, M. Vela, L. P. Dinu, J. Van Genabith, Exploring the
     use of text classification in the legal domain, arXiv preprint arXiv:1710.09306 (2017).
 [3] N. Ramrakhiyani, S. Pawar, G. K. Palshikar, A system for classification of propositions of
     the indian supreme court judgements, in: Post-Proceedings of the 4th and 5th Workshops
     of the Forum for Information Retrieval Evaluation, 2013, pp. 1–4.
 [4] V. Malik, R. Sanjay, S. K. Nigam, K. Ghosh, S. Guha, A. Bhattacharya, A. Modi, Ildc for
     cjpe: Indian legal documents corpus for court judgment prediction and explanation, ????
 [5] S. Sharma, S. Jain, Comprehensive study of semantic annotation: Variant and praxis,
     Advances in Computational Intelligence, its Concepts Applications (ACI 2021) 2823 (2021)
     102–116.
 [6] M. Dragoni, S. Villata, W. Rizzi, G. Governatori, Combining Natural Language Processing
     Approaches for Rule Extraction from Legal Documents: AICOL International Workshops
     2015-2017: AICOL-VI@JURIX 2015, AICOL-VII@EKAW 2016, AICOL-VIII@JURIX 2016,
     AICOL-IX@ICAIL 2017, and AICOL-X@JURIX 2017, Revised Selected Papers, volume
     10791, 2018, pp. 287–300. doi:10.1007/978-3-030-00178-0_19.
 [7] M. Ceci, A. Gangemi, An owl ontology library representing judicial interpretations,
     Semantic Web 7 (2016) 229–253.
 [8] R. Hoekstra, J. Breuker, M. Di Bello, A. Boer, et al., The lkif core ontology of basic legal
     concepts., LOAIT 321 (2007) 43–63.
 [9] J. Breukers, R. Hoekstra, Epistemology and ontology in core ontologies: Folaw and lri-core,
     two, in: Proceedings of EKAW Workshop on Core ontologies [Internet]. Northamptonshire,
     UK: Sun SITE Central Europe, Citeseer, 2004.
[10] S. Jain, P. Harde, N. Mihindukulasooriya, Nyon - a multilingual modular legal ontology for
     representing court judgments, in: International Semantic Intelligence Conference (ISIC
     2022) held during May 17-19, 2022, Georgia Southern University (Armstrong Campus),
     Savannah, United States, 2022.
[11] A. Thomas, S. S., A legal case ontology for extracting domain-specific entity-relationships
     from e-judgments, 2017.
[12] M. Aboaoga, Arabic person names recognition by using a rule based approach, Journal of
     Computer Science 9 (2013) 922–927. doi:10.3844/jcssp.2013.922.927.
[13] R. Alfred, L. M. C. Leong, C. K. On, P. Anthony, Malay named entity recognition based on
     rule-based approach, International Journal of Machine Learning and Computing 4 (2014)
     300–306.
[14] J. J. Andrew, Automatic extraction of entities and relation from legal documents, in:
     Proceedings of the Seventh Named Entities Workshop, Association for Computational
     Linguistics, Melbourne, Australia, 2018, pp. 1–8. URL: https://aclanthology.org/W18-2401.
     doi:10.18653/v1/W18-2401.
[15] P. H. Luz de Araujo, T. de Campos, R. Oliveira, M. Stauffer, S. Couto, P. De Souza Bermejo,
     LeNER-Br: A Dataset for Named Entity Recognition in Brazilian Legal Text: 13th Inter-
     national Conference, PROPOR 2018, Canela, Brazil, September 24–26, 2018, Proceedings,
     2018, pp. 313–323. doi:10.1007/978-3-319-99722-3_32.
[16] P. Kalamkar, A. Tiwari, A. Agarwal, S. Karn, S. Gupta, V. Raghavan, A. Modi, Corpus
     for automatic structuring of legal documents, CoRR abs/2201.13125 (2022). URL: https:
     //arxiv.org/abs/2201.13125. arXiv:2201.13125.
[17] V. Korablinov, P. Braslavski, Rubq: A russian dataset for question answering over wikidata,
     CoRR abs/2005.10659 (2020). URL: https://arxiv.org/abs/2005.10659. arXiv:2005.10659.
[18] E. Leitner, G. Rehm, J. M. Schneider, A dataset of german legal documents for named
     entity recognition, CoRR abs/2003.13016 (2020). URL: https://arxiv.org/abs/2003.13016.
     arXiv:2003.13016.
[19] K. Riaz, Rule-based named entity recognition in urdu, in: Proceedings of the 2010 Named
     Entities Workshop, NEWS ’10, Association for Computational Linguistics, USA, 2010, p.
     126–135.
[20] A. Thomas, S. Sangeetha, An innovative hybrid approach for extracting named entities
     from unstructured text data, Computational Intelligence 35 (2019) 799–826. URL:
     https://onlinelibrary.wiley.com/doi/abs/10.1111/coin.12214. doi:10.1111/coin.12214.
[21] A. Crotti Junior, F. Orlandi, D. Graux, M. Hossari, D. O’Sullivan, C. Hartz, C. Dirschl,
     Knowledge graph-based legal search over german court cases, in: European Semantic Web
     Conference, Springer, 2020, pp. 293–297.


[22] E. Filtz, Building and processing a knowledge-graph for legal data, 2017.
     doi:10.1007/978-3-319-58451-5_13.
[23] A. Valente, Legal knowledge engineering: A modelling approach, volume 30, Penn State
     Press, 1995.
[24] S. Jain, P. Harde, N. Mihindukulasooriya, S. Ghosh, A. Dubey, A. Bisht, Constructing
     a knowledge graph from indian legal domain corpus, in: International Workshop On
     Knowledge Graph Generation From Text (Text2kg) Co-located with the Extended Semantic
     Web Conference (ESWC 2022), CEUR Workshop Proceedings, volume 3184, 2022, pp. 80–93.

