<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Concept Hierarchy Extraction from Legal Literature</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Sabine Wehnert</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Broneske</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefan Langer</string-name>
          <email>stefan.langer@legalhorizon.ag</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Gunter Saake</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Otto von Guericke University Magdeburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Legal Horizon AG, Magdeburg</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>Due to the ever-increasing amount of legal regulations, it became an interest of scholars to find ways of capturing domain-relevant knowledge and facilitate the navigation in legal text corpora. Furthermore, the contextual nature of legislation requires enhanced semantic capabilities to identify relevant regulations for specific user needs. This work aims at collecting concept hierarchies from German literature in the legal domain which are then integrated into a knowledge base with multiple clusters, allowing for different perspectives and efficient lookups. Having references to regulations in the leaves of the concept tree and higher levels with an increasingly abstract context, the resulting hierarchies provide the basis for creating legal domain knowledge in German law. Starting with rule-based annotation, we cluster extracted references, given their context features derived from tables of contents and reasons for citing from various textbook formats. We study the expressiveness of the obtained reference context features. Since different authors have their own notion of hierarchy given by the table of contents, we propose a heterogeneous lightweight ontology allowing for the coexistence of similar, yet diverse concept hierarchies to dynamically determine the best fit for a user in a semi-supervised setting. This approach is novel, since state-of-the-art ontologies are conventionally modeled under full integration and in a top-down manner, often not accounting for perspectives in knowledge representation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>BY 4.0).</p>
    </sec>
    <sec id="sec-2">
      <title>Introduction</title>
      <p>Nowadays, enterprises as well as lawyers are facing the
challenge of keeping track of an overwhelming number
of legal texts from different jurisdictions. Yet, it is
their obligation to ensure compliance, so that often
manual efforts are made to monitor changes in law.
On the other hand, this means that new developments
need to be integrated into already existing knowledge,
e.g., if a law is amended and impacts other regulations
which are used in a specific scenario, the knowledge
needs to be adapted accordingly. There is a need for
context-sensitive search and a grouping method which
ensures that all relevant documents are retrieved for
a specific situation. The natural language processing
(NLP) community has made many advances, such as
building citation networks [ZK07, WLM16].
Surprisingly, there are few works addressing the extraction of
legal concept hierarchies based on implicit semantic
relations between legal texts. We define implicit
semantic relations as relationships among legal texts which
only apply in specific contexts, so that they are not
coded as explicit citations within generally applicable
regulations. For example, depending on their expertise
(i.e., knowledge about implicit semantic
relations), lawyers can use their background to identify
connected laws which are important for a specific case.</p>
      <p>In this paper, we propose a method to extract
information from a large number of textbooks. It can be
used to identify contextually relevant texts based on
their mentions within literature, providing evidence of
a semantic relationship between legal texts depending
on their closeness within the resulting concept
hierarchy. This form of domain knowledge is modeled in a
bottom-up manner, using the references to legal texts
in the literature as instances in the bottom levels of
the concept hierarchy. Above, descriptive context
representations are desired, which we refer to as reasons
for citing, for each respective regulation. These
representations and relationships can be modeled according
to the desired expressiveness of the resulting ontology.
Winkels et al. show that reasons for citing can be
extracted from the sentence referring to the respective
regulation, and narrow them down to four relationship
categories: selection, application, concluding (denying)
and a category for in relation to [WBVvS14]. Zhang
and Koppaka link relevant legal texts based on
reasons for citing and let experts assess their
contextual quality [ZK07]. There are works addressing legal
text linking based on the information given therein
[FMPT10, BDCG+15]. These approaches use explicit
citations from within the document itself or its
metadata. We choose to use external knowledge from
literature to find relationships which cannot be directly
detected within these documents. For this, we model
relationships among legal texts in a concept hierarchy,
founded upon the spatial co-occurrence of their
mentions in legal literature.</p>
      <p>Our approach is therefore a step in a new
direction of legal informatics, because we consider legal
literature as a source of concept hierarchies to build
domain knowledge. We base our method on the
assumption that a (sub-) chapter headline corresponds
approximately to the concept described in the section.
Furthermore, the cited legal texts in each passage are
seen as semantically related to the discussed concept
of the respective section. While this assumption does
not always hold - especially in cases where authors use
creative titles - our studied literature contains
descriptive concepts in most headings of sections.</p>
      <p>For the scope of this paper, we establish a
connection between legal documents which co-occur in the
same chapter, part, section or lower level subsections.
By means of a concept hierarchy, we are able to
identify closely related legal texts in the lower parts, as well
as those which have a higher distance given only one
common concept on a high abstraction level. A
limitation of this approach is that we extract and maintain
explicit keywords forming a concept. Hence, we do not
integrate it into a common understanding of
standardized concepts, as it can be encountered in standard
ontologies. Having legal textbooks of many different
formats and authors as data sources, we expect many
contradictions to occur during an attempt to
establish mapping rules for a standard axiomatic ontology.
Therefore, we follow a different notion of knowledge
representation.</p>
      <p>Similar to the process of studying law, we aim for a
diversity of perspectives within our system, which are
chosen depending on the context. Specifically, we are
interested in the effects of letting a concept hierarchy
remain in its original structure, derived from the table
of contents (TOC), and coexist among other similar
concept hierarchies belonging to the same cluster. In
this work, we show how such an approach can model
the contextual application of regulations and how it is
able to adapt to user-given feedback. Thus, the
contribution of this work is a combination of the following
techniques:</p>
      <p>We apply rules to annotate elements in a
textbook.</p>
      <p>We access DBpedia knowledge for named entity
resolution.</p>
      <p>We form concept hierarchies and evaluate their
components.</p>
      <p>We group concept hierarchies with nominal
clustering.</p>
      <p>We discuss the use of heterogeneous lightweight
ontology clusters for legal texts.</p>
      <p>The remainder is structured as follows: Section 2
contains related work regarding concept hierarchy
extraction, lightweight ontologies and the formation of
clusters. Since our approach is derived from observations
of research gaps for our specific use case, we provide
a justification of our methods alongside. In Section 3,
we describe our method of extracting concept
hierarchies from legal literature and the subsequent steps of
constructing the domain knowledge. We discuss
experimental results in Section 4. Finally, we conclude
our findings and unveil future research potential.</p>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>We introduce three main aspects regarding our aim
of capturing and applying knowledge from textbooks.
The concept hierarchy is derived from the inherent
structure of a piece of literature. In this section, we
first name some alternative approaches to extract
concept hierarchies. Second, we provide the background
for the formation of our knowledge base, being
derived from a heterogeneous ontology. Third, we briefly
outline a clustering method because it provides some
further optimization options to control the cluster
formation of a heterogeneous ontology.</p>
      <sec id="sec-3-1">
        <title>Concept Hierarchy Extraction</title>
        <p>Concept hierarchies are a means for representing
knowledge in a hierarchical manner, having nodes of
increasing abstraction per level and things as instances
in the leaves of the tree. We intend to represent links
between legal texts by shared concepts: The higher
a linking node between two instances is located in
the concept hierarchy, the more distant are two
documents. There are several approaches for extracting
concept hierarchies from unstructured text. Among
them, we find rules to detect hyponymy relations based
on Hearst patterns [Hea92], for example to represent
legal vocabularies. Eigenvector decomposition is
another method for identifying term taxonomies [BDMP06].
Those patterns, however, are not applicable for the
use case of linking legal texts. Lexical hyponymies are
not suitable for references modeled as instances of the
concept hierarchy tree, since the subsumption relation
is not based on the vocabulary, but semantic
relatedness gained from textbooks. Kuo et al. [KTH06]
propose hierarchical clustering to build concept
hierarchies, while also the extraction of noun groups is a
valid approach [ROB17].</p>
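        <p>The relation between the height of the linking node and the distance of two documents can be illustrated with a minimal Python sketch. It assumes that each reference is stored together with the list of concepts above it, root first. The example paths, the function names and the choice of counting levels up to the lowest shared concept are illustrative assumptions and not necessarily the measure used in the experiments.</p>
        <preformat><![CDATA[
```python
from typing import Sequence


def shared_depth(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Count the leading concepts (root first) that both paths share."""
    depth = 0
    for concept_a, concept_b in zip(path_a, path_b):
        if concept_a != concept_b:
            break
        depth += 1
    return depth


def hierarchy_distance(path_a: Sequence[str], path_b: Sequence[str]) -> int:
    """Levels between the deeper reference and the lowest shared concept.

    The higher the linking node sits in the tree, the larger the result.
    """
    return max(len(path_a), len(path_b)) - shared_depth(path_a, path_b)


# Hypothetical concept paths of three references (not data from the paper).
ref_1 = ["Chapter 1", "Part A", "Subchapter I"]
ref_2 = ["Chapter 1", "Part A", "Subchapter II"]
ref_3 = ["Chapter 1", "Part B"]

print(hierarchy_distance(ref_1, ref_2))  # linked at "Part A"
print(hierarchy_distance(ref_1, ref_3))  # linked only at "Chapter 1"
```
]]></preformat>
        <p>With these hypothetical paths, two references below the same part are closer to each other than two references that only share the chapter.</p>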
        <p>We examine methods of noun group extraction
combined with hierarchical clustering further, and propose
a combination of them for concept hierarchy
extraction from literature. This approach is based on the
assumption that an author captures the topic of a
section within its title. In the highest levels of abstraction
within our concept hierarchy, we gather elements from
the tables of contents (TOC) within literature.
Finally, we obtain a coarse- to fine-grained clustering of
regulations based on the understanding of the
corresponding author, while we assume that the reasons for
citing in particular are relevant features justifying the
cluster membership of a regulation.</p>
        <p>Similar to this work, Gunel and Asl yan [GA10]
describe how to extract concepts from tutoring
material in TEX format using domain relevance, entropy
and lexical cohesion as inclusion criteria. Wang et
al. extract concept hierarchies from textbooks by the
TOC and Wikipedia [WLW+15]. We also use the
TOC to nd local relatedness of regulations given the
section title and Wikipedia for Named Entity
Resolution. Robin et al. compare two approaches for legal
concept hierarchy extraction: hierarchical clustering
and the extraction of topical expressions composed of
noun groups [ROB17]. Bruckschen et al. populate
a legal ontology based on Named Entity Recognition
[BNS+10]. In a related eld, an approach using
syntactic positions, called Formal Concept Analysis, is
suggested by Cimiano et al. to extract concept
hierarchies [CHS04]. Based on topic modeling,
part-ofspeech tags and tf-idf weighting, Anoop et al. [AAD16]
suggest an unsupervised method for concept hierarchy
extraction. A possible drawback of statistical topic
modeling methods is the instability of retrieved topics
and their keywords if the process is repeated on the
same data. Belford et al. propose a method relying
on matrix factorization to increase the stability and
accuracy of topic models [BMNG18].</p>
        <p>In contrast to these implementations, we use a
rule-based approach to extract information. Legal
applications can benefit from the control over data
quality that a system designer has while using rule-based
approaches, without compromising on the amount of
data. There are some deviations from the pattern,
where authors incorporate creative headings for
didactic purposes, but we find very few of these cases in our
collection of legal literature. We show the results of
our approach in Section 4.</p>
      </sec>
      <sec id="sec-3-2">
        <title>Heterogeneous Legal Ontology</title>
        <p>Despite some variation in the style format among the
pieces of literature, another major challenge arises
from the obtained concept hierarchies themselves:
Initially, we obtain standalone hierarchies from each
book, and the difference among them is unknown.
However, topical overlaps are possible for diversified
literature, thus posing a challenge in integrating all
concept hierarchies in a non-contradicting manner.</p>
        <p>Instead, we capture the contextual character of
legal texts. Following the notion of hierarchical ontology
clusters proposed in [VC98], we develop the idea of
allowing multiple concept hierarchies to coexist without
integrating them. Conventionally, one common
language and understanding is desired for system
architectures whose components access the same domain
knowledge. Despite these advantages, for our
application such an ontology requires high maintenance efforts
resulting from frequent insertions of further
knowledge, either by automatically determining valid
mappings or checking for logically matching candidates.</p>
        <p>In the legal domain, a common requirement is to
ensure that all relevant documents are retrieved, thus
we optimize for a high recall. This is however
challenging when working with natural language, for
example when encountering ambiguity,
near-synonyms and polysemy. We therefore argue that
concepts in legal literature may differ even for equal
topics, which is due to different perspectives of the
authors and their own interpretation. However, any
human regularly overcomes these inconsistencies and
ambiguity by either choosing one concept for a
narrow but consistent understanding, or by broadening
the scope and encompassing multiple sources to avoid
omissions of important items, while accessing the most
appropriate fit based on a contextual decision
criterion. This criterion can be derived from user-provided
feedback, for example by marking a document as
irrelevant. Then, the concept hierarchy will be selected
which most likely captures the user need based on the
recomputation of relevance.</p>
        <p>Since our intended knowledge base is built in a
bottom-up manner, this work is different from
axiomatic ontologies. There are legal ontologies
available such as ALLOT [BDIPV13] or LKIF [HBDB+07],
which are able to encompass multiple legal data
sources, but also require alignment of the
respective classes. These ontologies are built upon a
document standard called Akoma Ntoso [VZ07] and
offer many ways of standardized information
modeling on the document level and beyond. For our
specific use case, we identify two possibilities to achieve
our goal: Either an expert maintains contextual
information regarding specific applications of laws
together in such a standardized ontology - for instance,
by using the contextual ontology language C-OWL
[BGvH+03] - or there is a system for legal literature
covering different scenarios, user categories and
jurisdictions, ideally resulting in a complete collection of
all regulations needed for a case. Several bottom-up
lightweight ontologies for legislative terms and entities
exist [BGBI16, ABC+16]. Our knowledge
representation differs from these works substantially in terms of
the application scenario and extraction method. To
the best of our knowledge, there is no approach for
the same use case within the legal domain allowing for
a fair comparison with our work.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Concept Hierarchy Clusters</title>
        <p>Given a large collection of textbooks, we apply
clustering to increase contextuality and to reduce the search
space for finding the most applicable concept
hierarchy for a context. As a result, many references from
different concept hierarchies are merged together. In
order to structure the cluster, the distance
information given by a hierarchical clustering algorithm can
be exploited. For user-centered applications, a
semi-supervised clustering method has been proposed by
Bade and Nürnberger [BN14]. They introduce
must-link-before constraints for clustering algorithms which
can be applied to hierarchical agglomerative
clustering. Those constraints identify instances to be linked
and those which shall remain separate. Different from
other works, this method also provides the means to
model the hierarchical order of instances without
requiring the exact level difference to be defined. As a use
case for an enforced hierarchy, consider a scenario
where a distance between European and national law is
desired. After including must-link-before constraints,
instances from the specified category are located closer
to the reference instance than those which are forced
to link on a higher node of the concept tree. The
algorithm we use in the scope of this work allows for
must-link and cannot-link constraints by defining a
relationship between two features [MHAK16]. Due to space
limitations, we leave the examination of constraint
effects for future work and implement the clustering
algorithm without constraints.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Concept Extraction for Heterogeneous Ontologies</title>
      <p>Following relevant literature and the justification of
our method, we outline our approach for building a
heterogeneous ontology. In particular, we describe the
process of annotating features in textbooks to obtain a
contextual representation of the reference by means of
concept hierarchy clusters. Figure 1 depicts the
workflow.</p>
      <p>1. An electronic literature resource is converted into
a txt file.</p>
      <p>2. The text is preprocessed by performing
tokenization, sentence chunking, orthographic coreference
resolution, part-of-speech tagging, roman numeral
identification and named entity resolution using
web knowledge from DBpedia.</p>
      <p>3. Rule-based annotation is applied to match TOC
components (Chapter, Part, Subchapter,
Subsubchapter), citation summary (CS) components (regulation name REG,
DBpedia concept DBp, reason for citing RFC, relationship REL) and
references REF.</p>
      <p>4. All annotations are extracted into a csv file,
resulting in a table of tokens T with their respective
annotation features.</p>
      <p>5. The file is treated as a lookup table and for each
TOC component, boundaries are determined.</p>
      <p>6. All references are matched in document order to
each TOC component with respect to the different
section boundaries. Also, the CS information
is retrieved from an extracted annotation file and
assigned to the REF.</p>
      <p>7. After the feature information has been detected,
a flat representation of the concept hierarchy is
stored, with one REF instance per line and its
TOC and CS feature information.</p>
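      <p>Steps 5 to 7 can be sketched in Python as follows. The sketch assumes that the exported annotation table is already available as a list of tokens in document order, each with a type and a text. The token layout, the example headings and the function names are hypothetical and only illustrate how section boundaries yield one flat row per reference; CS features are omitted for brevity.</p>
      <preformat><![CDATA[
```python
import csv
import io
from typing import Dict, List

# TOC levels from the outermost to the innermost component.
LEVELS = ["Chapter", "Part", "Subchapter", "Subsubchapter"]


def flatten(tokens: List[Dict[str, str]]) -> List[Dict[str, str]]:
    """Walk tokens in document order and emit one row per REF.

    A heading opens a new boundary on its level and closes all deeper ones.
    """
    current = {level: "" for level in LEVELS}
    rows = []
    for token in tokens:
        kind = token["type"]
        if kind in LEVELS:
            current[kind] = token["text"]
            for deeper in LEVELS[LEVELS.index(kind) + 1:]:
                current[deeper] = ""
        elif kind == "REF":
            rows.append({"REF": token["text"], **current})
    return rows


def to_csv(rows: List[Dict[str, str]]) -> str:
    """Serialize the flat concept hierarchy, one REF instance per line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=["REF"] + LEVELS)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical annotation table (not data from the paper).
tokens = [
    {"type": "Chapter", "text": "1 Kaufvertrag"},
    {"type": "REF", "text": "§ 433 BGB"},
    {"type": "Subchapter", "text": "1.1 Pflichten"},
    {"type": "REF", "text": "§ 434 BGB"},
    {"type": "Chapter", "text": "2 Mietvertrag"},
    {"type": "REF", "text": "§ 535 BGB"},
]

print(to_csv(flatten(tokens)))
```
]]></preformat>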
      <p>[Figure 1: Workflow diagram. Recoverable stage labels: Book, Preprocessing, Concept Annotation, Annotation Extraction, Grouping REF by TOC Component, Lookup and Matching, Compose Flat Concept Hierarchy, Cluster Concept Hierarchy Instances, Knowledge Base, Query, Feedback.]</p>
      <sec id="sec-5-12">
        <title>Annotation</title>
        <p>Selected process steps to obtain the knowledge base
are described in more detail in the following. We
share more implementation details and program code
on GitHub.<sup>1</sup></p>
        <p>10. A feedback mechanism can be implemented to
narrow down relevant references. Different from
our idea, Boonchom and Soonthornphisaj use
term frequency-based ontology seeds for a legal
ontology search task [sBS12]. A similar approach
for query expansion using a hierarchical legal
knowledge base is by Schweighofer et al. [SG+07].
Yet, their relevance feedback is based on the
preferences of other users, unlike our approach
focusing only on content.</p>
        <p><sup>2</sup> https://gate.ac.uk/sale/tao/splitch8.html</p>
        <p><sup>3</sup> We use the German german-hgc.tagger from the Stanford
parser, https://nlp.stanford.edu/software/tagger.shtml</p>
        <p>Depending on the publisher, a table of contents
manifests itself in various styles. From numeral-only
versions to mixed alphabet, roman numeral and numeric
variations, we define separate rules to capture each
distinct heading element including its level in the
context of the table of contents. Despite the efforts in
rule definition, there are not many substantial
variations within each publishing style, so that minor
inconsistencies may be captured by generalization from
seen examples. Waltl et al. combine the advantages of
rule-based approaches with those of machine learning
techniques because domain knowledge can be directly
incorporated into the training phase to obtain more
control over results [WBM18]. However, training an
annotation classifier is out of scope of this work and
remains a potential future optimization task. After annotation,
we export the TOC features. Based on the detected
elements, we determine the boundaries for each level of
the TOC hierarchy to store the respective references
contained per part, subchapter and subsubchapter.</p>
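        <p>To illustrate what a style-specific heading rule may look like outside of JAPE, the following Python sketch classifies a line as a numeral-only, roman numeral or alphabetic heading and derives its level. The regular expressions and the level assigned to each style are hypothetical and stand for one publishing style only; they are not the rules used in our annotation pipeline.</p>
        <preformat><![CDATA[
```python
import re
from typing import Optional, Tuple

# Numeral-only style: the level equals the number of components ("1.2" -> 2).
NUMERIC = re.compile(r"^(\d+(?:\.\d+)*)\.?\s+\S")

# Mixed style (assumed ordering): "A." is level 1, "I." is level 2.
# Roman numerals are tested first, so "I." is read as roman, not alphabetic.
MIXED_RULES = [
    ("roman", re.compile(r"^[IVX]+\.\s+\S"), 2),
    ("alphabetic", re.compile(r"^[A-Z]\.\s+\S"), 1),
]


def heading_level(line: str) -> Optional[Tuple[str, int]]:
    """Return (style, level) for a heading line, or None for body text."""
    text = line.strip()
    match = NUMERIC.match(text)
    if match:
        return ("numeric", match.group(1).count(".") + 1)
    for style, pattern, level in MIXED_RULES:
        if pattern.match(text):
            return (style, level)
    return None


print(heading_level("1.2 Gewährleistung"))
print(heading_level("II. Rechtsfolgen"))
print(heading_level("Der Käufer zahlt."))
```
]]></preformat>
        <p>Testing roman numerals before single letters is one simple way to resolve the overlap between both styles; a production rule set would need the context of neighbouring headings to decide such cases.</p>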
        <sec id="sec-5-12-1">
          <title>Reasons for Citing (RFC) and Relationships (REL)</title>
          <p>Each sentence with a reference to a legal text
potentially contains information about the rationale of this
citation, which serves as a contextual summary. We
divide the citation summary CS into the regulation
name REG, the reason for citing RFC - following the
notion of an entity - and its relationship REL with the
regulation, captured by verb forms. Extracting the
CS serves as feature information for a clustering
algorithm. Another application is in connection with a
reasoner based on the abstract relationships. Similar
to the approach of Winkels et al. [WBVvS14], a model
of relationships among legal texts can be derived from
textbooks and then be incorporated into the concept
hierarchy. In addition, reasons for citing RFC can be
considered for the user of a (content-based) legal
recommender system as an explanatory component, to
be displayed alongside the reference as a context
descriptor. We find several pattern varieties proposed for
keyphrase extraction and consider them for the RFC
[WZH16, Hul03]. While the respective authors
analyze English language and capture adjective groups in
addition to noun groups, there are more
distinctions available for part-of-speech tags in German
language. Since including all adjective groups results
in a larger number of distinct nominal features, we
limit the pattern to minor sequence variations
allowing for attributive adjectives. In our use case, we define
the following expression to capture the RFC:
<disp-formula><label>(1)</label><tex-math><![CDATA[\mathrm{RFC} = \bigl(\mathrm{NN} \mid \mathrm{NNS} \mid \mathrm{NNP} \mid \mathrm{NNPS} \mid \mathrm{NE} \mid (\mathrm{NN}\ (\mathrm{ADJA} \mid \mathrm{NN})\ \mathrm{NN})\bigr)^{+}]]></tex-math></disp-formula>
Due to space limitations, this pattern is a
simplified version of the actual one, here only listing
candidate part-of-speech (POS) tags using the STTS tagset
[STT95]. Our rules account for a variety of
possible sentence structures in German natural language.
Those patterns which are formulated by using the
more expressive JAPE rule syntax are defined with
priorities, so that the most restrictive rule is applied first.
Likewise, there are patterns for relationship extraction
examined by multiple authors [FSE11]. We
adapted them to German language and added
negation tags with
<disp-formula><label>(2)</label><tex-math><![CDATA[\mathrm{REL} = (\mathrm{PTKNEG} \mid \text{V-INF} \mid \text{V-PP} \mid \text{V-FIN})^{+}]]></tex-math></disp-formula>
as the simplified relationship pattern REL. In the verb
categories we subsume the tags using a hyphen, for
example V-INF is a placeholder for VAINF, VVINF
and VMINF, which are originally output by the
Stanford parser. The relationship feature of the
annotation in this case is formed as a concatenation of REL
matches within a sentence containing RFC. We
adjust the matching rule regarding specific word patterns
for important indicators - strings indicating
contradictions (e.g., in German "Widerspruch") or selections
(e.g., in German "Beispiel") - which cannot be
generalized with part-of-speech information. Also, if there
is a syntactic indication of a legal term definition (e.g.,
in German "nach" or "gemäß") within a law, we fill
undetected REL fields with an is-relationship (in
German: "ist"). Furthermore, we clean the matches by
parsing out non-descriptive strings for a relationship
between a reference and its reason for citing (e.g., in
German "denke"). This consequently results in sparse
relationship features, since the above rules are both
specified within sentence boundaries. While we
assume that a sentence citing a regulation contains
RFC and REL patterns, this is not always the case.
For the subsequent steps, we only consider those
regulations containing RFC, and optionally REL. Any
annotated regulation contained in the document where
RFC is missing may not hold enough context
information to determine its applicability for the context.
Despite this limitation, it shall not have severe
consequences in case of a sufficiently large heterogeneous
ontology, since other extracted concept hierarchies for
the same context shall cover possible gaps due to the
highly regularized nature of legislation.</p>
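          <p>The simplified patterns (1) and (2) can be tried out on a tagged sentence with a short Python sketch. It assumes that a tagger has already produced (token, tag) pairs and that a reference was merged into a single token tagged REF; both the example sentence with its tags and the helper name find_spans are hypothetical. The sketch matches regular expressions over the tag sequence and does not reproduce the prioritized JAPE rules.</p>
          <preformat><![CDATA[
```python
import re
from typing import List, Tuple

Tagged = List[Tuple[str, str]]

# Pattern (1): noun-like tags, optionally NN (ADJA | NN) NN, repeated.
RFC = re.compile(r"(?<!\S)(?:(?:NN (?:ADJA|NN) NN|NNPS|NNP|NNS|NN|NE) )+")
# Pattern (2): negation particle or verb tags; V-INF covers VAINF, VVINF, VMINF.
REL = re.compile(r"(?<!\S)(?:(?:PTKNEG|V[AVM](?:INF|PP|FIN)) )+")


def find_spans(pattern: re.Pattern, tagged: Tagged) -> List[str]:
    """Return the token sequences whose tag sequence matches the pattern."""
    tags = "".join(tag + " " for _, tag in tagged)
    spans = []
    for match in pattern.finditer(tags):
        start = tags[: match.start()].count(" ")  # index of the first token
        length = match.group().count(" ")         # number of matched tokens
        spans.append(" ".join(tok for tok, _ in tagged[start:start + length]))
    return spans


# Hypothetical tagged sentence (tags chosen by hand for illustration).
sentence = [
    ("Der", "ART"), ("Verkäufer", "NN"), ("haftet", "VVFIN"),
    ("nicht", "PTKNEG"), ("für", "APPR"), ("Mängel", "NN"),
    ("nach", "APPR"), ("§ 434 BGB", "REF"),
]

print(find_spans(RFC, sentence))
print(find_spans(REL, sentence))
```
]]></preformat>
          <p>Concatenating the REL matches of one sentence, as described above, would then amount to joining the returned spans.</p>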
        </sec>
        <sec id="sec-5-12-2">
          <title>Regulations (REG, REF)</title>
          <p>Many scholars have examined methods to extract
regulations from unstructured text [WLM16], often to
create a citation network based on the references within
the original regulation text [WBVvS14]. While
machine learning approaches currently remain popular,
rule-based methods achieve high precision and recall
as well, which is due to the highly regularized pattern
of regulation citation. In German law, there are fixed
citation guidelines. Therefore, a sufficiently high
proportion of citations can be detected with rules, with
precision and recall in the range from 80% to 90%
[WLM16]. In addition, legal language contains term
definitions, which are implicitly referenced by other
laws [WLM16]. Those term definitions can be
extracted with rules and stored in a lookup dictionary.
Although it is out of scope of this work, we plan to
analyze and enrich regulations with legal term
definitions - to be found in other regulations - to gain more
context information from the knowledge provided in
the data source itself. We considered corner cases in
reference citations, thus aiming for an improvement
of the already high regulation coverage. These
corner cases include references containing more than two
regulations from different sources, and occurrences of
connection indicators, in German abbreviated as "i.
V. m.". These annotations shall contribute to a rich
knowledge base.</p>
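          <p>A rule for regulation references, including the connection indicator "i. V. m.", may be approximated by a regular expression. The following Python sketch is a strongly simplified assumption of such a rule: it only covers citations of the form "§ 280 Abs. 1 BGB" or "Art. 6 DSGVO" and treats any capitalized word after the number as the name of the law. The pattern names and the example sentence are hypothetical.</p>
          <preformat><![CDATA[
```python
import re
from typing import List

# One regulation: section sign or "Art.", a number, optional sub-units, law name.
SINGLE = (
    r"(?:§§?|Art\.)\s*\d+[a-z]?"
    r"(?:\s+(?:Abs\.|S\.|Nr\.)\s*\d+)*"
    r"\s+[A-ZÄÖÜ][A-Za-zÄÖÜäöü]*"
)
# Connection indicator "in Verbindung mit", abbreviated "i. V. m.".
CONNECTOR = r"\s+i\.\s?V\.\s?m\.\s+"

CITATION = re.compile(rf"{SINGLE}(?:{CONNECTOR}{SINGLE})*")


def split_regulations(citation: str) -> List[str]:
    """Split a connected citation into its individual regulations."""
    return re.split(CONNECTOR, citation)


# Hypothetical example sentence.
text = (
    "Der Anspruch folgt aus § 280 Abs. 1 BGB i. V. m. § 241 Abs. 2 BGB, "
    "nicht aus Art. 6 DSGVO."
)

for citation in CITATION.findall(text):
    print(citation, "->", split_regulations(citation))
```
]]></preformat>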
        </sec>
        <sec id="sec-5-12-3">
          <title>Access Web Knowledge (DBp)</title>
          <p>Wang et al. suggest in their approach to apply
web knowledge for identifying concept candidates
[WLW+15]. We access Wikipedia-based linked open
data through the DBpedia Spotlight<sup>4</sup> plugin for
GATE<sup>5</sup>. Unlike their method, we intend the
knowledge base to perform named entity resolution directly
on the citation summary. If a DBpedia entry exists in
the sentence containing a reference, we split the URI to
obtain the concept name as a nominal feature. We
observe that most matches occur for the regulation or the
RFC tokens. There is one frequent misclassification
regarding the German Civil Code (BGB), where the
DBpedia lookup yields a Swiss political party instead
of the civil code, which we manually corrected before
composing the concept hierarchy. After having
annotated the nine feature types (Chapter, Part,
Subchapter, Subsubchapter, REG, DBp, RFC, REL, REF), we
export them from GATE and build the concept
hierarchy.</p>
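          <p>Splitting a DBpedia URI into a nominal feature takes only a few lines. The Python sketch below assumes resource URIs of the usual form ending in /resource/Name and adds a manual override table for known misclassifications such as the BGB case. The override table, the function names and the example URIs are hypothetical.</p>
          <preformat><![CDATA[
```python
from urllib.parse import unquote, urlparse

# Manual corrections keyed by the surface form found in the text (assumed).
MANUAL_OVERRIDES = {"BGB": "Bürgerliches Gesetzbuch"}


def concept_name(uri: str) -> str:
    """Take the last path segment of a DBpedia URI as the concept name."""
    path = urlparse(uri).path
    name = path.rstrip("/").rsplit("/", 1)[-1]
    return unquote(name).replace("_", " ")


def resolve(surface_form: str, uri: str) -> str:
    """Prefer a manual correction over the concept returned by the lookup."""
    if surface_form in MANUAL_OVERRIDES:
        return MANUAL_OVERRIDES[surface_form]
    return concept_name(uri)


print(concept_name("http://de.dbpedia.org/resource/B%C3%BCrgerliches_Gesetzbuch"))
print(resolve("BGB", "http://de.dbpedia.org/resource/Beispiel_Partei"))
```
]]></preformat>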
        </sec>
        <sec id="sec-5-12-4">
          <title>Compose Concept Hierarchies</title>
          <p>Figure 2 shows how we compose and evaluate the
concept hierarchy. In this example, there are two
simplified concept hierarchies, which are obtained from
the JAPE rule-based annotations. In the fictive CS
node, we summarize the features REG, DBp, RFC,
REL for space reasons; however, they are all
standalone features. Each element has mandatory values
for the Chapter, RFC and Reference. The other fields
are optional because we do not assert that the rules
return values for each feature.</p>
          <p><sup>4</sup> https://www.dbpedia-spotlight.org/</p>
          <p><sup>5</sup> http://www.semanticsoftware.info/lodtagger</p>
          <p>Given the illustrated concept hierarchy in Figure
2, we evaluate the results by setting the Chapter as
a class label - thus expecting a reproduction of the
structure of a chapter - and by not including it in the
features to be processed. As indicated by the arrows,
the test data can match the learned examples by
comparison of the subfeatures and early merges are an
indicator for higher similarity between two instances. A
possible limitation of this approach comes from the
reliance on explicitly stated information. For instance,
if the RFC are not indicated within the reference
sentence or if they are extracted incorrectly, this can decrease
the expressiveness of the features for the desired
structure. Since the resulting concept hierarchy depends
on the author of the book, the author's perspective may not
be suitable for every user. Therefore, we see a
possible remedy in the notion of concept hierarchy clusters,
forming a heterogeneous lightweight ontology.</p>
        </sec>
        <sec id="sec-5-12-5">
          <title>Concept Hierarchy Clusters.</title>
          <p>Extracting a narrow concept hierarchy with only
nominal features leads to a lower probability of getting
all relevant references for a specific information need.
Consider the following example: While one book may
focus on the aspects of national law, another depicts
European legislation. In reality, this information needs
to be considered as a whole, since European legislation
supersedes national law.</p>
          <p>Recalling the discussion from Section 2.2, we show
how a heterogeneous ontology can serve a user
who is interested in complete, reliable and well-founded
information. Aside from our experiment of matching
extracted instances with Chapter labels, an actual
application of this method is to classify for Relevance
instead. Figure 3 illustrates how a heterogeneous
ontology in legal contexts may emerge. In the setting of
a recommender system, suppose there is a cluster
containing two concept hierarchies with sets of instances
(1, 5, 8) and (1, 2, 4, 8) respectively. In the first
scenario, depicted on the left-hand side, the recommender
system receives positive user feedback regarding
instance 1. Since this instance is present in the
current context, which is narrower than the other concept
hierarchy, the context is not altered. In contrast, a
similarity function ( A) receives negative feedback for
instance 5 in the second scenario, thus resulting in a
context switch to the other concept hierarchy without
instance 5. There are several approaches for similarity</p>
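          <p>The two feedback scenarios can be sketched in a few lines. Note the assumption: the sketch replaces the similarity function with a plain set-membership rule, and the function name and the tie-breaking rule (prefer the narrowest acceptable hierarchy) are hypothetical choices made for illustration.</p>
          <preformat><![CDATA[
```python
def choose_context(hierarchies, current, feedback):
    """Return the index of the concept hierarchy to use after user feedback.

    hierarchies: list of sets of instance ids (one set per concept hierarchy)
    current:     index of the hierarchy that is the current context
    feedback:    dict mapping instance id to True (relevant) or False (irrelevant)
    """
    relevant = {i for i, ok in feedback.items() if ok}
    irrelevant = {i for i, ok in feedback.items() if not ok}

    def acceptable(instances):
        # A context must contain every relevant instance and no irrelevant one.
        return relevant <= instances and not (irrelevant & instances)

    if acceptable(hierarchies[current]):
        return current  # feedback agrees with the current context
    candidates = [k for k, h in enumerate(hierarchies) if acceptable(h)]
    if not candidates:
        return current  # nothing better is known, so keep the context
    # Assumption: prefer the narrowest acceptable hierarchy.
    return min(candidates, key=lambda k: len(hierarchies[k]))


cluster = [{1, 5, 8}, {1, 2, 4, 8}]
print(choose_context(cluster, 0, {1: True}))   # scenario 1: positive feedback
print(choose_context(cluster, 0, {5: False}))  # scenario 2: negative feedback
```
]]></preformat>
          <p>With the instance sets from the text, positive feedback on instance 1 keeps the first hierarchy, while negative feedback on instance 5 switches to the hierarchy that does not contain it.</p>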
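          <p>[Figure 2: Composition and evaluation of the concept hierarchy. Legend: Book, Part (P), Chapter (C), Subchapter (S), Subsubchapter (SS), Citation Summary (CS; REG, DBp, RFC, REL), § (reference); edge types: unknown connection, inferred membership from feature; instance types: training data, test data.]</p>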
          <p>We conducted experiments with subsets of
the 78 documents (subchapters from three fixed
chapters); the results are shown in Section 4.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Results</title>
      <p>To show the effect of adding knowledge to the
heterogeneous lightweight ontology, we evaluate the
annotation and perform two experiments. The first
experiment applies COBWEB clustering to the features
without knowing the Chapter class label. The second
experiment trains a classifier on the same features, this time
using the COBWEB tree. Before we present the
results, we describe the experiment setting and
evaluation measures.</p>
      <sec id="sec-6-1">
        <title>Evaluation Setup</title>
        <p>The aim of this evaluation is to determine the
expressiveness of our selected features to distinguish between
abstract concepts. In this work, we intend to show
the feasibility of our proposed knowledge extraction
and representation method. Therefore, we create
clusters of semantically similar concept hierarchies by
using the COBWEB algorithm [Fis87]. It is a recursive</p>
      </sec>
      <sec id="sec-6-2">
        <title>Evaluation Measures</title>
        <p>Regarding the annotation success, we determine the
effectiveness of context feature extraction by computing
the average coverage of references REF by RFC
annotations. Basically, if a sentence contains a pattern
which can be detected by our JAPE rules, there will be
an RFC annotation. Since we only considered those
regulations whose context features (especially RFC )
could be retrieved, this evaluation is important to
understand how many data points were the basis for the
subsequent steps of clustering and classification.</p>
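        <p>The coverage measure described above can be sketched as follows. The function name and the counts in the example are hypothetical; the sketch assumes that coverage is computed per document as the number of RFC annotations divided by the number of REF annotations, and then averaged.</p>
        <preformat><![CDATA[
```python
def rfc_coverage(ref_counts, rfc_counts):
    """Return per-document RFC coverage of REF annotations and its mean.

    ref_counts: dict mapping document id to its number of REF annotations
    rfc_counts: dict mapping document id to its number of RFC annotations
    """
    per_doc = {
        doc: rfc_counts.get(doc, 0) / n_ref
        for doc, n_ref in ref_counts.items()
        if n_ref > 0  # documents without references carry no information
    }
    if not per_doc:
        raise ValueError("no document contains a REF annotation")
    mean = sum(per_doc.values()) / len(per_doc)
    return per_doc, mean


# Made-up counts for two documents, not the values from Table 1.
per_doc, mean = rfc_coverage({"doc1": 40, "doc2": 20}, {"doc1": 30, "doc2": 10})
print(per_doc, mean)
```
]]></preformat>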
        <p>Our evaluation measure for the supervised
clustering experiment is the Adjusted Rand Index (ARI),
originally proposed by Hubert and Arabie [HA85]. It
quantifies the overlap between two partitionings;
in our case, we compare the COBWEB
clustering and the class labels (i.e., textbook chapters). Its
expected value 0 indicates a random clustering, while
a value close to 1 corresponds to a high agreement
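Footnote 6: https://github.com/cmaclell/concept_formation
[Figure 3: Context switch between two concept hierarchies after user feedback. Left: feedback "1 = relevant", current context (1, 5, 8). Right: feedback "5 = irrelevant", current context (1, 2, 4, 8).]</p>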
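        <p>For readers who want to reproduce the measure, the following is a minimal sketch of the Adjusted Rand Index computed from the contingency counts of two labelings. It is a plain-Python illustration, not the evaluation code used in the experiments; the label values in the example are made up. Libraries such as scikit-learn offer an equivalent function (adjusted_rand_score).</p>
        <preformat><![CDATA[
```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand Index of two labelings of the same instances."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same instances")
    n = len(labels_true)
    if n < 2:
        raise ValueError("at least two instances are required")

    # Pairs of instances that share a cell, a class, or a cluster.
    cells = Counter(zip(labels_true, labels_pred))
    sum_cells = sum(comb(c, 2) for c in cells.values())
    sum_rows = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_cols = sum(comb(c, 2) for c in Counter(labels_pred).values())

    expected = sum_rows * sum_cols / comb(n, 2)  # agreement expected by chance
    max_index = (sum_rows + sum_cols) / 2
    if max_index == expected:
        return 1.0  # degenerate case, e.g. a single cluster in both labelings
    return (sum_cells - expected) / (max_index - expected)


chapters = ["B", "B", "K", "K", "E", "E"]  # class labels (chapters)
clusters = [0, 0, 1, 1, 2, 2]              # cluster ids from a clustering
print(adjusted_rand_index(chapters, clusters))
```
]]></preformat>
        <p>The index depends only on which instances are grouped together, so renaming the clusters does not change the score.</p>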
        <p>In Table 1, we list the number of reference
annotations corresponding to the book chapters:
(1) Bankvertragliche Grundlagen (English:
Foundations of Banking Contracts), (4) Kapitalmarkt- und
Auslandsgeschäfte (English: Capital Market and
Foreign Transactions), (8) Europäisches Bankenrecht mit
Länderabschnitten (English: European Banking Law
by Country). Additionally, we indicate the number
of RFC and the average percentage of detected RFC
among all REF annotations per chapter. The numbers
in the column header denote the document number,
corresponding to the subchapters of the textbook. We
find that almost 75% of the references have an
annotation value for RFC. The restrictions we included
in our pattern prevent us from extracting the chapter
name as a REF. Despite some missing references
and RFC due to long-range dependencies within the
sentence or unwanted headline text insertions at page
breaks, the noise in the text data (e.g., citations of
other books in a reference-like format) did not affect
the extraction substantially. Nevertheless, all
subsequent steps depend on the annotation, so that a loss
in this step propagates forward to the clustering and
classification task.</p>
      </sec>
      <sec id="sec-6-3">
        <title>Evaluation of Annotation</title>
        <p>We evaluate our annotation results by comparing the
number of detected references REF with the
number of extracted RFC in the chapter, since we require
the latter for concept formation. Spiegel-Rösing found
descriptive RFC context in 80% of
the sentences of scientific texts. We assume that in a German legal
textbook, slightly fewer RFC will be detected, due to a
different writing style (e.g., more complex syntax and
longer sentences). Consequently, our target for RFC
annotation is set to 70% of REF occurrences. Therefore,</p>
      </sec>
      <sec id="sec-6-4">
        <title>Evaluation of Heterogeneous Legal Ontology</title>
        <p>We evaluate our results for the COBWEB clustering
algorithm using the extracted Chapter feature as the
ground truth class. Given the remaining context
information, from the Part feature to the REF
feature, the COBWEB clustering algorithm is supposed
to group the instances. In order to show
the effect of a successful extraction method, we
restricted the instances to those cases where a value
could be retrieved for the Part feature, since this is the
most abstract class. To obtain an equal class
distribution, we downsampled the instances of the other chapters
to match the class with the fewest remaining instances. We
did not select instances at random;
instead, we selected a group of instances which were
spatially close in the textbook. This has the
advantage of not missing important context, as well as
limiting the variance in nominal features. For a fair
comparison, we ran the evaluation with different
instance groups, which yielded mostly similar results; however,
we observe that more variability leads to less similar
examples and thus a lower ARI score.</p>
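        <p>The balancing step can be illustrated with a short sketch. It assumes that the instances are given in textbook order and that "spatially close" means a contiguous window per chapter; the function name and the start parameter for choosing the window are hypothetical.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict


def downsample_contiguous(instances, start=0):
    """Balance classes by keeping one contiguous window per chapter.

    instances: list of (chapter, features) pairs in textbook order
    start:     preferred offset of the window inside each chapter
    """
    by_chapter = defaultdict(list)
    for chapter, features in instances:
        by_chapter[chapter].append(features)
    if not by_chapter:
        raise ValueError("no instances given")

    # Every chapter is cut down to the size of the smallest chapter.
    target = min(len(items) for items in by_chapter.values())
    balanced = {}
    for chapter, items in by_chapter.items():
        # Shift the window back if it would run past the end of the chapter.
        offset = min(start, len(items) - target)
        balanced[chapter] = items[offset:offset + target]
    return balanced


# Made-up data: integers stand in for feature records.
data = ([("1", i) for i in range(5)] + [("4", i) for i in range(3)]
        + [("8", i) for i in range(4)])
print(downsample_contiguous(data, start=1))
```
]]></preformat>
        <p>Varying the start offset yields different instance groups of the same size, which is one way to repeat the evaluation on other parts of each chapter.</p>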
        <p>For the first evaluation shown in Figure 4 with 2
principal components p, 3 Chapters and 1020 instances
i of balanced classes, we obtain an Adjusted Rand
Index (ARI) of 0.28. Each axis holds one principal
component analysis (PCA) dimension to visualize a
projection of the cluster shape. In accordance with our
expectation, there are three clusters, and each cluster
consists of two to three elliptical shapes. The chapter
labels in Figure 4 indicate that the algorithm does not
have enough information to distinguish between
chapter (1) (labeled as B), chapter (4) (labeled as K)
and chapter (8) (labeled as E). Many instances,
particularly of chapters (4) and (8), are placed in the wrong
cluster. From this, we conclude that despite having
balanced classes, there may be topical overlaps among
the concept hierarchies which should either result in a
merge or lack evidence for separate groups. If
we allow for a slight class imbalance by
increasing the number of chapter (1) and (4) instances
by comparable amounts to 1149, the ARI increases to
0.64, as shown in Figure 5. This also led to a different
cluster shape and a better discrimination between
the three chapter classes. The improvement can be
seen in the classes, where more labels correspond to
the cluster membership. It indicates that the
clustering approach found more agreement between clusters
and the ground truth classes. From that observation we
conclude that additional examples can lead to a higher
ARI if they broaden the feature value space only
moderately. In previous experiments, we applied the
algorithm to all extracted instances, leading to an ARI of
0.05, presumably because of the high variance of
instances within a chapter and the differing chapter lengths.</p>
        <p>COBWEB tree with r=10, num=100,</p>
        <p>Since this class imbalance will naturally occur in a
heterogeneous ontology, we need to investigate further
how the approach scales and what the limitations are
regarding the feature diversity.</p>
        <p>We perform a second experiment on the same data,
but in the classification setting with a COBWEB tree
with 10 runs r and 300 training instances num. The
result of the classification algorithm is shown in
Figures 6 and 7, including 95% confidence intervals for
the average precision and recall values. In Figure 6,
the confidence intervals span a range of 40
percentage points (pp), indicating an unstable classification
result of 80% precision and 87% recall on average
after 200 training examples. The effect of adding further
examples is illustrated in Figure 7 and is similar to the
previous experiment: it manifests in a gain in
precision of about 10pp and a slight increase of 5pp in the
average recall score. Please note that the range of the
confidence interval is reduced to 20pp for recall and
to 10pp for precision, which is a significant
improvement of the classifier performance. In summary, the
results for the COBWEB algorithm vary depending
on the number of examples for each concept hierarchy.
A recall of more than 90% is desirable, so that the
results from the second setup of each experiment are
regarded as sufficient evidence that the features are descriptive enough
to distinguish between different contexts. We discuss
the general applicability of the results.
There is more research potential in the question
whether this approach also works for literature from other domains,
or what happens if other clustering
algorithms with advanced capabilities of constraint
formulation are chosen. Considering that we used concept
hierarchies mostly about general banking law,
financial markets and European banking law, the overlap
of REG and RFC is considerable. After other books
about different subjects are added, those three concept
hierarchies may form a cluster. During the concept
hierarchy extraction, we found that there are four major
limitations of our approach: First, literature resources
are needed which cover the information need.
Otherwise, users may not find their case represented. Second,
each textbook can have a different format of
citations or TOC components. This results in a
higher manual effort for rule formulation. Third, since
we only had the PDF files of the literature available, there
were challenges in segmenting the file and assigning
references to each section, leading to missing feature
values. Fourth, despite having gained much domain
information from the textbook, we need to investigate
more methods of leveraging it. Since we plan to
implement a lightweight heterogeneous ontology, we
outline future research fields in Section 5.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusion and Future Work</title>
      <p>To conclude, our lightweight heterogeneous ontology
is composed of concept hierarchies which are derived
from literature. It is a promising area for further work.
We pointed out the reasons for accepting coexisting
perspectives in the legal domain and gave indications
of how to take advantage of many sources, while still
controlling the results with constraints and user
feedback. The rule-based annotation method provided
features for context-aware classification and clustering of
the concept hierarchies. Overall, the results indicate
that the chosen features, the extraction method and
the concept formation library are suitable for
detecting semantic similarity in the book we selected.
Regarding future work, we are curious about how this
method performs if additional features of the content
of referenced regulations and term definitions are taken
into account. Another field to study is the impact
of abstract relationship categories on clustering. We
see possible applications of the learned ontology in the
fields of law clustering, legal context search, topic
detection and legal recommender systems and intend to
explore these use cases further.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>The authors would like to thank Andreas Nürnberger
and the anonymous referees for their valuable
comments. The work is supported by Legal Horizon AG,
Grant No. 1704/00082.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [AAD16]
          <string-name>
            <given-names>VS</given-names>
            <surname>Anoop</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S</given-names>
            <surname>Asharaf</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P</given-names>
            <surname>Deepak</surname>
          </string-name>
          .
          <article-title>Unsupervised concept hierarchy learning: a topic modeling guided approach</article-title>
          .
          <source>Procedia Computer Science</source>
          ,
          <volume>89</volume>
          :
          <fpage>386</fpage>
          –
          <fpage>394</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [ABC+16]
          <string-name>
            <surname>Gianmaria</surname>
            <given-names>Ajani</given-names>
          </string-name>
          , Guido Boella, Luigi Di Caro, Livio Robaldo, Llio Humphreys, Sabrina Praduroux, Piercarlo Rossi, and
          <string-name>
            <given-names>Andrea</given-names>
            <surname>Violato</surname>
          </string-name>
          .
          <article-title>The european taxonomy syllabus: A multi-lingual, multi-level ontology framework to untangle the web of european legal terminology</article-title>
          .
          <source>Applied Ontology</source>
          ,
          <volume>11</volume>
          (
          <issue>4</issue>
          ):
          <volume>325</volume>
          –
          <fpage>375</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [BDCG+15]
          <string-name>
            <surname>Guido</surname>
            <given-names>Boella</given-names>
          </string-name>
          , Luigi Di Caro, Michele Graziadei, Loredana Cupi, Carlo Emilio Salaroglio, Llio Humphreys, Hristo Konstantinov, Kornel Marko, Livio Robaldo, Claudio Ruffini, Kiril Simov, Andrea Violato, and
          <string-name>
            <given-names>Veli</given-names>
            <surname>Stroetmann</surname>
          </string-name>
          .
          <article-title>Linking legal open data: Breaking the accessibility and language barrier in european legislation and case law</article-title>
          .
          <source>In Proceedings of the 15th International Conference on Artificial Intelligence and Law</source>
          ,
          <source>ICAIL '15</source>
          , pages
          <fpage>171</fpage>
          –
          <fpage>175</fpage>
          , New York, NY, USA,
          <year>2015</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [BDIPV13]
          <string-name>
            <given-names>Gioele</given-names>
            <surname>Barabucci</surname>
          </string-name>
          , Angelo Di Iorio, Francesco Poggi, and
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Vitali</surname>
          </string-name>
          .
          <article-title>Integration of legal datasets: From meta-model to implementation</article-title>
          .
          <source>In Proceedings of International Conference on Information Integration and Web-based Applications &amp; Services</source>
          , IIWAS '
          <volume>13</volume>
          , pages
          <fpage>585</fpage>
          :
          <fpage>585</fpage>
          –
          <fpage>585</fpage>
          :
          <fpage>594</fpage>
          , New York, NY, USA,
          <year>2013</year>
          . ACM.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [BDMP06]
          <string-name>
            <given-names>Holger</given-names>
            <surname>Bast</surname>
          </string-name>
          , Georges Dupret, Debapriyo Majumdar, and
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Piwowarski</surname>
          </string-name>
          .
          <article-title>Discovering a term taxonomy from term similarities using principal component analysis</article-title>
          .
          <source>In Markus Ackermann</source>
          , Bettina Berendt, Marko Grobelnik, Andreas Hotho, Dunja Mladenic, Giovanni Semeraro, Myra Spiliopoulou, Gerd Stumme, Vojtech Svatek, and Maarten van Someren, editors,
          <source>Semantics, Web and Mining</source>
          , pages
          <volume>103</volume>
          –
          <fpage>120</fpage>
          . Springer Berlin Heidelberg, Berlin, Heidelberg,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [BGBI16]
          <article-title>María G. Buey, Angel Luis Garrido, Carlos Bobed, and Sergio Ilarri. The ais project: Boosting information extraction from legal documents by using ontologies</article-title>
          .
          <source>In Proceedings of the 8th International Conference on Agents and Artificial Intelligence</source>
          , pages
          <fpage>438</fpage>
          –
          <fpage>445</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [BGvH+03]
          <string-name>
            <surname>Paolo</surname>
            <given-names>Bouquet</given-names>
          </string-name>
          , Fausto Giunchiglia, Frank van Harmelen,
          <article-title>Luciano Serafini, and Heiner Stuckenschmidt. C-owl: Contextualizing ontologies</article-title>
          . In Dieter Fensel, Katia Sycara, and John Mylopoulos, editors,
          <source>The Semantic Web - ISWC</source>
          <year>2003</year>
          , pages
          <fpage>164</fpage>
          –
          <fpage>179</fpage>
          , Berlin, Heidelberg,
          <year>2003</year>
          . Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [BMNG18]
          <string-name>
            <given-names>Mark</given-names>
            <surname>Belford</surname>
          </string-name>
          , Brian Mac Namee, and
          <string-name>
            <given-names>Derek</given-names>
            <surname>Greene</surname>
          </string-name>
          .
          <article-title>Stability of topic modeling via matrix factorization</article-title>
          .
          <source>Expert Systems with Applications</source>
          ,
          <volume>91</volume>
          :
          <fpage>159</fpage>
          –
          <fpage>169</fpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [BN14]
          <article-title>Korinna Bade and Andreas Nürnberger. Hierarchical constraints - providing structural bias for hierarchical clustering</article-title>
          .
          <source>Machine Learning</source>
          ,
          <volume>94</volume>
          (
          <issue>3</issue>
          ):
          <volume>371</volume>
          –
          <fpage>399</fpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [BNS+10]
          <article-title>Mírian Bruckschen, Caio Northfleet</article-title>
          , DM Silva, Paulo Bridi, Roger Granada, Renata Vieira,
          <string-name>
            <given-names>Prasad</given-names>
            <surname>Rao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Tomas</given-names>
            <surname>Sander</surname>
          </string-name>
          .
          <article-title>Named entity recognition in the legal domain for ontology population</article-title>
          .
          <source>In 3rd Workshop on Semantic Processing of Legal Texts (SPLeT</source>
          <year>2010</year>
          ), page 16,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [CHS04]
          <string-name>
            <given-names>Philipp</given-names>
            <surname>Cimiano</surname>
          </string-name>
          , Andreas Hotho, and
          <string-name>
            <given-names>Steffen</given-names>
            <surname>Staab</surname>
          </string-name>
          .
          <article-title>Clustering concept hierarchies from text</article-title>
          .
          <source>In Proceedings of the Conference on Lexical Resources and Evaluation (LREC)</source>
          , pages
          <fpage>1721</fpage>
          –
          <fpage>1724</fpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [DKB08]
          <string-name>
            <given-names>Peter</given-names>
            <surname>Derleder</surname>
          </string-name>
          ,
          <string-name>
            <surname>Kai-Oliver Knops</surname>
          </string-name>
          , and Heinz Georg Bamberger.
          <article-title>Handbuch zum deutschen und europäischen Bankrecht</article-title>
          . Springer Science &amp; Business
          <string-name>
            <surname>Media</surname>
          </string-name>
          ,
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [Fis87] Douglas H Fisher.
          <article-title>Knowledge acquisition via incremental conceptual clustering</article-title>
          .
          <source>Machine learning</source>
          ,
          <volume>2</volume>
          (
          <issue>2</issue>
          ):
          <volume>139</volume>
          –
          <fpage>172</fpage>
          ,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [FMPT10]
          <string-name>
            <given-names>Enrico</given-names>
            <surname>Francesconi</surname>
          </string-name>
          , Simonetta Montemagni, Wim Peters, and
          <string-name>
            <given-names>Daniela</given-names>
            <surname>Tiscornia</surname>
          </string-name>
          .
          <article-title>Integrating a bottom–up and top–down methodology for building semantic resources for the multilingual legal domain</article-title>
          .
          <source>In Semantic Processing of Legal Texts</source>
          , pages
          <volume>95</volume>
          –
          <fpage>121</fpage>
          . Springer,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [PARR11]
          <string-name>
            <given-names>Karteeka</given-names>
            <surname>Pavan</surname>
          </string-name>
          , Allam Appa Rao, and
          <string-name>
            <given-names>A V</given-names>
            <surname>Rao</surname>
          </string-name>
          .
          <article-title>An automatic clustering technique for optimal clusters</article-title>
          .
          <source>abs/1109</source>
          .1068:
          <issue>133</issue>
          –
          <fpage>144</fpage>
          , 09
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [FSE11]
          <string-name>
            <given-names>Anthony</given-names>
            <surname>Fader</surname>
          </string-name>
          , Stephen Soderland, and
          <string-name>
            <given-names>Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Identifying relations for open information extraction</article-title>
          .
          <source>In Proceedings of the conference on empirical methods in natural language processing</source>
          , pages
          <volume>1535</volume>
          –
          <fpage>1545</fpage>
          . Association for Computational Linguistics,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [GA10]
          <article-title>Korhan Gunel and R fat Asl yan. Extracting learning concepts from educational texts in intelligent tutoring systems automatically</article-title>
          .
          <source>Expert Systems with Applications: An International Journal</source>
          ,
          <volume>37</volume>
          (
          <issue>7</issue>
          ):
          <volume>5017</volume>
          –
          <fpage>5022</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [GF14]
          <string-name>
            <given-names>Marian</given-names>
            <surname>George</surname>
          </string-name>
          and
          <string-name>
            <given-names>Christian</given-names>
            <surname>Floerkemeier</surname>
          </string-name>
          .
          <article-title>Recognizing products: A per-exemplar multilabel image classification approach</article-title>
          .
          <source>In European Conference on Computer Vision</source>
          , pages
          <volume>440</volume>
          –
          <fpage>455</fpage>
          . Springer,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [HA85]
          <string-name>
            <given-names>Lawrence</given-names>
            <surname>Hubert</surname>
          </string-name>
          and
          <string-name>
            <given-names>Phipps</given-names>
            <surname>Arabie</surname>
          </string-name>
          .
          <article-title>Comparing partitions</article-title>
          .
          <source>Journal of classification</source>
          ,
          <volume>2</volume>
          (
          <issue>1</issue>
          ):
          <volume>193</volume>
          –
          <fpage>218</fpage>
          ,
          <year>1985</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [HBDB+07]
          <string-name>
            <surname>Rinke</surname>
            <given-names>Hoekstra</given-names>
          </string-name>
          , Joost Breuker, Marcello Di Bello,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Boer</surname>
          </string-name>
          , et al.
          <article-title>The lkif core ontology of basic legal concepts</article-title>
          .
          <source>LOAIT</source>
          ,
          <volume>321</volume>
          :
          <fpage>43</fpage>
          –
          <fpage>63</fpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Hea92]
          <article-title>Marti A Hearst. Automatic acquisition of hyponyms from large text corpora</article-title>
          .
          <source>In Proceedings of the 14th conference on Computational linguistics-Volume</source>
          <volume>2</volume>
          , pages
          <fpage>539</fpage>
          –
          <fpage>545</fpage>
          . Association for Computational Linguistics,
          <year>1992</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [Hul03]
          <string-name>
            <given-names>Anette</given-names>
            <surname>Hulth</surname>
          </string-name>
          .
          <article-title>Improved automatic keyword extraction given more linguistic knowledge</article-title>
          .
          <source>In Proceedings of the 2003 conference on Empirical methods in natural language processing</source>
          , pages
          <volume>216</volume>
          –
          <fpage>223</fpage>
          . Association for Computational Linguistics,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [KTH06]
          <string-name>
            <surname>Huang-Cheng</surname>
            <given-names>Kuo</given-names>
          </string-name>
          , Tsung-Han
          <string-name>
            <surname>Tsai</surname>
          </string-name>
          , and
          <string-name>
            <surname>Jen-Peng Huang</surname>
          </string-name>
          .
          <article-title>Building a concept hierarchy by hierarchical clustering with join/merge decision</article-title>
          .
          <source>In Proceedings of the 9th Joint Conference on Information Sciences, JCIS</source>
          <year>2006</year>
          , volume
          <year>2006</year>
          ,
          <volume>01</volume>
          2006.
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [ROB17]
          <string-name>
            <given-names>Cecile</given-names>
            <surname>Robin</surname>
          </string-name>
          ,
          <string-name>
            <surname>James O'Neill</surname>
            , and
            <given-names>Paul</given-names>
          </string-name>
          <string-name>
            <surname>Buitelaar</surname>
          </string-name>
          .
          <article-title>Automatic taxonomy generation - A use-case in the legal domain</article-title>
          .
          <source>CoRR, abs/1710</source>
          .
          <year>01823</year>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [sBS12]
          <string-name>
            <given-names>Vi-sit</given-names>
            <surname>Boonchom</surname>
          </string-name>
          and
          <string-name>
            <given-names>Nuanwan</given-names>
            <surname>Soonthornphisaj</surname>
          </string-name>
          .
          <article-title>ATOB algorithm: an automatic ontology construction for Thai legal sentences retrieval</article-title>
          .
          <source>Journal of Information Science</source>
          ,
          <volume>38</volume>
          (
          <issue>1</issue>
          ):
          <fpage>37</fpage>
          -
          <lpage>51</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [SE09]
          <string-name>
            <given-names>Jorge M</given-names>
            <surname>Santos</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mark</given-names>
            <surname>Embrechts</surname>
          </string-name>
          .
          <article-title>On the use of the adjusted Rand index as a metric for evaluating supervised classification</article-title>
          .
          <source>In International Conference on Artificial Neural Networks</source>
          , pages
          <fpage>175</fpage>
          -
          <lpage>184</lpage>
          . Springer,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [SG+07]
          <string-name>
            <given-names>Erich</given-names>
            <surname>Schweighofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Anton</given-names>
            <surname>Geist</surname>
          </string-name>
          , et al.
          <article-title>Legal query expansion using ontologies and relevance feedback</article-title>
          .
          <source>In LOAIT</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>160</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [SN11]
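          <string-name>
            <given-names>Sebastian</given-names>
            <surname>Stober</surname>
          </string-name>
          and
          <string-name>
            <given-names>Andreas</given-names>
            <surname>Nürnberger</surname>
          </string-name>
          .
          <article-title>An experimental comparison of similarity adaptation approaches</article-title>
          .
          <source>In International Workshop on Adaptive Multimedia Retrieval</source>
          , pages
          <fpage>96</fpage>
          -
          <lpage>113</lpage>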
          . Springer,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [STT95]
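          <string-name>
            <given-names>Anne</given-names>
            <surname>Schiller</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Simone</given-names>
            <surname>Teufel</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Christine</given-names>
            <surname>Thielen</surname>
          </string-name>
          .
          <article-title>Guidelines für das Tagging deutscher Textcorpora mit STTS</article-title>
          .
          <source>Technical report, Universitäten Stuttgart und Tübingen</source>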
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [VC98]
          <string-name>
            <given-names>Pepijn R.S.</given-names>
            <surname>Visser</surname>
          </string-name>
          and
          <string-name>
            <given-names>Zhan</given-names>
            <surname>Cui</surname>
          </string-name>
          .
          <article-title>Heterogeneous ontology structures for distributed architectures</article-title>
          ,
          <year>1998</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [VZ07]
          <string-name>
            <given-names>Fabio</given-names>
            <surname>Vitali</surname>
          </string-name>
          and
          <string-name>
            <given-names>Flavio</given-names>
            <surname>Zeni</surname>
          </string-name>
          .
          <article-title>Towards a country-independent data format: the Akoma Ntoso experience</article-title>
          .
          <source>In Proceedings of the V Legislative XML Workshop</source>
          , pages
          <fpage>67</fpage>
          -
          <lpage>86</lpage>
          . Florence, Italy: European Press Academic Publishing,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [WBM18]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Waltl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Georg</given-names>
            <surname>Bonczek</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Florian</given-names>
            <surname>Matthes</surname>
          </string-name>
          .
          <article-title>Rule-based information extraction - advantages, limitations, and perspectives</article-title>
          .
          <source>Jusletter IT</source>
          , 02
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [MHAK16]
          <string-name>
            <given-names>C.J.</given-names>
            <surname>MacLellan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Harpstead</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Aleven</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.R.</given-names>
            <surname>Koedinger</surname>
          </string-name>
          .
          <article-title>Trestle: A model of concept formation in structured domains</article-title>
          .
          <source>Advances in Cognitive Systems</source>
          ,
          <volume>4</volume>
          :
          <fpage>131</fpage>
          -
          <lpage>150</lpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [WBVvS14]
          <string-name>
            <given-names>Radboud</given-names>
            <surname>Winkels</surname>
          </string-name>
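          ,
          <string-name>
            <given-names>Alexander</given-names>
            <surname>Boer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bart</given-names>
            <surname>Vredebregt</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Alexander</given-names>
            <surname>van Someren</surname>
          </string-name>
          .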
          <article-title>Towards a legal recommender system</article-title>
          .
          <source>In JURIX</source>
          , volume
          <volume>271</volume>
          , pages
          <fpage>169</fpage>
          -
          <lpage>178</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [WLM16]
          <string-name>
            <given-names>Bernhard</given-names>
            <surname>Waltl</surname>
          </string-name>
          , Jorg Landthaler, and
          <string-name>
            <given-names>Florian</given-names>
            <surname>Matthes</surname>
          </string-name>
          .
          <article-title>Di erentiation and empirical analysis of reference types in legal documents</article-title>
          .
          <source>In JURIX</source>
          , pages
          <volume>211</volume>
          {
          <fpage>214</fpage>
          ,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [WLW+15]
          <string-name>
            <given-names>Shuting</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Chen</given-names>
            <surname>Liang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhaohui</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kyle</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bart</given-names>
            <surname>Pursel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Benjamin</given-names>
            <surname>Brautigam</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Sherwyn</given-names>
            <surname>Saul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Hannah</given-names>
            <surname>Williams</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Kyle</given-names>
            <surname>Bowen</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C Lee</given-names>
            <surname>Giles</surname>
          </string-name>
          .
          <article-title>Concept hierarchy extraction from textbooks</article-title>
          .
          <source>In Proceedings of the 2015 ACM Symposium on Document Engineering</source>
          , pages
          <fpage>147</fpage>
          -
          <lpage>156</lpage>
          . ACM,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [WZH16]
          <string-name>
            <given-names>Minmei</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bo</given-names>
            <surname>Zhao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Yihua</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>PTR: Phrase-based topical ranking for automatic keyphrase extraction in scientific publications</article-title>
          .
          <source>In International Conference on Neural Information Processing</source>
          , pages
          <fpage>120</fpage>
          -
          <lpage>128</lpage>
          . Springer,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [ZK07]
          <string-name>
            <given-names>Paul</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Lavanya</given-names>
            <surname>Koppaka</surname>
          </string-name>
          .
          <article-title>Semantics-based legal citation network</article-title>
          .
          <source>In Proceedings of the 11th International Conference on Artificial Intelligence and Law</source>
          , pages
          <fpage>123</fpage>
          -
          <lpage>130</lpage>
          . ACM,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>