<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Singling out Legal Knowledge from World Knowledge. An NLP-based approach</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Francesca Bonin</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Felice Dell'Orletta</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Giulia Venturi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Simonetta Montemagni</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Università di Pisa</institution>
          ,
          <addr-line>Dipartimento di Informatica - Pisa</addr-line>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2010</year>
      </pub-date>
      <fpage>39</fpage>
      <lpage>50</lpage>
      <abstract>
        <p>Ontology learning in the legal domain raises the well-known problem of epistemological promiscuity between legal entities and regulated domain instances. In this paper, we propose a new term extraction approach specifically aimed at tackling this problem through the acquisition of a term glossary where legal terms, expressing legal concepts, and domain terms, providing a description of the regulated world knowledge, are automatically singled out. The proposed approach has been tested with promising results on a corpus of Italian European legal texts regulating the environmental domain.</p>
      </abstract>
      <kwd-group>
        <kwd>Terminology Extraction</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Legal Ontology</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Scholars committed to modeling legal domain knowledge have widely
acknowledged the need for domain–specific knowledge organization, i.e.
legal ontologies, where domain knowledge (legal knowledge) and
knowledge of domains of interest to be regulated (referred to as world knowledge)
are not mixed. However, as pointed out in
        <xref ref-type="bibr" rid="ref4">Breuker et al. (2004)</xref>
        , the
indiscriminate mixture of the two types of knowledge is a common attitude in
constructing legal ontologies. In particular, Breuker and colleagues speak
of epistemological promiscuity, putting the emphasis on how this is a
serious problem in core ontology development. They point out that many legal
ontologies collapse together epistemological and ontological perspectives.
Starting from the well-known assumption that “by its very nature, law deals
with behaviour in the world”, they discuss how domain independent concepts
of law are tainted with common–sense notions which refer to social activities.
Interestingly, they claim that “the domain ontologies [they] developed in the
various project contained almost ninety–nine percent terms that belonged to
the category ‘world knowledge’, i.e. the world the legal domain is about”.
On the contrary, a core ontology should exclusively include “typical legal
concepts, like norm, responsibility, person (agent), action, etc.”. Moreover,
the most serious consequence envisaged is that “ontologies mixed with
epistemological frameworks have a far more limited re–use and may pose more
interoperability problems than clean ontologies.” In fact, the level of
generality adopted in constructing a domain ontology is closely related to the
reusability issue. According to the state of the art in ontology design criteria
reported in
        <xref ref-type="bibr" rid="ref6">Casellas (2008)</xref>
        , several levels can be established ranging from the
more abstract top or upper–level ontologies, which include general concepts
that are not domain–specific, through core ontologies, which provide top–level
domain–specific (i.e. legal) concepts, to domain–specific ontologies, which organize
world knowledge, providing a description of a specific domain of interest to
be regulated.
      </p>
      <p>
        Building on these emergent issues,
        <xref ref-type="bibr" rid="ref9">Francesconi (2010)</xref>
        has recently
proposed an approach to legal knowledge modeling based on the separation of
legal and world knowledge and oriented to interoperability and reusability.
According to the knowledge model suggested, two levels of conceptualization
are envisaged: a Domain Independent Legal Knowledge (DILK) level, which
provides a model for legal rules independently from the domain they apply
to, and a Domain Knowledge (DK) level, which offers information and
relationships among entities specific for a given regulated domain. This approach
follows
        <xref ref-type="bibr" rid="ref2">Biagioli (2009)</xref>
        , who claims that a law simultaneously describes the
occurring events and regulates them.
      </p>
      <p>In this paper, we address the epistemological promiscuity problem at the
level of the acquisition of terminological knowledge from legal texts.
Instead of starting from ready–made epistemological and ontological concepts,
which are defined a priori on the basis of domain–theoretical assumptions,
we propose a term extraction approach overtly aimed at automatically
discriminating legal terms from regulated–domain terms. The paper is organised
as follows: in Section 2, we motivate the proposed approach by discussing
the background literature. Section 3 presents our Terminology Extraction
methodology, while the results of a term extraction experiment on a corpus
of Italian European legal texts concerning the environmental domain are
reported in Section 4. The evaluation of achieved results is discussed in Section
5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background and motivation</title>
      <p>
        As widely acknowledged in the literature, terminology extraction is the first
and most–established step in ontology learning from texts. In the words of
        <xref ref-type="bibr" rid="ref5">Buitelaar et al. (2005)</xref>
        , “terms are linguistic realizations of domain–specific
concepts and are therefore central to further, more complex tasks”. In this
context, the peculiar challenge posed by legal texts consists in the fact that
they simultaneously contain legal terms and regulated domain terms. When
dealing with legal texts, the process of terminological acquisition thus needs
to take into account two main issues: i) the extraction of terms corresponding
to domain–relevant concepts, and ii) the identification of the specific domain
they refer to (i.e. the regulated domain or the legal domain). We strongly
believe that singling out legal terms, i.e. those which express legal
knowledge, from terms of the specific domain being regulated, i.e. those which
express world knowledge, represents a helpful starting point for any further
construction of legal ontologies where legal and world knowledge is kept
separate.
      </p>
      <p>
        Unlike in the legal ontology community, in the terminology extraction
literature the problem of legal knowledge mingled with world knowledge has,
to our knowledge, been addressed only in a few cases, namely
by
        <xref ref-type="bibr" rid="ref11">Lame (2005)</xref>
        and Lenci et al. (2009). The NLP–based terminology
extraction experiments from French Codes carried out in
        <xref ref-type="bibr" rid="ref11">Lame (2005)</xref>
        , aimed at
identifying legal ontology components, showed that statistical
indices (such as term frequency or tf, inverse document frequency or idf,
etc.) are of no help in singling out legal terms from domain terms. In the analysis of results
achieved with the T2K (Text–to–Knowledge) ontology learning system, Lenci
et al. (2009) notice that, as expected from the peculiar nature of processed
documents, the acquired term bank includes both legal and regulated–domain
terms. Since the two classes of terms show quite different frequency
distributions, several acquisition experiments were carried out by setting different
thresholds: it turned out that terms belonging to the target domain
regulated by law are always scarcely represented in the final result, due to their
high rank (and low frequency) according to Zipf’s law. Note however that,
unlike
        <xref ref-type="bibr" rid="ref11">Lame (2005)</xref>
        , the main concern of Lenci et al. (2009) was not
the classification of terms but rather the fact that both term types should be
adequately represented in the final result.
      </p>
      <p>
        To deal with the epistemological promiscuity problem and to overcome the
aforementioned difficulties, we propose an approach simultaneously meant
to acquire relevant terminology from legal texts and to discriminate between
legal and regulated–domain terms. For this purpose, we follow the layered
approach to terminology extraction described in
        <xref ref-type="bibr" rid="ref3">Bonin et al. (2010)</xref>
        , where,
firstly, candidate terms are identified using state–of–the–art statistical
measures and, secondly, a shortlist of well–formed and relevant candidate terms
is reranked by applying a contrastive method. The goal of this paper is to show
to what extent such a methodology is successful in acquiring from a corpus of
Italian European legal texts concerning the environmental domain a term list
where terms belonging to the legal domain (e.g. disposizione nazionale
‘national provision’, disposizione di presente direttivo ‘provision of the present
directive’, etc.) and to the regulated environmental domain (e.g. sostanza
pericoloso ‘hazardous substance’, valore limite di emissione ‘emission limit
value’, etc.) are clearly singled out. Following
        <xref ref-type="bibr" rid="ref5">Buitelaar et al. (2005)</xref>
        , this can
be the starting point to develop a domain ontology where concepts expressing
legal and world knowledge are not mixed.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. The term extraction approach</title>
      <p>
        The term extraction method we followed, described in detail in
        <xref ref-type="bibr" rid="ref3">Bonin et al. (2010)</xref>
        , combines NLP techniques with linguistic and statistical filters. For our
present purposes, we are interested both in one–word terms (single terms),
e.g. president, and in multi–word terms (complex terms), e.g. president of
republic.
      </p>
      <p>
        As shown in Figure 1, which illustrates the general extraction process, the
input text is first tokenized, morphologically analyzed (i.e. PoS–tagged)
and lemmatized by a pipeline of state–of–the–art NLP tools for
the analysis of Italian texts. The PoS–tagged text, obtained with the tagger
described in
        <xref ref-type="bibr" rid="ref8">Dell’Orletta (2009)</xref>
        , is then searched with linguistic
filters aimed at identifying a) nouns, which express candidate single terms, and
b) PoS patterns covering the main nominal modification types, which express
candidate complex terms. Examples are morpho–syntactic templates such
as noun + adjective (e.g. decreto legislativo ‘legislative decree’), noun +
preposition + noun (e.g. decreto del presidente lit. ‘decree of the president’),
etc.
      </p>
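      <p>The linguistic filtering step described above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used in the paper: the one–letter tag set (N, A, P, C), the PATTERNS list and the function name candidate_terms are hypothetical, and only the two templates named in the text are covered.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# Hypothetical coarse PoS tags: N = noun, A = adjective, P = preposition.
# Only the two templates named in the text are listed here; the filter set
# used in the paper is larger (the text ends the list with "etc.").
PATTERNS = [("N", "A"), ("N", "P", "N")]


def candidate_terms(tagged):
    """Count candidate single and complex terms.

    `tagged` is a list of (lemma, tag) pairs, i.e. a text that has already
    been tokenized, PoS-tagged and lemmatized.
    """
    # a) every noun is a candidate single term
    singles = Counter(lemma for lemma, tag in tagged if tag == "N")
    # b) every lemma sequence matching a PoS pattern is a candidate complex term
    complexes = Counter()
    tags = [tag for _, tag in tagged]
    for pattern in PATTERNS:
        n = len(pattern)
        for i in range(len(tagged) - n + 1):
            if tuple(tags[i:i + n]) == pattern:
                complexes[" ".join(lemma for lemma, _ in tagged[i:i + n])] += 1
    return singles, complexes


# Toy lemmatized input (invented for illustration).
sample = [
    ("decreto", "N"), ("legislativo", "A"),
    ("e", "C"),
    ("decreto", "N"), ("di", "P"), ("presidente", "N"),
]
singles, complexes = candidate_terms(sample)
print(singles.most_common())
print(complexes.most_common())
```
]]></preformat>
      <p>Matching on lemmas rather than surface forms is what yields citation forms such as decreto di presidente in the resulting lists.</p>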
      <p>
        At this stage, the candidate single terms are ranked on the basis of their
frequency of occurrence in the input text, while the candidate complex terms
are ranked on the score of a different statistical filter. For this purpose, the
C-NC Value measure is used as described in
        <xref ref-type="bibr" rid="ref10">Frantzi et al. (1999)</xref>
        and Vintar
(2004). It is currently considered the state–of–the–art method for
terminology extraction and is meant to assess the likelihood that a candidate is
a well–formed and relevant multi–word term. Afterwards, the contrastive
method is applied to the list of ranked candidate single and multi–word
terms. As shown in Figure 1, where the intermediate output of the extraction
process is displayed in a dotted box, the two top lists of candidate (single and
multi-word) terms are contrasted firstly against the term list extracted from an
open–domain corpus and secondly against a top list of terms acquired from a
legal corpus differing at the level of the regulated domain. In both contrastive
phases, the contrastive function (CSmw) newly introduced in
        <xref ref-type="bibr" rid="ref3">Bonin et al.
(2010)</xref>
        is used. The CSmw score is based on the arctangent function, which tends
to give more weight to less frequent data, and indeed proved suitable for handling
variation in low–frequency events such as multi–words or regulated–domain
terms. The first contrastive analysis stage (so–called “1st contrast”) is meant
to prune common words (if any) from the list of domain–relevant terms,
while the second contrastive analysis stage (so–called “2nd contrast”) yields
a list of terms in which regulated–domain and legal terminology are
separated, appearing at the top and at the bottom of the final term
list respectively.
      </p>
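      <p>To make the statistical ranking of complex terms more concrete, the following Python sketch computes the C–value part of the measure as it is commonly stated after Frantzi et al. (1999): a candidate's frequency is discounted by the average frequency of the longer candidates that contain it, and the result is weighted by the base–2 logarithm of its length. The NC–value context weighting is omitted, the function names are hypothetical, and the frequencies are invented; this is not the implementation used in the paper.</p>
      <preformat><![CDATA[
```python
import math


def contains(longer, shorter):
    """True if token tuple `shorter` occurs contiguously inside a strictly longer tuple."""
    n = len(shorter)
    return len(longer) > n and any(
        longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
    )


def c_value(freq):
    """Map each candidate multi-word term (a tuple of tokens) to its C-value.

    `freq` maps candidates to their corpus frequency. Candidates are assumed
    to have at least two tokens, since log2(1) would zero out single words.
    """
    scores = {}
    for term, f in freq.items():
        # frequencies of the longer candidates in which `term` is nested
        nesting = [freq[other] for other in freq if contains(other, term)]
        if nesting:
            f = f - sum(nesting) / len(nesting)
        scores[term] = math.log2(len(term)) * f
    return scores


# Invented frequencies, for illustration only.
freq = {
    ("valore", "limite"): 10,
    ("valore", "limite", "di", "emissione"): 6,
    ("diritto", "interno"): 4,
}
scores = c_value(freq)
for term, score in sorted(scores.items(), key=lambda kv: kv[1], reverse=True):
    print(" ".join(term), score)
```
]]></preformat>
      <p>In this toy input the nested candidate valore limite is penalised because most of its occurrences are accounted for by the longer candidate valore limite di emissione.</p>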
    </sec>
    <sec id="sec-4">
      <title>4. Experiments and results</title>
      <p>The term extraction methodology described above has been tested on a
corpus of European legal texts concerning the environmental domain, totalling 394,088
word tokens (hereafter referred to as
“Environmental Corpus”). Following the extraction process illustrated in
Section 3, for the first contrastive analysis stage we used as open–domain
contrastive corpus the PAROLE Corpus (Marinelli et al., 2003), made up of about
3 million words and including Italian texts of different types (newspapers,
books, etc.) attesting general language usage; for the second contrastive
analysis stage, we used a corpus of 74,210 word tokens containing European law texts
on consumer protection (hereafter generically referred to as “Legal Corpus”).</p>
      <p>
        In the rest of the paper, we focus on the extraction of multi–word
terms. The reason for this choice is twofold: on the one hand, multi–word
terms have been shown to cover the vast majority of domain–specific
terminology (85% according to Nakagawa et al. (2003)); on the other hand,
the proposed process of complex term extraction introduces a number of
novelties worth discussing further. As noted in
        <xref ref-type="bibr" rid="ref3">Bonin et al. (2010)</xref>
        , unlike
previous studies which follow contrastive approaches, such as
        <xref ref-type="bibr" rid="ref1">Basili et
al. (2001)</xref>
        , Penas et al. (2001) and
        <xref ref-type="bibr" rid="ref7">Chung et al. (2004)</xref>
        , we base
complex term acquisition on the concrete occurrence of complex terms in texts as units
separate from single terms. Although this novelty is not the main focus of
the present work, it is worth pointing out that the method aims at
extracting only those multi–words that are specifically relevant in the domain
at hand. For example, the relevant single term principio ‘principle’ is extracted,
but multi–words headed by this single term are not extracted unless
they are themselves relevant for the domain topic. This differs from
        <xref ref-type="bibr" rid="ref1">(Basili et
al., 2001)</xref>
        , where all multi–word terms with a domain–specific single head
are extracted regardless of their own domain specificity. In other words,
we do not extract terms such as principio di precauzione ‘precautionary
principle’ and principio fondamentale ‘fundamental principle’, even though they
occur in the texts and share the same head term (i.e. principio ‘principle’).
Instead, we acquire complex terms such as principio attivo ‘active
ingredient’ and principio di sussidiarietà ‘principle of subsidiarity’, which are relevant
multi–word terms in their own right.
      </p>
      <p>In the extraction experiment we carried out, we started from the extraction
of a list of well–formed candidate multi–words, in line with the morpho–syntactic
constraints we set. Then, we selected a top list<sup>1</sup> from the candidate
term list ranked by the score of the statistical filter, thus obtaining a shortlist of
600 terms that are either legal (e.g. norma europea, ‘European norm’), environmental (e.g.
emissione di gas a effetto serra, ‘emission of greenhouse gases’) or open–domain
(e.g. direttore generale, ‘director–general’). Afterwards, we
first contrasted the top list of 600 multi–word terms against the top list
extracted from the PAROLE Corpus, in order to reduce the noise deriving
from highly frequent common expressions (e.g. giorno successivo, ‘following day’
or anno precedente, ‘previous year’), obtaining a list mainly made up of
environmental and legal terms. Then, in order to distinguish environmental from legal
terms, we contrasted a top list of 300 environmental–legal multi–word terms
against the top list extracted from the Legal Corpus, obtaining a final list of
300 terms ranked by contrastive score. In this final list, environmental
terms were expected at the top, while legal terms were expected at the bottom.
Tables I and II report respectively the first and the last 10 multi–word terms
of the final 300–term list obtained after the second contrast step.
Interestingly, the top of the final list, as reported in Table I,
contains environmental terms: it shows the first 10 multi–word terms
extracted from the Environmental Corpus, ranked by
decreasing contrastive score. Table II shows the final part of the list, made up of
legal terms (the 10 multi–word terms extracted from the Environmental
Corpus ranked by increasing contrastive score). These results
are discussed in Section 5.</p>
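      <p>The two contrast steps can be illustrated with the Python sketch below. It should be read with caution: the function contrast_score is a hypothetical arctangent–based stand–in chosen only for illustration and is not the CSmw formula, which is defined in Bonin et al. (2010); the term frequencies are invented, while the corpus sizes are those reported above (the PAROLE size is the approximate figure of 3 million words).</p>
      <preformat><![CDATA[
```python
import math

N_ENV = 394_088      # Environmental Corpus size (from the text)
N_OPEN = 3_000_000   # PAROLE Corpus size (approximate, from the text)
N_LEGAL = 74_210     # Legal Corpus size (from the text)


def contrast_score(f_target, n_target, f_contrast, n_contrast):
    """Hypothetical stand-in for a contrastive function (NOT the CSmw formula).

    Compares relative frequencies in the target and contrast corpora and
    squashes the result with arctan so that the score stays bounded.
    """
    rel_target = f_target / n_target
    rel_contrast = (f_contrast + 1) / n_contrast  # add-one avoids division by zero
    return math.atan(math.log(f_target + 1) * rel_target / rel_contrast)


def rerank(freq, n_target, contrast_freq, n_contrast, keep):
    """Rank terms by decreasing contrast score and keep the top `keep` of them."""
    ranked = sorted(
        freq,
        key=lambda t: contrast_score(
            freq[t], n_target, contrast_freq.get(t, 0), n_contrast
        ),
        reverse=True,
    )
    return {t: freq[t] for t in ranked[:keep]}


# Invented frequencies, for illustration only.
candidates = {"sostanza pericoloso": 120, "disposizione nazionale": 90,
              "giorno successivo": 40}
open_domain = {"giorno successivo": 900, "disposizione nazionale": 5}
legal = {"disposizione nazionale": 60}

# 1st contrast: prune general-language expressions.
stage1 = rerank(candidates, N_ENV, open_domain, N_OPEN, keep=2)
# 2nd contrast: regulated-domain terms rise, legal terms sink.
final_list = list(rerank(stage1, N_ENV, legal, N_LEGAL, keep=2))
print(final_list)
```
]]></preformat>
      <p>With these toy figures the general–language expression is discarded by the first contrast, and the second contrast places the environmental term above the legal one, mirroring the expected shape of the final list.</p>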
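      <p><sup>1</sup> Note that the thresholds we set for this experiment were empirically defined and
mainly meant to show to what extent the proposed approach works correctly as regards
the filtering of legal and environmental terms. It goes without saying that final
thresholds should be defined by taking into account the size of the document collection as well as
the typology and reliability of expected results.</p>
      <table-wrap id="tab1">
        <label>Table I</label>
        <caption>
          <p>The first 10 multi–word terms of the final list, ranked by decreasing contrastive score.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Term</th><th>English gloss</th></tr>
          </thead>
          <tbody>
            <tr><td>sostanza pericoloso</td><td>hazardous substance</td></tr>
            <tr><td>salute umano</td><td>human health</td></tr>
            <tr><td>sviluppo sostenibile</td><td>sustainable development</td></tr>
            <tr><td>principio attivo</td><td>active ingredient</td></tr>
            <tr><td>inquinamento atmosferico</td><td>air pollution</td></tr>
            <tr><td>effetto serra</td><td>greenhouse effect</td></tr>
            <tr><td>rifiuto pericoloso</td><td>hazardous waste</td></tr>
            <tr><td>valore limite di emissione</td><td>emission limit value</td></tr>
            <tr><td>corpo idrico</td><td>water body</td></tr>
            <tr><td>cambiamento climatico</td><td>climate change</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <table-wrap id="tab2">
        <label>Table II</label>
        <caption>
          <p>The last 10 multi–word terms of the final list, ranked by increasing contrastive score.</p>
        </caption>
        <table>
          <thead>
            <tr><th>Term</th><th>English gloss</th></tr>
          </thead>
          <tbody>
            <tr><td>funzionamento di mercato interno</td><td>functioning of national market</td></tr>
            <tr><td>disposizione nazionale</td><td>national provision</td></tr>
            <tr><td>disposizione essenziale di diritto interno</td><td>essential internal provision of national law</td></tr>
            <tr><td>testo di disposizione essenziale di diritto</td><td>text of essential provision</td></tr>
            <tr><td>testo di disposizione</td><td>text of provision</td></tr>
            <tr><td>diritto nazionale</td><td>national law</td></tr>
            <tr><td>diritto interno</td><td>national law</td></tr>
            <tr><td>livello di protezione</td><td>level of protection</td></tr>
            <tr><td>disposizione di presente direttivo</td><td>provision of the present directive</td></tr>
            <tr><td>norma nazionale</td><td>national rule</td></tr>
          </tbody>
        </table>
      </table-wrap>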
    </sec>
    <sec id="sec-5">
      <title>5. Evaluation</title>
      <sec id="sec-5-1">
        <title>5.1. GENERAL EVALUATION CRITERIA</title>
        <p>The multi–word term list extracted from the Environmental Corpus was
evaluated in two steps. First, it was automatically compared
against two gold–standard resources selected for the environmental
and legal domains. In particular, we used a) the thesaurus EARTh
(Environmental Applications Reference Thesaurus)<sup>2</sup>, containing 12,398 terms, as
reference resource for the environmental domain, and b) the
Dizionario giuridico (Edizioni Simone), available online<sup>3</sup> and including 1,800
terms, for the legal domain. Afterwards, those terms which had not been
assigned to a specific domain during this automatic
evaluation phase were manually validated by legal and environmental experts.
Two evaluation phases were needed because the
reference resources have good coverage of domain–specific single terms (e.g.
disposizione ‘provision’, valore ‘value’, etc.), but lack proper
coverage of domain–specific complex terms (e.g. disposizione essenziale del
diritto ‘essential provision of law’, valore limite di emissione ‘emission limit
value’).</p>
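        <p>The automatic phase of this evaluation can be pictured as a simple lookup, sketched below in Python. The function name auto_label and the tiny gold sets are hypothetical stand–ins for the EARTh thesaurus and the Dizionario giuridico, and exact string matching is an assumption; terms that are not found in exactly one resource are set aside for manual validation.</p>
        <preformat><![CDATA[
```python
def auto_label(terms, env_gold, legal_gold):
    """Label terms found in exactly one gold resource; queue the rest for experts."""
    labels, pending = {}, []
    for term in terms:
        in_env = term in env_gold
        in_legal = term in legal_gold
        if in_env and not in_legal:
            labels[term] = "environmental"
        elif in_legal and not in_env:
            labels[term] = "legal"
        else:
            # not covered, or covered by both resources: needs manual validation
            pending.append(term)
    return labels, pending


# Toy gold sets (invented entries; the real resources are far larger).
env_gold = {"inquinamento atmosferico", "effetto serra"}
legal_gold = {"diritto nazionale"}

labels, pending = auto_label(
    ["effetto serra", "diritto nazionale", "valore limite di emissione"],
    env_gold,
    legal_gold,
)
print(labels)
print(pending)
```
]]></preformat>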
        <p>In order to evaluate how legal and environmental terms are distributed
in the acquired 300–term list, we further divided this list into 30–term groups.
Although the top list of 300 evaluated terms is quite small, it
proved sufficient to test to what extent the term extraction method
we propose can help to single out legal and regulated–domain terminology.
However, we think that a future evaluation of a larger number of extracted
terms can provide more detailed insights into the distribution of the two types
of terminology within a term list automatically acquired from legal corpora.
Similarly, we can foresee an evaluation in terms of recall (calculated as the
percentage of correctly acquired terms with respect to all terms in the gold
standard lexicon); unfortunately, this type of evaluation is currently
problematic due to the lack of a reference terminological resource aligned
with the acquisition corpus.</p>
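        <p>Both the per–group distribution and the recall measure mentioned above are straightforward to compute once each ranked term carries a label. The Python sketch below uses hypothetical function names and invented toy data (a group size of 3 instead of 30, to keep the example short); it illustrates the bookkeeping only and reproduces no figure from the paper.</p>
        <preformat><![CDATA[
```python
from collections import Counter


def group_distribution(ranked_labels, group_size=30):
    """Count label occurrences within consecutive groups of a ranked list."""
    return [
        Counter(ranked_labels[i:i + group_size])
        for i in range(0, len(ranked_labels), group_size)
    ]


def recall(acquired, gold):
    """Share of gold-standard terms that were actually acquired."""
    gold = set(gold)
    return len(gold.intersection(acquired)) / len(gold)


# Toy ranked labels, top of the list first (invented for illustration).
ranked_labels = ["environmental", "environmental", "legal",
                 "environmental/legal", "legal", "legal"]
groups = group_distribution(ranked_labels, group_size=3)
for number, counts in enumerate(groups, start=1):
    print(number, dict(counts))

print(recall({"a", "b", "c"}, {"b", "c", "d", "e"}))
```
]]></preformat>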
      </sec>
      <sec id="sec-5-2">
        <title>5.2. DISCUSSION OF RESULTS</title>
        <p>The distribution of three different types of terms was evaluated. For each
30–term group of the final 300–term list we computed the number of i)
environmental terms, ii) legal terms, and iii) terms which can refer to both domains,
such as politica ambientale, ‘environmental policy’. The remaining
terms, which were not categorized as belonging to types i), ii) or iii),
are errors.</p>
        <p><sup>2</sup> http://uta.iia.cnr.it/earth.htm#EARTh%202002</p>
        <p><sup>3</sup> http://www.simone.it/newdiz</p>
        <p>As shown in Table III, which reports the distribution of the different
term types within each 30–term group, the adopted contrastive function
is able to discriminate between environmental and legal terms. The first group
contains 16 environmental terms against 5 legal terms; in the last group 22
legal terms and no environmental terms occur. This trend is visible in
Figure 2, where the diverging lines show the different distributions of
environmental and legal terms across the 30–term groups. The central
zone of the chart, where the lines cross, marks the turning point of
this trend, where legal terms outnumber environmental ones. Moreover,
Figure 2 reveals a fairly homogeneous distribution of terms which can refer
to both domains (referred to as ‘Environmental/Legal’ in Table III). This is the
case of terms such as politica ambientale ‘environmental policy’, obiettivo
ambientale ‘environmental objective’, informazione ambientale ‘environmental
information’, etc., which have been categorized by both domain experts as
belonging to a ‘twilight’ zone since they express general legal concepts which
acquire a domain–specific meaning. Interestingly, the analysis carried out by
the legal expert highlighted that some of the acquired environmental terms
are explicitly defined in the legal texts being considered: such terms are
associated with a high contrastive score and are located in the first 30–term group.
This is the case of rifiuto pericoloso ‘hazardous waste’, sostanza pericolosa
‘hazardous substance’, valore limite di emissione ‘emission limit value’, etc.,
whose meanings are explicitly defined in the acquisition corpus. For
example, Article 2 “Definitions”, letter g) of Regulation (EC) No 2150/2002 of
the European Parliament and of the Council of 25 November 2002 on waste
statistics contains the following definition of ‘hazardous waste’: “hazardous
waste shall mean any waste as defined in Article 1(4) of Council Directive
91/689/EEC of 12 December 1991 on hazardous waste”. It may be possible
to conclude that such terms are particularly relevant for the regulated domain
being considered and, for this reason, occur with higher frequencies in the
target domain. This could open interesting developments in the field of legal
re–definition of regulated–domain terms. In fact, as overtly pointed out
in Walter et al. (2006), the successful retrieval of definitions contained in
statutes and legal texts can help provide a large knowledge base to be used
in text–based ontology learning tasks.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion</title>
      <p>In this paper, we showed how a modular and contrastive approach to term
extraction can be usefully exploited in the legal domain to tackle the well–known
epistemological promiscuity problem. To our knowledge, this is the first
time that the problem has been addressed in the terminology extraction
literature with successful results. In the proposed modular approach to term
extraction, candidate single and multi–word terms are first identified using
state–of–the–art statistical measures and are subsequently filtered by
applying a contrastive reranking method aimed at discriminating between acquired
legal terms and regulated–domain terms. The evaluation of the results,
carried out with the help of domain experts, showed that the proposed
approach is effective in dealing with particularly challenging text types,
such as legislative texts.</p>
    </sec>
    <sec id="sec-7">
      <title>7. Acknowledgments</title>
      <p>The research reported in the paper has been partly supported by the
Italian FIRB project “Piattaforma di servizi integrati per l’Accesso semantico e
plurilingue ai contenuti culturali italiani nel web”. The authors would like to
thank Angela D’Angelo of the Scuola Superiore Sant’Anna of Pisa and Paolo
Plini of the Institute of Atmospheric Pollution, Environmental Terminology
Unit (CNR, Rome) who contributed as domain experts to the evaluation
process.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <surname>Basili</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moschitti</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pazienza</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Zanzotto</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2001</year>
          ),
          <article-title>A contrastive approach to term extraction</article-title>
          ,
          <source>in Proceedings of the 4th Conference on Terminology and Artificial Intelligence (TIA-2001)</source>
          , Nancy.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <surname>Biagioli</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>Modelli funzionali delle leggi</article-title>
          .
          <source>Verso testi legislativi autoesplicativi, Series in Legal Information and Communication technologies</source>
          , vol.
          <volume>6</volume>
          , European Press Academic Publishing.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <surname>Bonin</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>2010</year>
          ),
          <article-title>A Contrastive Approach to Multi-word Term Extraction from Domain Corpora</article-title>
          ,
          <source>in Proceedings of the “7th International Conference on Language Resources and Evaluation” (LREC 2010)</source>
          , La Valletta, Malta, 19-21 May, pp.
          <fpage>3222</fpage>
          -
          <lpage>3229</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <surname>Breuker</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          and
          <string-name>
            <surname>Hoekstra</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          (
          <year>2004</year>
          ),
          <article-title>Epistemology and Ontology in Core Ontologies: FOLaw and LRI-Core, two core ontologies for law</article-title>
          ,
          <source>in Proceedings of the “Workshop on Core Ontologies in Ontology Engineering” (EKAW04)</source>
          , Northamptonshire, UK, pp.
          <fpage>15</fpage>
          -
          <lpage>27</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <surname>Buitelaar</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cimiano</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Magnini</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2005</year>
          ),
          <article-title>Ontology Learning from Text: an Overview</article-title>
          , In Buitelaar et al. (eds.),
          <source>Ontology Learning from Text: Methods, Evaluation and Applications</source>
          Volume
          <volume>123</volume>
          ,
          <source>Frontiers in Artificial Intelligence and Applications</source>
          , pp.
          <fpage>3</fpage>
          -
          <lpage>12</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <surname>Casellas</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          (
          <year>2008</year>
          ),
          <article-title>Modelling Legal Knowledge through Ontologies. OPJK: the Ontology of Professional Judicial Knowledge</article-title>
          ,
          <source>Ph.D. thesis</source>
          , Institute of Law and Technology, Autonomous University of Barcelona.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <surname>Chung</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Nation</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2004</year>
          ),
          <article-title>Identifying technical vocabulary</article-title>
          , in
          <source>System</source>
          ,
          <volume>32</volume>
          , pp.
          <fpage>251</fpage>
          -
          <lpage>263</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          <string-name>
            <surname>Dell'Orletta</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          (
          <year>2009</year>
          ),
          <article-title>Ensemble system for Part-of-Speech tagging</article-title>
          ,
          <source>in Proceedings of “Evalita'09”</source>
          , Reggio Emilia, December.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <surname>Francesconi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          (
          <year>2010</year>
          ),
          <article-title>Legal Rules Learning based on a Semantic Model for Legislation</article-title>
          ,
          <source>in Proceedings of the “Workshop on Semantic Processing of Legal Texts” (SPLeT-2010)</source>
          , held in conjunction with the 7th Conference on Language Resources &amp; Evaluation (LREC 2010), La Valletta, Malta, 23rd May (in press).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <surname>Frantzi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          , and
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          (
          <year>1999</year>
          ),
          <article-title>The C-value / NC Value domain independent method for multi-word term extraction</article-title>
          ,
          <source>in Journal of Natural Language Processing</source>
          ,
          <volume>6</volume>
          (
          <issue>3</issue>
          ), pp.
          <fpage>145</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <surname>Lame</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          (
          <year>2005</year>
          ),
          <article-title>Using NLP techniques to identify legal ontology components: concepts and relations</article-title>
          , in Benjamins et al. (eds.),
          <source>Law and the Semantic Web. Legal Ontologies, Methodologies, Legal Information Retrieval, and Applications</source>
          , Lecture Notes in Computer Science, Volume
          <volume>3369</volume>
          , pp.
          <fpage>169</fpage>
          -
          <lpage>184</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>