<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Patricia Martín-Chozas</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Challenges of Terminology Extraction from Legal Spanish Corpora</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ontology Engineering Group, Universidad Politécnica de Madrid</institution>
        </aff>
      </contrib-group>
      <pub-date>
        <year>0000</year>
      </pub-date>
      <volume>0002</volume>
      <fpage>73</fpage>
      <lpage>83</lpage>
      <abstract>
        <p>Untangling the complexities of legal documentation is a pressing need for non-practitioners of the legal profession. The terminology used in the domain is complex and usually requires expert knowledge to be fully understood, since the legal framework is constantly being updated and the meaning of terms varies accordingly. Non-proprietary Automatic Terminology Extraction (ATE) tools are required in this particular domain, in which documents contain private and sensitive data. This paper describes methods for obtaining accurate legal terms from labour law corpora, overcoming the difficulties present in the area, and also analyses the peculiarities of legal jargon, specifically in the Spanish language. The experiments, executed with JATE, a well-known open source library in the ATE literature, are still preliminary but promising.</p>
      </abstract>
      <kwd-group>
        <kwd>Legal terminology</kwd>
        <kwd>Automatic Term Extraction</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Semantic Web Technologies</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As evidenced by legal summaries published in Eurlex<sup>1</sup>, legal documentation such
as laws, directives, decrees and even regular notices is often hard for the
general public to understand: legal terminology can be a real headache for people
who are not used to this type of language. At the same time, many citizens and
businesses across the European Union have to deal with significant compliance
problems: breach of contracts, overdue debts, excessive working hours, etc.</p>
      <p>With the aim of easing such complications, this paper proposes to retrieve
legal terms from Spanish corpora through Automatic Term Extraction (ATE)
techniques. Such legal terms are understood as words and multi-word
expressions with a specific meaning within a legal text; collections of such terms are
considered terminologies. These terminologies could afterwards be interlinked
with other language resources to share information, which helps to obtain
definitions, translations and context, easing the comprehension of legal
documentation.</p>
      <p>ATE is a well-known technique in Natural Language Processing (NLP) that
supports important tasks such as machine translation, speech recognition or
information retrieval, to mention but a few. Several tools already offer this
kind of technology; however, many of them still present unresolved limitations,
of which noise generation, disambiguation issues and performance delays with big
corpora are some of the most frequent. Here, two additional limitations
have been identified as crucial when dealing with legal information: domain
specificity and data privacy. Current ATE tools have difficulty
extracting highly specific legal expressions and, on the other hand, they might include
personal data such as proper names and identity numbers in the resulting term
lists.</p>
      <p>
        For this reason, this contribution proposes to configure the tool JATE [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
to extract terminology from the legal domain by analysing the peculiarities of
the legal language and adapting the extraction patterns to this jargon.
      </p>
      <p>The first use case has been developed within the Lynx project, which has
provided the Spanish legal corpus<sup>2</sup> used in the experiments. The Lynx project<sup>3</sup> is an
H2020 Innovation Action towards the creation of a Knowledge Graph of legal and
regulatory data from different jurisdictions and languages. This Legal Knowledge
Graph interlinks multilingual legal information and provides explanations and
context for legal expressions appearing in each document.</p>
      <p>The terms extracted with the model developed here contribute to the creation
of this platform, which helps European citizens understand legal documentation
without having to invest time and money in specific legal consulting.</p>
      <p>JATE was originally developed for English terminology extraction; since this
use case deals with Spanish corpora, an extension of the tool to cover the
Spanish language was also required.</p>
      <p>This paper is organised as follows: Section 2 explains the motivation behind this
contribution and the analysis of the problem, Section 3 presents the related work on
term extraction technologies, Section 4 describes the experiments
performed during this work and, finally, Section 5 includes conclusions and future
work to be performed in the next stages.</p>
    </sec>
    <sec id="sec-2">
      <title>Motivation</title>
      <p>Some hints of the motivation behind this project have already been presented
in the introduction: legal documentation is an unsolvable puzzle for laymen and
non-practitioners of the legal domain.</p>
      <p>In addition, many of the available legal language resources are not yet in
machine-readable formats. Most are published as PDF, which hinders their
look-up, and some are still distributed in physical format.</p>
      <p>The work proposed here tackles this situation by researching a more
accurate terminology extraction methodology for those machine-readable legal
documents, with the aim of creating new language resources that can be processed
by the newest technologies.</p>
      <sec id="sec-2-1">
        <title>2 http://data.lynx-project.eu/dataset/llcorpuses 3 http://lynx-project.eu/</title>
        <p>A higher level of accuracy in the extracted terms means less time and money
invested in human post-processing of the resulting term list. However, the most
important advantage lies in the efficacy of Semantic Web technologies: accurate
legal terms can easily be linked with other legal language resources, offering users
more information such as translations, synonyms, context and related terms.
This additional information can help them improve their comprehension of
legal documentation.</p>
        <p>Available tools present several limitations, and only a few of them are open
source. This is a major drawback, since one of the ideas of this contribution
is to avoid using proprietary software in order to address privacy issues.</p>
        <p>Also, a common shortcoming of web-based applications is personal data
management. Legal documents contain private data and it may not be safe to upload
them to a web-based tool. In the Lynx project, both public and sensitive
documents are being handled; thus, data privacy is an important factor to keep in
mind.</p>
        <p>Since JATE is an open source framework, it allows the use of a given
algorithm selected by the user and even the creation of new ones. This work includes
the extension of the tool to cover terminology extraction from Spanish corpora,
the analysis of the labour law corpora provided by Lynx partners to discover
specific patterns of legal language, and the configuration of such patterns in the
tool with the aim of extracting more accurate terms.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Related Work</title>
      <p>Prior to the availability of term extraction tools, this activity was carried out
by domain experts and terminology professionals. Despite being the most
accurate approach to terminology extraction, it is also the most expensive and
time-consuming. Taking into account the amount of information generated nowadays,
human terminology extraction is impractical.</p>
      <p>
        For this reason, automatic term extraction technologies have been extensively
studied in the literature [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and several tools based on statistical and linguistic
methods have already been developed.
      </p>
      <p>
        In previous work, a comparative evaluation of available ATE tools has been
performed [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. Each of them presents different features depending on the
format of the tool, typology of the targeted corpus, supported languages, types of
extracted terms, etc.
      </p>
      <p>Many of the tools analysed are web-based applications: Translated.net<sup>4</sup>,
TermoStat<sup>5</sup> and FiveFilters<sup>6</sup>, for instance, are free tools that can be accessed online.</p>
      <sec id="sec-3-1">
        <title>4 https://labs.translated.net/terminology-extraction/ 5 http://termostat.ling.umontreal.ca/ 6 https://fivefilters.org/term-extraction/</title>
        <p>However, some of the issues found in the above-mentioned evaluation include
the extraction of stop words, visualisation of terms only on the website (not
downloadable) and difficulties finding terms from specific domains.</p>
        <p>
          Other tools offer paid services, such as SketchEngine<sup>7</sup>, which is a
sophisticated tool with a good performance and additional services [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ]. Still, this
application presents some difficulties in dealing with big corpora: at least in the
trial version, files need to be attached one by one.
        </p>
        <p>On the other hand, there are other downloadable tools available, such as
TBXTools<sup>8</sup> and TermSuite<sup>9</sup>. The first has been developed to offer domain-
and language-independent services, so the implemented patterns might be too
general; the latter does not extract compound terms, the main feature of
legal jargon.</p>
        <p>This contribution uses the open source library JATE<sup>10</sup>, since it implements
the most important terminology extraction algorithms and can be integrated
with a local Solr indexer. Furthermore, it can also process large corpora and be
extended for other languages or for different purposes, such as the Spanish legal
domain.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>First, subsection 4.1 describes the extension of JATE for the Spanish
language and the initial set of patterns applied. Second, subsection 4.2
contains an analysis of the features of legal terminology, specifically within Spanish
labour law corpora. Afterwards, subsection 4.3 proposes the Spanish language
patterns to be configured in JATE based on the above-mentioned analysis. Lastly,
subsection 4.4 contains a description of the extraction tests performed.</p>
      <sec id="sec-4-1">
        <title>JATE extension for Spanish</title>
        <p>JATE implements ten algorithms: TTF, ATTF, TTF-IDF, RIDF, CValue,
X2, RAKE, Weirdness, GlossEx and TermEx.</p>
        <p>
          In these experiments, only Cvalue [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] and TTF-IDF [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] have been used, since
the former is intended for multi-word term extraction, one of the main features
of legal documents, while the latter measures the significance of each term based
on its frequency in the corpus.
        </p>
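        <p>As a rough illustration of why Cvalue favours multi-word terms, the sketch below scores candidates with the commonly cited C-value formulation (Frantzi and Ananiadou): the frequency of a candidate is discounted by the average frequency of the longer candidates that contain it, and the result is weighted by the logarithm of its length. The function name and the frequencies are hypothetical, and JATE's own implementation may differ in details such as the handling of single-word candidates, which score zero under this plain formula.</p>
        <preformat>
```python
import math


def c_value(freqs):
    """Score each candidate (a tuple of words) with the plain C-value formula.

    freqs maps candidate -> corpus frequency. Illustrative only; not JATE's code.
    """
    def contains(longer, shorter):
        # True when `shorter` occurs as a contiguous sub-sequence of a longer candidate.
        n = len(shorter)
        return len(longer) > n and any(
            longer[i:i + n] == shorter for i in range(len(longer) - n + 1)
        )

    scores = {}
    for cand, freq in freqs.items():
        # Frequencies of the longer candidates in which this candidate is nested.
        nesting = [freqs[other] for other in freqs if contains(other, cand)]
        if nesting:
            freq = freq - sum(nesting) / len(nesting)
        scores[cand] = math.log2(len(cand)) * freq
    return scores


# Hypothetical frequencies, not taken from the Lynx corpus.
freqs = {
    ("convenio", "colectivo"): 12,
    ("convenio", "colectivo", "sectorial"): 4,
    ("comité", "de", "empresa"): 6,
}
scores = c_value(freqs)
```
        </preformat>
        <p>In this toy input, convenio colectivo is penalised for appearing inside a longer candidate, while the three-word candidates benefit from the length weight.</p>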
        <p>
          JATE relies on the OpenNLP library<sup>11</sup> for tasks such as tokenisation,
sentence segmentation, part-of-speech (POS) tagging and chunking, the last
two being the most significant for processing Spanish documents. For POS tagging in
Spanish, a model trained for version 1.3 has been adapted to the latest version.
The different POS tags are coded in the Cast3LB format [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], a freely available
treebank for Spanish (see morphological tagset in Figure 1). For chunking
in Spanish, there are no trained models available. However, JATE allows the
creation of patterns to identify chunks in natural language based on POS tags.
These chunks reflect potential candidate terms. The Spanish patterns were
built from a general corpus composed of newspapers and general articles, as per
the morphological instructions in Figure 1.
        </p>
        <p>7 https://www.sketchengine.eu/
8 https://sourceforge.net/projects/tbxtools/
9 http://termsuite.github.io/
10 https://github.com/ziqizhang/jate
11 http://opennlp.apache.org/
        </p>
        <p>
          In order to create the Spanish patterns, previous work with English tagsets
has been taken as a reference [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]. Such patterns have been translated to the
Spanish POS tags and modified according to the grammatical structures of the
Spanish language [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ].
        </p>
        <p>The initial tag patterns used for the first extraction tests against a general
corpus were structured as in Figure 2.</p>
        <p>Based on this figure, examples of the first patterns presented are:
- (\bNC\b): Noun Common (e.g. regulation)
- (\bNC\b) (\bNC\b): Noun Common + Noun Common (e.g. family background)
- (\bAQ\b) (\bNC\b): Adjective Qualifying + Noun Common (e.g. national jurisdiction)</p>
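        <p>The way such tag patterns select candidate terms can be sketched as follows. This is a simplified illustration, not JATE's configuration format: patterns are written as plain tuples of POS tags, the input is assumed to be already tagged, and the names GENERAL_PATTERNS and extract_candidates are hypothetical.</p>
        <preformat>
```python
# Illustrative general-purpose patterns (NC = common noun, AQ = qualifying adjective).
GENERAL_PATTERNS = [("NC",), ("NC", "NC"), ("AQ", "NC")]


def extract_candidates(tagged, patterns):
    """Return every word span whose POS tag sequence equals one of the patterns."""
    tags = [tag for _, tag in tagged]
    found = []
    for start in range(len(tagged)):
        for pattern in patterns:
            end = start + len(pattern)
            if tuple(tags[start:end]) == pattern:
                found.append(" ".join(word for word, _ in tagged[start:end]))
    return found


# Hand-tagged toy input; a real run would take the tags from the POS tagger.
tagged = [("national", "AQ"), ("jurisdiction", "NC"), ("applies", "VM")]
candidates = extract_candidates(tagged, GENERAL_PATTERNS)
```
        </preformat>
        <p>With this input, both the two-word span and the bare common noun are returned as candidates, while the verb is ignored because no pattern contains its tag.</p>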
        <p>
          Some peculiarities of the patterns shown in Figure 2 are, for instance, that
only common nouns have been considered for extraction as simple terms. Verbs
and adjectives of general knowledge tend to be less relevant when building a
domain-independent vocabulary. Hence, their tags have not been added to the
patterns, to avoid noise generation.
        </p>
        <p>
The patterns presented in the previous section were intended for general
information extraction. However, legal language has its own peculiarities that need to
be considered [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ]:
- Long and intricate sentences
- Scarce punctuation marks
- Expressions in foreign languages (usually Latin) (e.g. inter alia)
- Rare and complex expressions only used in a legal context:
        </p>
        <p>- Legal terms of art: technical words with an exact meaning that cannot be
replaced by other terms (e.g. comodato, meaning "bailment")
- Legal jargon: terms and expressions used by lawyers, often archaic and
obsolete words (e.g. lo antedicho, meaning "the aforesaid")
- Terms from the general language with a different meaning in the legal
domain (e.g. furnish, meaning "to provide something or send something").
These terms should be identified since, although in a general context they
may have little significance, they are relevant in a legal context.</p>
        <p>The labour law corpus used for this experiment is composed of 20
documents containing information on national company agreements from different
regions in Spain. Keeping the previous characteristics in mind, this corpus was
analysed before the extraction and the following considerations regarding legal
terminology were raised:
- Most of the terms are multiword expressions (e.g. convenio
colectivo, Boletín Oficial del Estado, grupo profesional)
- Many of these multiword expressions are built with prepositions and
contractions connecting their elements (e.g. comité de empresa, prevención de
riesgos laborales, estatuto del trabajador)
- Some types of words that in a general context are not considered terms,
such as verbs and adjectives, do need to be extracted as terms in the legal
domain, since they have a specific meaning (e.g. vigente, enunciativo, devengar,
retribuir)
- For the purposes of these experiments, proper names are not considered
legal terms and must be avoided in the extraction stage (e.g. Comunidad de
Madrid, Principado de Asturias, empresa Hermanos Fernández)
- Similarly, URLs, numbers and dates must also be kept out of the resulting
term lists (e.g. "https://www.boe.es/")
- Finally, terms including ordinal adjectives that in other jurisdictions may
have a unique meaning, such as the "Third Amendment", do not exist in
Spanish legislation, so these types of words should not be extracted</p>
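        <p>The requirement to keep URLs, numbers and dates out of the term lists could be approximated with a simple post-filter over the candidate strings, as in the sketch below. The regular expressions and the rule of rejecting any candidate that contains a digit are assumptions made for illustration, not part of JATE; proper names are not covered here, since they would need a named entity recogniser.</p>
        <preformat>
```python
import re

# Assumed, deliberately crude rules: anything that looks like a URL, or contains a digit.
URL_RE = re.compile(r"https?://\S+|www\.\S+", re.IGNORECASE)
DIGIT_RE = re.compile(r"\d")


def is_clean(candidate):
    """Reject candidates that contain a URL or any digit (numbers, dates)."""
    return not (URL_RE.search(candidate) or DIGIT_RE.search(candidate))


candidates = [
    "convenio colectivo",
    "https://www.boe.es/",
    "15 de marzo de 2018",
    "grupo profesional",
]
kept = [c for c in candidates if is_clean(c)]
```
        </preformat>
        <p>Only the two genuine terms survive the filter in this example; dates written entirely in words would still pass and would need a separate rule.</p>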
      </sec>
      <sec id="sec-4-2">
        <title>Proposal of Spanish legal language patterns</title>
        <p>Thus, the first version of the Spanish general patterns was modified accordingly.
Some of the patterns were removed and new tag configurations were added:</p>
        <p>Following the legal language analysis described in Section 4.2 and the
morphological tagset shown in Section 4.1, some examples of the patterns added
are:
- (\bAQ\b): Adjective Qualifying (e.g. vigente, meaning "in force")
- (\bVM\b): Verb Main (e.g. devengar, meaning "accrue")
- (\bNC\b) (\bAQ\b) (\bSP\b) (\bDA\b) (\bNC\b): Noun Common +
Adjective Qualifying + Simple Preposition + Determiner Article +
Noun Common (e.g. Boletín Oficial del Estado, meaning "Official Bulletin
of the State")</p>
        <p>On the other hand, other patterns have been removed to avoid noise
generation, since they were not considered relevant for legal terminology:
- (\bAO\b) (\bNC\b): Adjective Ordinal + Noun Common (e.g. Third
Amendment)
- (\bNC\b) (\bNC\b): Noun Common + Noun Common. This pattern was
deleted since it was observed that the POS tagger sometimes tags proper
nouns as NC (common nouns), and the tool extracts structures that are not
real terms, such as empresa Apple.
- Patterns that link several common nouns or several adjectives have also
been removed, since they are very common in English but not in Spanish
grammar, which normally uses prepositions to separate each component of
the term.</p>
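        <p>The move from the general to the legal pattern set can be summarised as a small set operation. The sketch below lists only the patterns named in the text, written as plain tuples of POS tags rather than in JATE's configuration syntax; whether the remaining general patterns were kept is not stated, so the resulting list is an assumption for illustration.</p>
        <preformat>
```python
# Patterns named in the text for the general set (tuples of POS tags; illustrative).
GENERAL_PATTERNS = [("NC",), ("NC", "NC"), ("AQ", "NC"), ("AO", "NC")]

# Patterns reported as removed and added for the legal set.
REMOVED = {("NC", "NC"), ("AO", "NC")}
ADDED = [("AQ",), ("VM",), ("NC", "AQ", "SP", "DA", "NC")]

LEGAL_PATTERNS = [p for p in GENERAL_PATTERNS if p not in REMOVED] + ADDED


def matches(tags, patterns=LEGAL_PATTERNS):
    """True when the whole tag sequence equals one of the patterns."""
    return tuple(tags) in patterns
```
        </preformat>
        <p>Under these assumptions, a tag sequence such as NC AQ SP DA NC is accepted by the legal set, whereas a bare NC NC sequence no longer is.</p>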
      </sec>
      <sec id="sec-4-3">
        <title>Extraction tests</title>
        <p>The testing corpus provided by Lynx partners contains 20 files comprising
21,475 tokens in TXT format, which have been automatically converted
from PDF files from the labour law domain in Spanish. Two tests have been
performed over the Lynx corpus, one per pattern set:
- Extraction tests with Cvalue and TTF-IDF algorithms applying the General
Spanish patterns.</p>
        <p>Samples of the most relevant extracted terms are convenio colectivo
(collective agreement), dirección de la empresa (company management), empresa
(company) and trabajador (worker).
- Extraction tests with Cvalue and TTF-IDF algorithms applying the Legal
Spanish patterns.</p>
        <p>Most of the main terms recognised with the general Spanish patterns
remain in the same positions with similar scores. However, the legal patterns have
introduced new terms that are relevant in the legal domain, such as vacaciones por
antigüedad (seniority holidays), asambleas convocadas por el comité
(committee-organised assemblies), documentos relativos a la liquidación (liquidation
documents), jubilado (retired) or disciplinarios (disciplinary).</p>
        <p>From the resulting lists of terms, sorted by relevance, the first 200 terms<sup>12</sup>
have been considered the most meaningful, since they have significant
scores. In this context, "relevant terms" are those legal expressions that can be
used to annotate and classify documents by topic or typology, that is, terms that
represent the legal domain. Table 1 collects some of the new terms extracted by JATE,
applying the Spanish Legal Patterns<sup>13</sup>.</p>
        <p>12 http://doi.org/10.5281/zenodo.2385437
13 https://github.com/oeg-upm/terminology-extractor</p>
        <table-wrap id="tab-1">
          <label>Table 1</label>
          <caption>
            <p>New terms extracted by the legal patterns</p>
          </caption>
          <table>
            <thead>
              <tr><th>New terms (Cvalue)</th><th>New terms (TTF-IDF)</th></tr>
            </thead>
            <tbody>
              <tr><td>trabajador tendrá (worker will have)</td><td>anónima [AQ] (anonymous)</td></tr>
              <tr><td>conocimientos adquiridos en el desempeño (acquire knowledge during the performance)</td><td>flexible [AQ] (flexible)</td></tr>
              <tr><td>ley de prevención de riesgos (risk prevention law)</td><td>dura [AQ] (severe)</td></tr>
              <tr><td>representación legal de los trabajadores (legal representation of workers)</td><td>sanitario [AQ] (sanitary)</td></tr>
              <tr><td>firma del presente (signing the present)</td><td>discontinuo [AQ] (discontinuous)</td></tr>
              <tr><td>texto refundido de la ley (combined text of law)</td><td>grave [AQ] (serious)</td></tr>
              <tr><td>bocmboletín oficial de la comunidad (bocmofficial bulletin of the community)</td><td>mixta [AQ] (mixed)</td></tr>
              <tr><td>entrada en vigor del presente (implementation of the present contract)</td><td></td></tr>
              <tr><td>miembros del comité de empresa (members of the company committee)</td><td></td></tr>
              <tr><td>comisión mixta de interpretación (mixed interpretation commission)</td><td></td></tr>
              <tr><td>boletín oficial de la comunidad (official bulletin of the community)</td><td></td></tr>
              <tr><td>boletín oficial de la junta (official bulletin of the council)</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>From this table, the following considerations can be drawn:
- The new terms retrieved by the CValue algorithm are multiword terms.
The results have been enriched with more complex nominal chunks that
are relevant in the corpus. However, for the TTF-IDF algorithm, the main
results retrieved are individual terms, in this case adjectives.
- The tool presents some tagging mistakes: the term trabajador tendrá has been
extracted incorrectly, since its last component is a verb and there is no
such pattern in the set. Given the configuration of the patterns, the tool must have
tagged this verb as a common noun or a qualifying adjective (see Figure 3).
- Also, there might be some tokenisation mistakes in the source corpus, since
the term bocmboletín oficial de la comunidad has not been correctly extracted
either. Another possibility is that the PDF to TXT conversion inserted
mistakes such as this one in the source files. The quality of the source files must be
reviewed and improved.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and future work</title>
      <p>Results present little variation: 6% of new terms applying legal patterns with
Cvalue and 3.5% with TTF-IDF. This is mainly caused by POS tagging
mistakes: the tool tends to tag any unknown word as a common noun, since it is
the most frequent type of word in texts. Thus, some proper names, URLs, verbs
and adjectives are extracted as common nouns, spoiling the patterns. Retraining
the model with larger corpora from the domain would avoid many of these
mistaken tags.</p>
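      <p>How the share of new terms was computed is not detailed here; one plausible reading is the fraction of the top-ranked terms obtained with the legal patterns that do not appear among the top-ranked terms obtained with the general patterns. The sketch below follows that assumption, with a hypothetical function name and toy term lists.</p>
      <preformat>
```python
def new_term_share(general_terms, legal_terms, top_n=200):
    """Fraction of the top_n legal-pattern terms absent from the top_n general-pattern terms.

    Both inputs are lists of terms already sorted by score. Assumed metric, for illustration.
    """
    general_top = set(general_terms[:top_n])
    legal_top = legal_terms[:top_n]
    if not legal_top:
        return 0.0, []
    new_terms = [term for term in legal_top if term not in general_top]
    return len(new_terms) / len(legal_top), new_terms


# Toy ranked lists, not the actual extraction results.
general = ["convenio colectivo", "empresa", "trabajador", "dirección de la empresa"]
legal = ["convenio colectivo", "empresa", "trabajador", "vacaciones por antigüedad"]
share, new_terms = new_term_share(general, legal, top_n=4)
```
      </preformat>
      <p>In this toy case one of the four top terms is new, so the share is 0.25.</p>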
      <p>Also, not all the extracted terms are relevant for the legal domain. Some
terms (e.g. sanitario, dura, mixta) do not belong to legal terminology, so a
list of general-usage terms would be required to exclude
such extractions in future experiments.</p>
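      <p>Such a list could be applied as a post-filter over the ranked output, as in the following sketch. The stoplist contains only the three examples named above, the scores are invented, and the function name is hypothetical; a usable list would have to be compiled from a general-language resource.</p>
      <preformat>
```python
# Only the examples named in the text; a real list would be much larger.
GENERAL_USAGE = {"sanitario", "dura", "mixta"}


def drop_general_terms(ranked_terms, stoplist=GENERAL_USAGE):
    """Keep (term, score) pairs whose term is not in the stoplist, preserving order."""
    return [(term, score) for term, score in ranked_terms if term.lower() not in stoplist]


# Invented scores, for illustration only.
ranked = [
    ("convenio colectivo", 9.5),
    ("dura", 3.1),
    ("comité de empresa", 7.2),
    ("Mixta", 2.0),
]
filtered = drop_general_terms(ranked)
```
      </preformat>
      <p>The comparison is case-insensitive, so capitalised variants of a general-usage term are dropped as well.</p>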
      <p>
        On the other hand, many of the wrong extractions are caused by tokenisation
mistakes in the original corpus: clean and well-structured source documents
would avoid this issue. Moreover, other libraries such as IXA Pipes [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] provide better
coverage of NLP tasks for the Spanish language. Its POS tagger has been shown to retrieve
better results for Spanish corpora, and it uses EAGLES tags<sup>14</sup> for the word classes,
which contain more information about them (e.g. gender, number, form). Even
though the patterns would need to be coded again, they could represent much
more grammatical content.
      </p>
      <p>Another convenient immediate step to improve and generate a sound set
of legal patterns is to discuss the grammatical peculiarities of legal language with
actual legal experts: lawyers, prosecutors, judges, law students, etc., since these
professionals are the best source of knowledge in this domain.</p>
      <p>On the whole, these experiments have been performed to highlight the
importance of developing terminology extractors that do not require online
platforms, for domains that deal with sensitive data that cannot be distributed
to third parties.</p>
      <p>Regarding the use of JATE, the best approach here is to use several
algorithms (in this case, Cvalue and TTF-IDF) with different features and
performance, to get more comprehensive results and a better overview of the terms
in the corpus. For instance, the experiments have shown the
importance of the nominal chunks in the corpus, although there are adjectives and
nouns that are important by themselves.</p>
      <p>
        Also, since the tool allows the implementation and creation of new
algorithms, another interesting experiment is the testing of other existing algorithms
such as KEA [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], used by other available tools, and the generation of a customised
algorithm for legal language to test its accuracy.
      </p>
      <p>Finally, although only public data have been handled in these experiments, a
future line of work would focus on identifying and
removing named entities of persons and organizations in a given corpus, in order to protect
personal and private data. Furthermore, entity linking of the retrieved
terms is also being considered, in order to validate the accuracy of the terms and to search
for related terms in relevant knowledge bases.</p>
      <p>14 http://www.lsi.upc.es/~nlp/tools/parole-sp.html</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Agerri</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bermudez</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rigau</surname>
          </string-name>
          , G.:
          <article-title>Ixa pipeline: Efficient and ready to use multilingual NLP tools</article-title>
          .
          <source>In: LREC</source>
          . vol.
          <year>2014</year>
          , pp.
          <volume>3823</volume>
          –
          <issue>3828</issue>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.:</given-names>
          </string-name>
          <article-title>A methodology for automatic term recognition</article-title>
          .
          <source>In: Proceedings of the 15th conference on Computational linguistics-Volume</source>
          <volume>2</volume>
          . pp.
          <volume>1034</volume>
          –
          <fpage>1038</fpage>
          .
          Association for Computational Linguistics (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Civit</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Martí</surname>
            ,
            <given-names>M.A.</given-names>
          </string-name>
          :
          <article-title>Building Cast3LB: A Spanish treebank</article-title>
          .
          <source>Research on Language and Computation</source>
          <volume>2</volume>
          (
          <issue>4</issue>
          ),
          <volume>549</volume>
          –
          <fpage>574</fpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Costa</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zaretskaya</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pastor</surname>
            ,
            <given-names>G.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Seghiri</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Nine terminology extraction tools: Are they useful for translators?</article-title>
          <source>MultiLingual</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5. Real Academia Española:
          <article-title>Nueva gramática de la lengua española (</article-title>
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Haigh</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <source>Legal English</source>
          . Routledge
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Handschuh</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>QasemiZadeh</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>The ACL RD-TEC: a dataset for benchmarking terminology extraction and classification in computational linguistics</article-title>
          .
          <source>In: COLING</source>
          <year>2014</year>
          : 4th International Workshop on Computational Terminology (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Hidalgo</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>La ambiguedad en el lenguaje jur dico: su diagnostico e interpretacion a traves de la lingu stica forense</article-title>
          . Anuari de Filologia. Estudis de Lingu
          <source>stica (7)</source>
          ,
          <volume>73</volume>
          –
          <fpage>96</fpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Justeson</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Katz</surname>
            ,
            <given-names>S.M.:</given-names>
          </string-name>
          <article-title>Technical terminology: some linguistic properties and an algorithm for identification in text</article-title>
          .
          <source>Natural language engineering 1</source>
          (
          <issue>1</issue>
          ),
          <volume>9</volume>
          –
          <fpage>27</fpage>
          (
          <year>1995</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Kilgarriff</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Baisa</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bušta</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , Jakubíček, M.,
          <string-name>
            <surname>Kovář</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Michelfeit</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rychlý</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suchomel</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <article-title>The sketch engine: ten years on</article-title>
          .
          <source>Lexicography</source>
          <volume>1</volume>
          (
          <issue>1</issue>
          ),
          <volume>7</volume>
          –
          <fpage>36</fpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Martín Chozas</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          :
          <article-title>Towards a Linked Open Data Cloud of language resources in the legal domain</article-title>
          .
          <source>UPM</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Witten</surname>
            ,
            <given-names>I.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paynter</surname>
            ,
            <given-names>G.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Frank</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gutwin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nevill-Manning</surname>
            ,
            <given-names>C.G.</given-names>
          </string-name>
          :
          <article-title>Kea: Practical automated keyphrase extraction</article-title>
          . In:
          <article-title>Design and Usability of Digital Libraries: Case Studies in the Asia Pacific</article-title>
          , pp.
          <volume>129</volume>
          –
          <fpage>152</fpage>
          .
          <publisher-name>IGI Global</publisher-name>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Zhang</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ciravegna</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Jate 2.0: Java automatic term extraction with apache solr</article-title>
          .
          <source>In: LREC</source>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>