<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>A Linguistic Failure Analysis of Classification of Medical Publications: A Study on Stemming vs Lemmatization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Giorgio Maria Di Nunzio</string-name>
          <email>dinunzio@dei.unipd.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Federica Vezzani</string-name>
          <email>federica.vezzani@phd.unipd.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Dept. of Information Engineering, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Dept. of Languages and Literary Studies, University of Padua</institution>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>English. Technology-Assisted Review (TAR) systems are essential to minimize the effort of the user during the search and retrieval of relevant documents for a specific information need. In this paper, we present a failure analysis based on terminological and linguistic aspects of a TAR system for systematic medical reviews. In particular, we analyze the results of the worst performing topics in terms of recall using the dataset of the CLEF 2017 eHealth task on TAR in Empirical Medicine. Italiano. I sistemi TAR (Technology-Assisted Review) sono fondamentali per ridurre al minimo lo sforzo dell'utente che intende ricercare e recuperare i documenti rilevanti per uno specifico bisogno informativo. In questo articolo, presentiamo una failure analysis basata su aspetti terminologici e linguistici di un sistema TAR per le revisioni sistematiche in campo medico. In particolare, analizziamo i topic per i quali abbiamo ottenuto dei risultati peggiori in termini di recall utilizzando il dataset di CLEF 2017 eHealth task on TAR in Empirical Medicine.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>
        The Cross Language Evaluation Forum
(CLEF)
        <xref ref-type="bibr" rid="ref5">(Goeuriot et al., 2017)</xref>
        Lab on eHealth has
proposed a task on Technology-Assisted Review
(TAR) in Empirical Medicine since 2017. This
task focuses on the problem of systematic reviews
in the medical domain, that is the retrieval of all
the documents presenting some evidence
regarding a certain medical topic. This kind of problem
is also known as total recall (or total sensitivity)
problem, since the main goal of the search is to
find, if possible, all the relevant documents for a
specific topic.
      </p>
      <p>
        In this paper, we present a failure analysis based
on terminological and linguistic aspects of the
system presented by
        <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
        on the CLEF
2017 TAR dataset. This system uses a
continuous active learning approach
        <xref ref-type="bibr" rid="ref2">(Di Nunzio et al.,
2017)</xref>
        together with a variable threshold based on
the geometry of the two-dimensional space of
documents
        <xref ref-type="bibr" rid="ref3">(Di Nunzio, 2014)</xref>
        . Moreover, the system
performs an automatic estimation of the number of
documents that need to be read in order to declare
the review complete.
      </p>
      <p>In particular, 1) we analyze the results of those
topics for which the retrieval system does not
achieve a perfect recall; 2) based on this analysis,
we perform new experiments to compare the
results achieved with the use of either a stemmer or
a lemmatizer. This paper is organized as follows:
in Section 1.1, we give a brief summary of the use
of stemmers and lemmatizers in Information
Retrieval; in Section 2, we describe the system;
in Section 3, we describe the failure
analysis carried out on the CLEF 2017 TAR dataset and
the results of the new experiments comparing the
use of stemmers vs lemmatizers. In Section 4, we
give our conclusions.</p>
      <sec id="sec-1-1">
        <title>1.1 Stemming and Lemmatization</title>
        <p>
          Stemming and lemmatization play an important
role in order to increase the recall capabilities of
an information retrieval system
          <xref ref-type="bibr" rid="ref6 ref9">(Kanis and
Skorkovská, 2010; Kettunen et al., 2005)</xref>
          . The
basic principle of both techniques is to group similar
words which have either the same root or the same
canonical citation form
          <xref ref-type="bibr" rid="ref1">(Balakrishnan and
Lloyd-Yemoh, 2014)</xref>
          . Stemming algorithms remove
suffixes as well as inflections, so that word variants
can be conflated into their respective stems. If we
consider the words amusing and amusement, the
stem will be amus. On the other hand,
lemmatization uses vocabularies and morphological
analyses to remove the inflectional endings of a word
and to convert it into its dictionary form.
For example, the lemma for both
amusing and amused will be amuse. Stemmers and
lemmatizers differ in the way they are built and
trained. Statistical stemmers are important
components for text search over languages and can be
trained even with few linguistic resources
          <xref ref-type="bibr" rid="ref17">(Silvello
et al., 2018)</xref>
          . Lemmatizers can be generic, like
the one in the Stanford coreNLP package
          <xref ref-type="bibr" rid="ref11">(Manning et al., 2014)</xref>
          , or optimized for a specific
domain, like BioLemmatizer which incorporates
several published lexical resources in the biomedical
domain
          <xref ref-type="bibr" rid="ref10">(Liu et al., 2012)</xref>
          .
        </p>
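The contrast above can be sketched with a toy example (an illustrative sketch, not the Porter algorithm or a real lemmatizer; the suffix list and lemma dictionary are hypothetical):

```python
# Toy contrast between stemming and lemmatization (illustration only).

def toy_stem(word):
    # naive suffix stripping; NOT the Porter algorithm
    for suffix in ("ement", "ing", "ed", "e"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# a tiny hand-made lemma dictionary (hypothetical entries)
TOY_LEMMAS = {"amusing": "amuse", "amused": "amuse"}

def toy_lemmatize(word):
    return TOY_LEMMAS.get(word, word)

print(toy_stem("amusing"), toy_stem("amusement"))         # amus amus
print(toy_lemmatize("amusing"), toy_lemmatize("amused"))  # amuse amuse
```

The stemmer conflates amusing and amusement into the stem amus, while the lemmatizer maps amusing and amused to the dictionary form amuse.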
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2 System</title>
      <p>
        The system we used in this paper is based on a
Technology-Assisted Review (TAR) system
which uses a two-dimensional representation of the
probabilities of a document d being relevant (R)
or non-relevant (NR), respectively P(d|R) and
P(d|NR)
        <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
        .
      </p>
      <p>
        This system uses an alternative interpretation
of the BM25 weighting schema
        <xref ref-type="bibr" rid="ref14">(Robertson and
Zaragoza, 2009)</xref>
        by splitting the weight of a
document in two parts
        <xref ref-type="bibr" rid="ref3">(Di Nunzio, 2014)</xref>
        :
      </p>
      <p>P(d|R) = Σ_{w_i ∈ d} w_i^{BM25,R}(tf)   (1)</p>
      <p>P(d|NR) = Σ_{w_i ∈ d} w_i^{BM25,NR}(tf)   (2)</p>
      <p>
The system uses a bag-of-words approach on the
words wi (either stemmed or lemmatized) that
appear in the document and an explicit relevance
feedback approach to continuously update the
probability of the terms in order to select the next
document to show to the user.</p>
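A minimal sketch of this two-part scoring, with smoothed term weights standing in for the actual BM25 weights (the weighting and names below are assumptions for illustration, not the paper's implementation):

```python
# Each document is scored by summing per-term weights estimated from the
# relevant (R) and non-relevant (NR) feedback sets, as in Eq. (1)-(2).
# The weights here are illustrative smoothed frequencies, not BM25.

from collections import Counter

def term_weights(feedback_docs):
    """Smoothed term weights over a set of judged (tokenized) documents."""
    counts = Counter(t for doc in feedback_docs for t in doc)
    total = sum(counts.values())
    return {t: (counts[t] + 1) / (total + 1) for t in counts}

def score(doc, weights):
    # P(d|class) approximated as a sum of per-term weights
    return sum(weights.get(t, 0.0) for t in doc)

rel = [["lumbar", "pain"], ["lumbar", "hernia"]]
nonrel = [["knee", "pain"]]
w_R, w_NR = term_weights(rel), term_weights(nonrel)

d = ["lumbar", "pain", "hernia"]
print(score(d, w_R) > score(d, w_NR))  # the document leans toward the relevant side
```

After each judgment, the feedback sets (and therefore the two weights of every document) are updated, which is what drives the selection of the next document to show.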
      <p>
        In addition, for each topic the system uses a
query expansion approach with two variants per
topic in order to find alternative and valid terms
for the retrieval of relevant documents. Our
approach for the query reformulation is based on
a linguistic analysis performed by means of the
model of terminological record designed in
        <xref ref-type="bibr" rid="ref18">(Vezzani et al., 2018)</xref>
for the study of medical
language; this method allows the formulation of
two different query variants. The first is a list of
keywords resulting from a systematic semic
analysis
        <xref ref-type="bibr" rid="ref13">(Rastier, 1987)</xref>
        consisting of the
decomposition of the meaning of technical terms (that is, the
lexematic or morphological unit) into minimum
units of meaning that cannot be further segmented.
The second is a human-readable reformulation
using validly attested synonyms and orthographic
alternatives as variants of the medical terms
provided in the original query. The following
examples show our query reformulations given the
initial query provided with the CLEF 2017 TAR
dataset:
      </p>
      <p>Initial query: Physical examination for
lumbar radiculopathy due to disc herniation in
patients with low-back pain;
First variant: Sensitivity, specificity, test,
tests, diagnosis, examination, physical,
straight leg raising, slump, radicular,
radiculopathy, pain, inflammation, compression,
compress, spinal nerve, spine, cervical, root,
roots, sciatica, vertebrae, lumbago, LBP,
lumbar, low, back, sacral, disc, discs, disk,
disks, herniation, hernia, herniated,
intervertebral;
Second variant: Sensitivity and specificity of
physical tests for the diagnosis of nerve
irritation caused by damage to the discs
between the vertebrae in patients presenting
LBP (lumbago).</p>
      <p>Given a set of documents, the stopping strategy
of the system is based on an initial subset (percent
p) of documents that will be read and a maximum
number of documents (threshold t) that an expert
is willing to judge.
is willing to judge.</p>
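The stopping strategy can be sketched as a simple reading-budget rule (a simplified sketch under our reading of the description above; how p and t interact exactly is an assumption, and the function name is illustrative):

```python
# Sketch of the stopping strategy: the expert always reads the initial
# subset (percentage p of the collection) and is never asked to judge
# more than the budget t (nor more than the whole collection).

def should_stop(n_judged, n_docs, p, t):
    initial = int(n_docs * p)   # initial subset that will be read in any case
    budget = max(initial, t)    # never fewer than the initial subset
    return n_judged >= min(budget, n_docs)

print(should_stop(999, 2000, 0.5, 1000))   # False: budget not yet exhausted
print(should_stop(1000, 2000, 0.5, 1000))  # True: budget reached
```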
    </sec>
    <sec id="sec-3">
      <title>3 Experiments</title>
      <p>
        The dataset provided by the TAR in
Empirical Medicine Task at CLEF 2017 (https://goo.gl/jyNALo) is based on
50 systematic reviews (or topics) conducted by
Cochrane experts on Diagnostic Test Accuracy
(DTA). For each topic, the set of PubMed
Document Identifiers (PIDs) returned by running the
query proposed by the physicians in MEDLINE, as
well as the relevance judgements, are made
available
        <xref ref-type="bibr" rid="ref7">(Kanoulas et al., 2017)</xref>
        . The aim of the task is
to retrieve all the documents that have been judged
as relevant by the physicians. The results achieved
by the teams participating in this task showed that
it is possible to get very close to a perfect recall;
however, there are some topics for which most of
the systems did not retrieve all the possible
relevant documents unless an unfeasible number of
documents was read by the user.
      </p>
      <p>
        In this paper, i) we present a linguistic and
terminological failure analysis of such topics and,
based on this analysis, ii) the results of a new set of
experiments that compare the use of either a
stemmer or a lemmatizer in order to evaluate a possible
improvement in the performance in terms of
recall. As a baseline for our analyses, we used the
source code provided by
        <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
        . The
two parameters of the system — the percentage
p of initial training documents that the physician
has to read, and the maximum number of
documents t a physician is willing to read — were set
to p = 500 and t = 100; 500; 1000.
3.1
      </p>
      <sec id="sec-3-1">
        <title>3.1 Linguistic Failure Analysis</title>
        <p>In order to select the most difficult topics for the
failure analysis, we ran the retrieval system with
parameters p = 50% and threshold t = 1000 and
selected those topics for which the system could
not retrieve all the relevant documents, five in
total, shown in Table 1. In order to find out why the
system did not retrieve all the relevant documents
for these topics, we focused on linguistic and
terminological aspects both of technical terms in the
original query and of the abstracts of missing
relevant documents.</p>
        <p>
          We started by reading the abstract of all 19
missing relevant documents and manually
selecting technical terms, defined as all the terms that
are strictly related to the conceptual and practical
factors of a given discipline or activity
          <xref ref-type="bibr" rid="ref18">(Vezzani
et al., 2018)</xref>
          , in this case the medical discipline.
Then, we compared these terms with those
previously identified in the two query variants encoded
in the retrieval system. From this comparison, we
noticed that most of the relevant terms extracted
from the abstracts were not present in the two
query reformulations (a minimum of 0 and a
maximum of 8 terms in common), so that some relevant
documents in which such terms were present were
not retrieved. From a morphological point of view,
we were able to categorize such technical
terms into: 1) acronyms; 2) pairs of terms,
in particular noun-adjective; 3) triads of
terms, in particular noun-adjective-noun.
        </p>
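The overlap check we performed can be sketched as a simple set intersection (the terms below are hypothetical examples, not the actual terms of the CLEF topics):

```python
# Count how many manually selected technical terms from a missed
# abstract also appear in the query variants (hypothetical terms).

abstract_terms = {"cholangiography", "bile duct", "MRCP", "sensitivity"}
variant_terms = {"sensitivity", "specificity", "bile duct", "ultrasound"}

overlap = abstract_terms & variant_terms
print(sorted(overlap), len(overlap))  # shared terms and their count
```

In our analysis this count ranged from 0 to 8 terms per abstract, which explains why documents whose technical vocabulary fell outside the variants were missed.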
        <p>
          The category of acronyms is not an
unexpected outcome. Medical language is
characterized by a high level of abbreviations and
acronyms
          <xref ref-type="bibr" rid="ref15">(Rouleau, 2003)</xref>
          and, in order to retrieve
those missing relevant documents, we should have
considered all the orthographic variants of a
technical term as well as its acronym or expansion
according to the case.
        </p>
        <p>Regarding the second and the third category,
that is the pairs noun-adjective (e.g.: bile/biliary,
pancreas/pancreatic,
schizophrenia/schizophrenic) and the triad of terms noun-adjective-noun
(e.g.: psychiatry/psychiatric/psychiatrist), we
noticed some problems related to the stemming
process. The analysis carried out allowed us to
identify numerous cases of understemming, as
for example the case of psychiatry stemmed as
psychiatri, psychiatric stemmed as psychiatr and
psychiatrist stemmed as psychiatrist, all of them
belonging to the same conceptual group. The fact
that the stemmer recognizes these three words
as different suggests that the conflation of the
inflected forms of a lemma in the query expansion
procedure may help to retrieve the missed relevant
documents.</p>
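The understemming behaviour can be illustrated with a naive suffix-stripping stemmer (a toy sketch, not the Porter implementation used in our experiments): inflectional endings are conflated, but derivational endings such as -ic and -ist fall outside the suffix inventory, so morphologically related forms keep distinct stems.

```python
# Toy stemmer whose suffix inventory covers common inflections but
# not derivational endings, illustrating understemming.

def naive_stem(word):
    for suffix in ("ation", "ness", "ing", "ed", "s"):
        if word.endswith(suffix):
            return word[: -len(suffix)]
    return word

# inflectional variants conflate...
print(naive_stem("discs"), naive_stem("herniated"))  # disc herniat
# ...but derivationally related forms stay apart (understemming)
stems = {naive_stem(w) for w in ("psychiatry", "psychiatric", "psychiatrist")}
print(len(stems))  # -> 3: the index treats the three words as unrelated
```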
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Stemming vs Lemmatization</title>
        <p>
          For the reasons explained in the previous section,
we decided to perform a new set of experiments on
these “difficult” topics to study whether a
lemmatization approach can improve the recall compared
to the stemming approach. We used the standard
algorithms implemented in the two R packages
SnowballC (https://goo.gl/n3WexD) and textstem (https://goo.gl/hCLGP8). Both implement the
Porter stemmer
          <xref ref-type="bibr" rid="ref12">(Porter, 1997)</xref>
          , while the second
uses the TreeTagger algorithm
          <xref ref-type="bibr" rid="ref16">(Schmid, 1999)</xref>
          to
select the lemma of a word. To make a fair
comparison for the stemming vs lemmatization part of
the analysis, in our experiments we did not use any
of the two query variants. By reproducing the
results presented in
          <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
          , we
discovered an issue in the original source code
concerning the stemming phase. The R package tm for text
mining (https://goo.gl/wp859o) calls the stemming function of
SnowballC with the “english” language instead of the
default “porter” stemmer. This caused a
substantial difference in the terms produced for the index
and those stemmed during the query analysis. For
this reason, all our results are significantly higher
compared to those presented by
          <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
          , which makes this approach more effective than the
original work.
        </p>
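The effect of such an index/query stemmer mismatch can be sketched as follows (the two toy stemmers are stand-ins for the “english” and “porter” variants, not their actual implementations):

```python
# If documents are indexed with one stemmer while queries are analyzed
# with another, otherwise-matching terms no longer align in the index.

def index_stemmer(word):   # toy variant: strips "-ation"
    return word[:-5] if word.endswith("ation") else word

def query_stemmer(word):   # toy variant: strips only a final "-s"
    return word[:-1] if word.endswith("s") else word

doc_terms = {index_stemmer(w) for w in ("herniation", "lumbar")}
query_terms = {query_stemmer(w) for w in ("herniation",)}
print(doc_terms & query_terms)  # empty set: the query term misses the index
```

Fixing the mismatch so that the same stemmer is applied on both sides restores the alignment, which is why the corrected runs score higher than the original ones.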
        <p>
          We studied the performance in terms of recall,
and precision at 100, 500, and 1000 documents
read (P@100, P@500, and P@1000, respectively)
for different values of the threshold t. In
Table 2, we report in the first column of each value
of t the performance of the original experiment
compared to our results (only recall is available
from
          <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
          ). If we observe the
performances on the whole set of test queries, there is
no substantial difference between stemming and
lemmatization. There is some improvement in
terms of recall when threshold t = 100, however
85% of recall is usually considered a ‘low’ score in
total recall tasks. Table 3 compares the number of
relevant documents missed by the stemming and
lemmatization approaches on the difficult topics.
The differences between the original experiments
and these new experiments are minimal apart from
topic CD010339 for which the absence of the two
query reformulations led to a worse performance.
        </p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Final Remarks and Future Work</title>
      <p>
        In this work, we have presented a linguistic
failure analysis in the context of medical systematic
reviews. The analysis showed that, for those
topics where the system does not retrieve all the
relevant information, the main issues are related to
abbreviations, noun-adjective pairs, and
noun-adjective-noun triads. We performed a
new set of experiments to see whether
lemmatization could improve over stemming, but the results
were not conclusive. The issues remain the same,
since the noun-adjective or
noun-adjective-noun type of relation cannot be resolved by a
lemmatizer. For this reason, we are currently studying
an approach that conflates morphosyntactic
variants of medical terms into the same lemma (or
‘conceptual sphere’) by means of medical
terminological records
        <xref ref-type="bibr" rid="ref18">(Vezzani et al., 2018)</xref>
        and the use
of the Medical Subject Headings (MeSH)
dictionary (https://meshb.nlm.nih.gov/search). In this way, we expect that the system will
automatically identify all the related forms (such
as all the derivative nouns, adjectives, or adverbs)
of a lemma in order to include them in the retrieval
process of potentially relevant documents.
      </p>
    </sec>
    <sec id="sec-5">
      <title>Acknowledgments</title>
      <p>The authors would like to thank Sara Bosi and
Fiorenza Germana Grilli, students of the
Master Degree in Modern Languages for International
Communication and Cooperation of the
Department of Linguistics and Literary Study of the
University of Padua, who helped us in the linguistic
failure analysis phase.</p>
      <p>[Table residue: the recall of the original run
        <xref ref-type="bibr" rid="ref18 ref4">(Di Nunzio, 2018)</xref>
        at t = 100 was .645; the remaining numbers are per-topic counts of missed relevant documents for the original and lemmatization runs.]</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Vimala</given-names>
            <surname>Balakrishnan</surname>
          </string-name>
          and
          <string-name>
            <given-names>Ethel</given-names>
            <surname>Lloyd-Yemoh</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Stemming and lemmatization: A comparison of retrieval performances</article-title>
          .
          <source>Lecture Notes on Software Engineering</source>
          ,
          <volume>2</volume>
          (
          <issue>3</issue>
          ):
          <fpage>262</fpage>
          -
          <lpage>267</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          , Federica Beghini, Federica Vezzani, and Geneviève Henrot.
          <year>2017</year>
          .
          <article-title>An Interactive Two-Dimensional Approach to Query Aspects Rewriting in Systematic Reviews. IMS Unipd At CLEF eHealth Task 2</article-title>
          .
          <source>In Working Notes of CLEF 2017 - Conference and Labs of the Evaluation Forum</source>
          , Dublin, Ireland,
          <source>September 11-14</source>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>A New Decision to Take for Cost-Sensitive Naïve Bayes Classifiers</article-title>
          .
          <source>Information Processing &amp; Management</source>
          ,
          <volume>50</volume>
          (
          <issue>5</issue>
          ):
          <fpage>653</fpage>
          -
          <lpage>674</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Giorgio Maria</given-names>
            <surname>Di Nunzio</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>A study of an automatic stopping strategy for technologically assisted medical reviews</article-title>
          .
          <source>In Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
          , Proceedings, pages
          <fpage>672</fpage>
          -
          <lpage>677</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Lorraine</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          , Liadh Kelly, Hanna Suominen, Aurélie Névéol, Aude Robert, Evangelos Kanoulas, Rene Spijker, João Palotti, and Guido Zuccon
          ,
          <year>2017</year>
          .
          <article-title>CLEF 2017 eHealth Evaluation Lab Overview</article-title>
          , pages
          <fpage>291</fpage>
          -
          <lpage>303</lpage>
          . Springer International Publishing, Cham.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Jakub</given-names>
            <surname>Kanis</surname>
          </string-name>
          and Lucie Skorkovská.
          <year>2010</year>
          .
          <article-title>Comparison of different lemmatization approaches through the means of information retrieval performance</article-title>
          . In Petr Sojka, Aleš Horák, Ivan Kopeček, and Karel Pala, editors,
          <source>Text, Speech and Dialogue</source>
          , pages
          <fpage>93</fpage>
          -
          <lpage>100</lpage>
          , Berlin, Heidelberg. Springer Berlin Heidelberg.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Evangelos</given-names>
            <surname>Kanoulas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Dan</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Leif</given-names>
            <surname>Azzopardi</surname>
          </string-name>
          , and Rene Spijker, editors.
          <year>2017</year>
          .
          <article-title>CLEF 2017 Technologically Assisted Reviews in Empirical Medicine Overview</article-title>
          . In Working Notes of CLEF 2017 -
          <article-title>Conference and Labs of the Evaluation forum</article-title>
          , Dublin, Ireland,
          <source>September 11-14</source>
          ,
          <year>2017</year>
          .,
          <source>CEUR Workshop Proceedings. CEUR-WS.org.</source>
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Kimmo</given-names>
            <surname>Kettunen</surname>
          </string-name>
          , Tuomas Kunttu, and Kalervo Järvelin.
          <year>2005</year>
          .
          <article-title>To stem or lemmatize a highly inflectional language in a probabilistic ir environment?</article-title>
          <source>Journal of Documentation</source>
          ,
          <volume>61</volume>
          (
          <issue>4</issue>
          ):
          <fpage>476</fpage>
          -
          <lpage>496</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Haibin</given-names>
            <surname>Liu</surname>
          </string-name>
          , Tom Christiansen, William A. Baumgartner, and Karin Verspoor.
          <year>2012</year>
          .
          <article-title>BioLemmatizer: a lemmatization tool for morphological processing of biomedical text</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          ):3, Apr.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          , Mihai Surdeanu, John Bauer, Jenny Finkel, Steven J. Bethard, and David McClosky.
          <year>2014</year>
          .
          <article-title>The Stanford CoreNLP natural language processing toolkit</article-title>
          .
          <source>In Association for Computational Linguistics (ACL) System Demonstrations</source>
          , pages
          <fpage>55</fpage>
          -
          <lpage>60</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Martin F.</given-names>
            <surname>Porter</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>An algorithm for suffix stripping</article-title>
          .
          <source>In Karen Sparck Jones and Peter Willett</source>
          , editors,
          <source>Readings in Information Retrieval</source>
          , pages
          <fpage>313</fpage>
          -
          <lpage>316</lpage>
          . Morgan Kaufmann Publishers Inc., San Francisco, CA, USA.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>François</given-names>
            <surname>Rastier</surname>
          </string-name>
          .
          <year>1987</year>
          .
          <article-title>Sémantique interprétative. Formes sémiotiques</article-title>
          . Presses universitaires de France.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Stephen E.</given-names>
            <surname>Robertson</surname>
          </string-name>
          and
          <string-name>
            <given-names>Hugo</given-names>
            <surname>Zaragoza</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>The Probabilistic Relevance Framework: BM25 and Beyond</article-title>
          .
          <source>Foundations and Trends in Information Retrieval</source>
          ,
          <volume>3</volume>
          (
          <issue>4</issue>
          ):
          <fpage>333</fpage>
          -
          <lpage>389</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Maurice</given-names>
            <surname>Rouleau</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>La terminologie médicale et ses problèmes</article-title>
          .
          <source>Tribuna</source>
          , Vol. IV, n. 12.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Helmut</given-names>
            <surname>Schmid</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Improvements in part-of-speech tagging with an application to German</article-title>
          .
          <source>In Natural Language Processing Using Very Large Corpora</source>
          , pages
          <fpage>13</fpage>
          -
          <lpage>25</lpage>
          . Springer Netherlands, Dordrecht.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Gianmaria</given-names>
            <surname>Silvello</surname>
          </string-name>
          , Riccardo Bucco, Giulio Busato, Giacomo Fornari, Andrea Langeli, Alberto Purpura, Giacomo Rocco, Alessandro Tezza, and
          <string-name>
            <given-names>Maristella</given-names>
            <surname>Agosti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Statistical stemmers: A reproducibility study</article-title>
          .
          <source>In Advances in Information Retrieval - 40th European Conference on IR Research</source>
          , ECIR
          <year>2018</year>
          , Grenoble, France, March 26-29,
          <year>2018</year>
          , Proceedings, pages
          <fpage>385</fpage>
          -
          <lpage>397</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Federica</given-names>
            <surname>Vezzani</surname>
          </string-name>
          ,
          Giorgio Maria Di Nunzio, and Geneviève Henrot
          .
          <year>2018</year>
          .
          <article-title>TriMED: A Multilingual Terminological Database</article-title>
          . In Nicoletta Calzolari (Conference chair),
          <source>Khalid Choukri</source>
          , Christopher Cieri, Thierry Declerck, Sara Goggi, Koiti Hasida, Hitoshi Isahara, Bente Maegaard, Joseph Mariani, Hélène Mazo, Asuncion Moreno, Jan Odijk, Stelios Piperidis, and Takenobu Tokunaga, editors,
          <source>Proceedings of the Eleventh International Conference on Language Resources and Evaluation (LREC</source>
          <year>2018</year>
          ), Miyazaki, Japan, May 7-
          <issue>12</issue>
          ,
          <year>2018</year>
          .
          European Language Resources Association (ELRA).
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>