=Paper= {{Paper |id=Vol-1495/paper_20 |storemode=property |title=Syntagmatic Behaviors of Verbs in Medical Texts: Expert Communication vs. Forums of Patients |pdfUrl=https://ceur-ws.org/Vol-1495/paper_20.pdf |volume=Vol-1495 |dblpUrl=https://dblp.org/rec/conf/tia/OrnellaGH15 }} ==Syntagmatic Behaviors of Verbs in Medical Texts: Expert Communication vs. Forums of Patients== https://ceur-ws.org/Vol-1495/paper_20.pdf
                 Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)





                 Syntagmatic Behaviors of Verbs in Medical Texts :
                   Expert Communication vs. Forums of Patients
     Ornella Wandji Tchami, Natalia Grabar                                     Ulrich Heid
         STL UMR 8163 CNRS, Université Lille 3                      IWIST, Universität Hildesheim
            59653 Villeneuve d’Ascq, France                                   Germany
               ornwandji@yahoo.fr                                  heidul@uni-hildesheim.de
         natalia.grabar@univ-lille3.fr


                      Abstract

     In this paper, we propose an automatic contrastive analysis of the behavior of verbs,
     with regard to the semantic features of their arguments (subject, direct object, indirect
     object), within and across medical subcorpora. We compare four medical subcorpora with
     texts whose authors and intended readership have different levels of expertise. The
     semantic annotation of the subcorpora is based on semantic information provided by a
     medical terminology. Our results indicate that the proposed procedures and tools could be
     used for the automatic detection of different ways of expressing medical concepts and
     conceptual relations, according to the types of texts.

1   Introduction

Research has shown that despite the growing body of literature available to patients,
communication between medical practitioners and patients is not always easy and successful.
This situation is to some extent due to linguistic complexity in medical care texts (Putz
(2008)). Indeed, the availability of medical information does not guarantee its readability
and correct understanding. Standard medical language contains specific terminology and
specialised phraseology which is hard to understand for non-expert users (McCray (2005),
Zeng-Treiler et al. (2007)), and which can therefore make communication difficult (Jucks and
Bromme (2007), Tran et al. (2009)). Research into this issue has been conducted in sociology
(Kharrazi (2009), Chy et al. (2012)), in Medical Informatics (Kokkinakis and Toporowska
Gronostaj (2006), Smith and Wicks (2008)) and in Natural Language Processing (Zeng-Treiler and
Tse (2006), Chmielik and Grabar (2011)) in order to identify the specificities of this
communication. As one might expect, these studies suggested simplifying the vocabulary of
medical doctors. Researchers in NLP went further, proposing the creation of lexicons which
relate expert terminology to expressions used by lay people (Zeng-Treiler and Tse (2006),
Deléger and Zweigenbaum (2008), Grabar and Hamon (2014)).

   In line with the studies mentioned above, we are interested in the written communication
between medical experts and non-experts. We propose a comparative analysis of the
distributions of argument structures (and semantic patterns) in French medical texts which
have been classified and grouped according to their discursive specificity (Pearson (1998))
and the respective level of expertise of the target public. More specifically, we compare
verbal arguments in four types of subcorpora, focusing on lexical preference and making
several hypotheses. We assume that medical experts use more specific and specialised verbal
configurations (frames, co-occurrences, collocations (i.e. preferred co-occurrences)) in order
to express medical concepts and the relations between them, while non-experts tend to use less
specific configurations. We also examine to what extent the semantic categories of the Snomed
terminology make it possible to distinguish these different configurations. Our study extends
previous work in which we looked at the syntactic and semantic features of the elements
surrounding the verbs in the expert and forum subcorpora, without taking into consideration
the intermediary subcorpora and the dependency relationships between the verbs and their
arguments. This work is intended to highlight the relationship between verbal argument
structures and the different ways of expressing specialised concepts in texts written by
people who have different levels of specialised medical knowledge. In fact, lexical
preferences, collocations, semantic category preferences and verb frames share the ability to
express concepts and/or relations between concepts.




2   Studies of argument structures in corpora

Investigations into the distribution of argument structures of verbs have helped describe and
understand the relationship between the verbs, the argument structures they occur in and the
semantic classes to which they belong. These studies have shown the tendency of particular
verbs to select particular types of arguments, and the attraction of certain argument
structures for particular verbs (Gries and Stefanowitsch (2004), Gries and Stefanowitsch
(2010)). Some studies focusing on verb valency patterns and their frequencies have revealed
that verbs show certain preferences with respect to their valency schemes and alternations
(Köhler (2005), Engelberg (2009), Cosma and Engelberg (2013)). Other researchers have
automatically induced verb classes from data on the distribution of valency patterns (Schulte
im Walde (2003), Schulte im Walde (2009)).

   Quantitative data on argument structures are also used for the construction of lexical
classes, or to build a lexical organisation which predicts much of the behaviour of a new word
by associating it with an appropriate class. As far as English is concerned, several studies
were conducted for the acquisition of subcategorisation information from raw corpora (Briscoe
and Carroll (1997); Preiss et al. (2007)). Some of these studies, such as Korhonen and Briscoe
(2004), use subcategorisation frames for the extension of lexical-semantic classifications.
Others use them as main features for the classification of verbs in specialised texts from the
biomedical domain (Korhonen et al. (2008)). Only recently has French become the target of such
research. Chesley and Salmon-Alt (2006) carried out an exploratory study of 104 common verbs
that allowed them to identify 27 subcategorisation schemes. More recently, Messiant et al.
(2010) have implemented a method to automatically acquire a syntactic lexicon of
subcategorisation frames for French verbs from large corpora.

   It has been shown that the neighborhood of a verb can differ according to the type of text
in which the verb appears (Helbig (1985), Wandji Tchami et al. (2013), Wandji Tchami and
Grabar (2014)). Roland and Jurafsky (1998) analyse how the frequency of verb subcategorisation
schemes is affected by corpus choice. This study has revealed that verb senses are closely
related to types of discourse, in such a way that both determine the frequency of the
different subcategorisation schemes of the verbs in the corpora.

   Although they all look at verbal argument structures within different types of texts, none
of the above-mentioned studies proposes the kind of approach we are trying to develop. We
propose a study of subcategorisation schemes in medical corpora that are differentiated
according to their levels of specialisation, and we use a medical terminology for the semantic
annotation of the texts, to detect selectional restrictions and lexical preferences.

3   Material

The study is based on two types of material: corpora distinguished by the levels of expertise
of their authors and intended readers (section 3.1) and a semantic resource (section 3.2),
used for the semantic annotation of the corpora.

3.1   Corpora

The corpus is made up of a set of four medical subcorpora of written French, which are
distinguished by their discursive specificities (Pearson, 1998) and the respective levels of
expertise of their readership. The first three subcorpora come from the portal CISMeF [1],
which indexes medical texts according to three different categories: texts for medical
experts, texts for medical students, texts for patients or non-experts. The fourth subcorpus
consists of texts written by non-experts. It contains discussions between patients and/or
persons participating in a forum called Doctissimo, Hypertension, Problèmes Cardiaques
(Doctissimo, Hypertension, heart problems) [2].

    [1] http://www.cismef.org/
    [2] http://forum.doctissimo.fr/sante/hypertension-problemes-cardiaques/liste sujet-1.htm

    Corpus          Size (tokens)   Verb occ.   Pron. occ.   Description
    C1 / expert         1,285,665      52,529        1,349   scientific publications and reports
    C2 / student          384,381      22,092          920   didactic supports created for students
    C3 / patient          253,968      19,421        1,176   documentation and brochures
    C4 / forum          1,588,697     184,843        8,261   forum messages from participants

                      Table 1: Size of the subcorpora used

   Table 1 indicates the size of the four subcorpora (number of tokens) and the number of
verbal occurrences per subcorpus; the column "Pron. occ." indicates how many verbal
occurrences per subcorpus have pronominal arguments (which are not resolved and thus not
counted in this study). As can be seen, the expert and forum corpora are of comparable size,
while the student and the lay persons' corpora are much smaller, but also similar in size. We
make the assumption that the authors of the four subcorpora represent actors of the medical
domain, who have different levels of expertise as far as the use of specialised medical
language is concerned.

3.2   Semantic resource

We use the Snomed International Terminology (Côté (1996)) which groups medical terms into
eleven semantic categories, of which nine are considered in this study [3]. This terminology
was chosen because it is one of the largest medical terminologies available for French.

T: Topography or anatomical locations (e.g., coeur (heart), cardiaque (cardiac), digestif
   (digestive), vaisseau (vessel));
S: Social status (e.g., mari (husband), soeur (sister), mère (mother), ancien fumeur (former
   smoker), donneur (donor));
P: Procedures (e.g., césarienne (caesarean), transducteur ultrasons (ultrasound transducer),
   télé-expertise (tele-expertise));
L: Living organisms, such as bacteria and viruses (e.g., Bacillus, Enterobacter, Klebsiella,
   Salmonella); plants (e.g., fougère (fern), pomme de terre (potato)), but also animals
   (e.g., singe (monkey), chien dalmatien (dalmatian dog));
J: Professional occupations (e.g., équipe de SAMU (ambulance team), anesthésiste
   (anesthesiologist), assureur (insurer), magasinier (storekeeper));
F: Functions and dysfunctions of the organism (e.g., pression artérielle (arterial pressure),
   métabolique (metabolic), protéinurie (proteinuria), détresse (distress), insuffisance
   (deficiency));
D: Disorders and pathologies (e.g., obésité (obesity), hypertension artérielle (arterial
   hypertension), cancer (cancer), maladie (disease));
C: Chemical products (e.g., médicament (medication), sodium, héparine (heparin), bleu de
   méthylène (methylene blue));
A: Physical agents and artefacts (e.g., cathéter (catheter), prothèse (prosthesis), tube
   (tube)).

    [3] The two semantic classes containing modifiers are not taken into consideration in this
        study.

   In our approach, the semantic categories of the Snomed International terminology are
considered as ontological categories used for the characterisation of the verbal arguments.
The version of Snomed used contains 144,267 entries (mainly French nouns, noun phrases and
adjectives). We used it for the semantic annotation of our corpus. The Snomed entries may not
necessarily cover all domain notions in our texts (Chute et al., 1996). For this reason, in a
previous study, we attempted to extend the coverage of the terminology in relation to the
corpus used (Wandji Tchami and Grabar (2014)). We computed the plural forms of Snomed's single
word terms, and we tried to detect misspellings of the terms by means of the string edit
distance (Levenshtein, 1966). In both cases, the computed forms inherit the semantic type of
the corresponding Snomed terms. In this way, 14,035 entries were added to the terminology.

4   Method

The method applied in this study aims at describing and comparing the argument structures of
verbs in different types of subcorpora, with a particular focus on selectional restrictions
and lexical preferences. The tools and procedures used allow us to detect collocations and
different ways of expressing concepts and conceptual relations. In order to achieve our aim,
we follow three main steps: the corpus pre-processing and annotation (syntactic and semantic)
(section 4.1), the extraction of verbal argument structures and co-occurrence data (section
4.2), both performed automatically and followed by a manual analysis (section 4.3) which aims
at contrasting and interpreting the automatically extracted data.

4.1   Corpus pre-processing and annotation

The subcorpora have all been downloaded from the above-mentioned online sources, converted
into plain text and recoded in UTF-8 format. The syntactic analysis of sentences is performed
with the Cordial dependency parser (Dominique et al., 2009). Its output contains sentences in
a tabulated




format similar to the CoNLL format (Buchholz and Marsi, 2006). In this format, a sentence
consists of one or more tokens, each one annotated with thirteen fields, separated by a tab
character. Among these fields, the syntactic function and the pivot verb are the main pieces
of information that allow us to extract the verbs and their arguments.

   The syntactically annotated sentences are then processed with Perl programs that perform
the semantic annotation by projecting the resource described in Section 3.2 onto the
lemmatised sentences. The categories of the terminology add semantic information to the
syntactic patterns of verbs. Hence, at the end of this stage, each verb argument appearing in
the terminology is labeled with a semantic category, in addition to its syntactic function;
such a pair constitutes what we call a specialised configuration or frame, while a pair whose
argument has no Snomed category is considered as a non-specialised configuration or frame.

4.2   Extraction of verbal argument structures and of verb+noun co-occurrence

helps to identify semantic groups of verbs expressing similar concepts and conceptual
relations between the verb arguments.

   After processing all the verbs found in the different subcorpora, 11 verbs were selected
for a more detailed case study: augmenter (increase), évaluer (evaluate), exposer (expose),
subir (undergo), prescrire (prescribe), provoquer (provoke), accompagner (accompany), suivre
(follow), causer (cause), baisser (lower), and entraîner (lead to). These verbs were selected
according to two main criteria:

• Frequency: the verbs should have at least 20 occurrences each, in at least two of the
  subcorpora;
• Types of verbs: we tried to choose not only verbs that intuitively tend to have specialised
  usages in specialised domain texts, but also general language verbs like accompagner,
  baisser, suivre etc. The tendency to co-occur frequently with particular terms was also
  taken into consideration, since we focus on lexical preference and collocation.
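
The frequency criterion stated above can be written down compactly. What follows is a minimal Python sketch under stated assumptions, not the authors' Perl implementation: the function name `select_verbs`, the data layout (verb, then subcorpus, then occurrence count) and the toy counts are all hypothetical. The second criterion (type of verb) is a manual judgment and is not modelled.

```python
from typing import Dict, List

# Hypothetical layout: verb -> {subcorpus name -> number of occurrences}.
VerbCounts = Dict[str, Dict[str, int]]


def select_verbs(counts: VerbCounts, min_occ: int = 20, min_subcorpora: int = 2) -> List[str]:
    """Return the verbs that have at least `min_occ` occurrences in at
    least `min_subcorpora` subcorpora (frequency criterion only)."""
    selected = []
    for verb, per_corpus in counts.items():
        # Number of subcorpora in which the verb reaches the threshold.
        n_ok = sum(1 for n in per_corpus.values() if n >= min_occ)
        if n_ok >= min_subcorpora:
            selected.append(verb)
    return sorted(selected)


# Toy counts, invented for this example (not taken from the paper).
toy_counts = {
    "prescrire": {"expert": 25, "student": 4, "patient": 21, "forum": 140},
    "subir": {"expert": 31, "student": 12, "patient": 9, "forum": 95},
    "ausculter": {"expert": 22, "student": 3, "patient": 0, "forum": 5},
}

print(select_verbs(toy_counts))  # ['prescrire', 'subir']
```

With these toy counts, ausculter is rejected because it reaches 20 occurrences in only one subcorpus.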
The sets of sentences annotated at the previous step are processed with Perl programs that
extract argument structures involving the Snomed categories of terms, when provided by Snomed,
as in Table 2 (V+Su/Scat+DO/Scat, V+Su/Scat+DO/Scat+IO/Scat) and pairs of V+Su/Scat, V+DO/Scat
and V+IO/Scat [4].

   For each verb, the most frequent cooccurring objects are automatically extracted and their
corresponding frequencies are computed from all subcorpora. In Sections 5.1 and 5.2, we focus
particularly on direct objects, except for the verb exposer, for which we considered the
subject (patient(S)+exposer) and the indirect object (exposer un risque) (Table 2).

4.3   Comparative analysis of verbal behaviors

The comparative analysis is done manually and aims at highlighting the differences and
similarities of the subcorpora with regard to selectional restrictions and lexical
preferences. We compare the frequency of verbal configurations (pairs of verb+argument or
frames) across the subcorpora. This analysis addresses different aspects: the arguments
(terms) cooccurring with verbs, the verbs cooccurring with those arguments, the different
frames verbs frequently appear in, and argument structures expressing similar conceptual
relations. The results are discussed in Section 5.1.
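
The extraction of verb+argument pairs and their comparison across subcorpora can be illustrated with a small Python sketch. It is not the authors' Perl code: the record layout `Arg` (four hypothetical fields instead of the thirteen fields of the parser output), the function names and the toy data are assumptions made for illustration. Only the labels Su/DO/IO and the Snomed category letters are taken from the text.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Optional, Tuple


class Arg(NamedTuple):
    """One verb argument after syntactic and semantic annotation (hypothetical layout)."""
    verb: str             # lemma of the pivot verb
    function: str         # "Su", "DO" or "IO"
    lemma: str            # lemma of the argument
    scat: Optional[str]   # Snomed category letter, None if absent from the terminology


Pair = Tuple[str, str, str, Optional[str]]  # (verb, function, lemma, category)


def count_pairs(args: Iterable[Arg]) -> Counter:
    """Count verb+argument pairs, keeping syntactic function and Snomed category."""
    return Counter((a.verb, a.function, a.lemma, a.scat) for a in args)


def compare(counts_by_corpus: Dict[str, Counter], pair: Pair) -> Dict[str, int]:
    """Frequency of one pair in each subcorpus (0 when the pair is unseen)."""
    return {name: counts[pair] for name, counts in counts_by_corpus.items()}


# Toy annotated arguments, invented for this example.
expert = [
    Arg("évaluer", "DO", "risque", "F"),
    Arg("évaluer", "DO", "risque", "F"),
    Arg("évaluer", "DO", "patient", "S"),
]
forum = [
    Arg("suivre", "DO", "conseil", None),  # no Snomed category: non-specialised pair
    Arg("évaluer", "DO", "risque", "F"),
]

counts = {"expert": count_pairs(expert), "forum": count_pairs(forum)}
print(compare(counts, ("évaluer", "DO", "risque", "F")))  # {'expert': 2, 'forum': 1}
```

Keeping the category in the key separates specialised pairs (category present) from non-specialised ones (category `None`), which mirrors the distinction drawn in Section 4.1.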
   For a given verb A, after extracting its most frequent objects from the corpora, we
automatically extract further verbs that frequently combine with A's objects, most
particularly those which are semantically close to A, and we compute the frequency of all
verb+Object pairs (see Tables 3 and 4). These data function as indicators of the phenomena
observed in the medical language of experts and non-experts. Indeed, this experiment

    [4] V=verb, Su=subject, DO=direct object, IO=indirect object, Scat=Snomed category

5   Results and Discussion

5.1   Terms cooccurring with verbs

The data provided in Table 2 lead to several observations. Some verbs frequently select terms
from a particular Snomed category, mostly specific terms, in a particular subcorpus, while in
the other subcorpora this co-occurrence never or only rarely happens. This phenomenon is
particularly striking with verbs like prescrire and subir. In the forum and sometimes in the
lay subcorpus, these verbs frequently combine with



terms belonging to category P (procedures); more specifically, prescrire seems to have an
attraction for the terms traitement and examen, while subir has a strong attraction for
intervention and ablation (which refers to a type of medical intervention (hyponym)).
Prescrire also combines frequently with names of chemical products (C) and shows a particular
attraction for the term médicament, while subir prefers terms referring to disorders and
diseases (D), and more precisely the term AVC (stroke). These are preferred co-occurrences
which are therefore seen as collocations.

    Verbs        Nominal cooccurrents (arguments)    exp   stu   lay   for
    prescrire    traitement (P)                        3     0     0     7
                 examen (P)                            0     0     2     7
                 médicament (C)                        0     0     7    26
    subir        ablation (P)                          0     0     0    39
                 intervention (P)                      6     0     1    30
                 AVC (D)                               0     0     2    12
    augmenter    tension (F)                           0     0     7    14
                 risque/risque de (F)                 26     8     5     7
    baisser      tension (F)                           0     0     4    18
    exposer      à+risque (F)                         14     8     0     3
                 patient (S)                          23     5     1     0
    suivre       apparition de symptômes (F)           5     0     0     0
                 patient (S)                           6     0     0     0
                 régime (F)                            1     0     0     5
                 conseil                               0     0     4    10
                 traitement (P)                        2     2     1    13
    évaluer      patient (S)                           7     0     0     0
                 indication                            6     0     0     0
                 risque (F)                            9     2     0     1

Table 2: Most frequent verb/arg pairs (exp, stu, lay, for = expert, student, patient and forum
subcorpora): capital letter=the Snomed category, no capital letter=no category provided

   Such collocations may involve polysemous verbs and their different readings. For example,
in the expert subcorpus (and sometimes in the student subcorpus), évaluer and suivre tend to
appear frequently with terms referring to functions of the organism (F) or to social status
(S). Évaluer seems to be attracted by risque, indication and patient. Évaluer+F means to
measure, determine, calculate, gauge, quantify, while évaluer+S means to examine.

   The differences in verb/arg pair frequencies can lead to different interpretations. First
of all, when the frequency difference is very large from the forum subcorpus to the expert
subcorpus, this may signal some specificities of the laypersons' language. Indeed, while
health care specialists share foundational domain knowledge based on formal education and
professional experience, the patients' or non-experts' medical language is characterised by
the use of common expressions and collocations, sometimes involving technical medical terms
(prescrire un médicament, subir une ablation, subir un AVC, suivre un régime) borrowed from
the medical experts' language. According to researchers in Consumer Health Literature, such
mixed phraseology is the result of social and cultural influence on language and it is
acquired from formal and informal sources such as the internet (Zeng-Treiler et al. (2006),
Zeng-Treiler and Tse (2006)). The frequent use of these expressions makes them progressively
become part of everyday language. This could be a plausible explanation for the high frequency
of expressions like prescrire un médicament, subir une ablation or subir un AVC in the forum
texts.

   Secondly, comparing the results of the expert subcorpus with those of the forum subcorpus,
we notice that sometimes the frequency difference is not very large. The explanation given
above could once more apply here. Indeed, medical technical terms are quite often used by
non-experts to describe medical concepts. On the other hand, when a verbal combination
involving a particular Snomed category is very frequent in the expert subcorpus, such as
exposer + name of a medication (votre patiente est exposée au ramipril) or évaluer + function
(évaluer un risque), while the verb is totally absent or very rare in the other subcorpora, we
may be dealing with a highly specialised (expert) or expert language-specific usage of the
verb.

5.2   Lexical preferences of the arguments for verbs

The results of Section 5.1 give an account of the lexical preferences of the verbs within and
across the subcorpora. In this section, we investigate the lexical preferences of nominals in
the expert and forum subcorpora. Tables 3 and 4 give the results of this experiment. These
data were obtained as described in Section 4.2. The blue color represents the processed verb,
the entries in the column Arguments are the most frequent arguments of the processed verb, and
the red color represents a semantic group of verbs frequently combining with the corresponding
argument in the given corpus. The numbers in brackets show the frequency of each pair
verb+arg.

   Depending on the corpus, certain terms frequently combine with particular verbs, in order
to express a particular concept. For instance, as we can see in Table 2, the terms médicament
and traitement are prescrire's favourite cooccurrents



    Arguments       Verbal cooccurrents
                    Expert                                                   Forum
    médicament      indiquer(3), recommander(2), proposer(2)                 prescrire
    traitement      proposer(8), envisager(7), recommander(3), imposer(3)    prescrire
    examen          imposer(1), proposer(1), recommander(1), autoriser(1)    prescrire
    intervention    -                                                        subir
    ablation        faire(1)                                                 subir
    AVC             présenter(4), faire(2), avoir(2)                         subir
    tension         -                                                        baisser
    régime          -                                                        suivre
    conseil         considérer(1)                                            suivre
    traitement      recevoir(12), bénéficier(6), faire(6), poursuivre(3)     suivre
    tension         -                                                        augmenter

Table 3: Lexical preferences of arguments in the expert subcorpus.

mentioned verbs and is quite recurrent with other verbs.

   The difference in lexical choice across subcorpora does not only concern terms. Verbs also
select particular terms to combine with, depending on the subcorpora. For example, in the
forum subcorpus, the verb suivre frequently co-occurs with the term conseil, while in the
expert subcorpus, the term conseil does not combine with this verb. Instead, suivre combines
with indication. The latter and mainly the term recommandation, which
                                                                           are semantically close to conseil, are very fre-
       Arguments                       Verbal cooccurrents                 quent in the expert subcorpus. They appear in
                                      Forum                    Expert
       patient                 traiter(1), voir(1)                         positions where conseil could appear. For exam-
       apparition de                     -                      suivre
       symptome                   expliquer (5)                            ple, recommandation is combined with verbs like
       risque           mesurer(1), juger(1), exposer(23)
       patient                           -                     évaluer    proposer (4), appliquer (4), actualiser (8), publier
       indication                  apprécier(1)
       risque              accroı̂tre(3), multiplier(2)       augmenter    (4), élaborer (2) and faire (3). This seems to show
                                    élever(1),
                                                                           that the experts prefer to talk about recommanda-
Table 4: Lexical preferences of arguments in the forum                     tions and indications which have specific and tech-
subcorpora.                                                                nical meanings, while laypersons are more famil-
                                                                           iar with the term conseil which is a common word.
                                                                              Another observation was made based on the
in the forum and sometimes in the lay subcorpus,                           experiment carried out. In the forum subcorpus
while in the expert subcorpus, the terms frequently                        baisser and augmenter frequently co-occur with
co-occur with the verbs indiquer, recommander,                             the term tension (augmenter la tension (increase
proposer, and envisager, recommander, proposer,                            blood pressure), baisser la tension (reduce blood
imposer, respectively.                                                     pressure) (see Table 2)), expressing different states
                                                                           of the blood pressure. In the expert subcorpus,
1) Ces médicaments ne sont plus recommandés
                                                                           none of these collocations were found. In ad-
   en première intention dans le traitement de
                                                                           dition, among the verbs combining with tension
   l’hypertension (These drugs are no longer recom-
                                                                           in the expert subcorpus, none is semantically re-
   mended as first-line in the treatment of hyperten-
                                                                           lated to the two verbs. However, we have no-
   sion)
                                                                           ticed the presence of verb based nominalisations,
   Although the two groups of verbs combine with                           constructions requiring support verbs or relational
the same terms, in the professionnal language,                             adjectives, which are synonymous with the two
these verbs are not semantically equivalent, they                          above-mentioned collocations : élévation tension-
correspond to different levels of evidence. In-                            nelle (4), and hausse de tension (1) correspond
deed, they are used by medical experts to express                          to augmenter la tension, while réduction tension-
the relevance of prescribing a given drug or treat-                        nelle (2), abaissement tensionnel (2) and baisse de
ment for a given disease. In contrast, patients just                       tension (4) have the same meaning as baisser la
know about the drug or treatment they have been                            tension.
prescribed for their disease but do not necessar-                             This phenomenon is consistent with the results
ily know about these distinctions. These examples                          obtained in a previous study (Wandji Tchami and
highlight a very relevant difference in the way ex-                        Grabar (2014)) and with Condamines and Bouri-
perts and non-experts use verbal configurations :                          gault (1999)’s findings which confirmed the fact
the first choose very specific and technical config-                       that nominal entities tend to be more frequent in
urations while the others use more general ones.                           expert texts than in non-expert texts. The above
   In the expert subcorpus, several sentences are                          data demonstrate that the difference between the
in the passive voice with an omitted agent, as in                          expert and forum texts does not lie in verbs alone,
Example 1. This applies to some of the above-                              but mostly in the different types of constructions
the verbs are involved in (support verb, paraphrase, verb-based nominalisation, etc.).

5.3     Verbal frames and conceptual relations

Table 5 shows frames which represent different ways of expressing the cause-effect
conceptual relation. The data were extracted from the subcorpora, through the analysis of
frames of accompagner, causer, provoquer, and entraîner, which are causative verbs. We are
aware of the fact that some of the numbers presented in this table are not high enough to
draw conclusions. However, we found it important to report them because they might
highlight phenomena that could be further analysed in future work, with more data.

   verbs       accompagner    causer       provoquer    entraîner
   frames      pro    for     pro   for    pro   for    pro   for
   C s D do     1      0       2     1      0    11      3     1
   C s F do     1      0       0     1      1     5      3     0
   D s D io     5      3       0     0      0     0      0     0
   C s D do     3      0       3     5      0    10      6     0
   D s F do     5      1       1     1      3     8      3     0
   F s F do     4      3       4     5      0    32      3     2
   F s D do     1      0       1     0      0    12      5     1
   F s P do     0      2       1     0      3     0      2     1
   P s F io     5      0       0     0      0     0      0     0
   P s D do     2      0       0     0      2     2      4     7
   P s F do     0      0       0     0      1     0      5     0
   P s P do     0      1       0     1      0     0      5     0
   F s F io     6      3       0     0      0     1      0     0

Table 5: Frames: s=subject, do=direct object, io=indirect object; capital letters=Snomed
semantic categories.

   Many frames were identified. Table 5 shows the most frequent ones, which are: F D, P F,
F F, P D, D F, F P, F D, P P, C D, D D. These frames are all found in the four subcorpora
but they tend to choose specific verbs depending on the subcorpus. The difference mostly
lies on the lexical level with the choice of verbs. In the above-mentioned frames, the left
side semantic class provokes or entails an effect or consequence that is expressed by the
right side category. Let us take for example the relation Functions-Functions (F F), where
a function of the organism has an effect on another function of the organism.

2) Exp: la prise de poids_F s’accompagne d’une élévation de la pression artérielle_F
   (weight gain is followed by a rise in blood pressure_F)
3) For: une diaphorèse_F intense accompagne souvent la douleur_F (the pain_F is often
   followed by an intense diaphoresis_F)
4) For: le stress_F provoque des spasmes vasculaires_F (stress_F causes vascular spasms_F)

   As we can see from the data provided in Table 5, in the expert subcorpus, this
conceptual relation is frequently expressed with the verbs accompagner and entraîner, while
in the forum texts the verbs provoquer and causer are the most used. This remark also
applies to the other above-mentioned frames. Collocational differences between expert and
forum verb use also involve differences in valency and syntactic construction. In Example
2, the verb accompagner is in a pronominal form with a reflexive pronoun se/s’; this
construction is the most used one in the expert subcorpus, and in the table, it is
represented by the presence of the indirect object in the frame.
   Another tendency observed in the expert subcorpus is the frequent use of the passive
voice with a syntactically omitted agent, while in the forum subcorpus, the active voice is
the most used. This observation was already underlined in Section 5.2 with recommander,
indiquer and proposer.

6    Conclusion and Perspectives

In this study, we have proposed a method for the comparative analysis of verbal argument
structures in medical subcorpora whose authors and intended readership have different
levels of expertise, with a focus on lexical preference. The main difference observed is
that medical experts tend to choose verbal configurations with very specific and technical
meanings which apply to specific situations, while non-experts use more generic and common
verbal configurations. Lexical choice differences often come with differences in the
syntactic constructions used. Indeed, medical expert writings are characterized by the
frequent use of a passive form with an omitted agent. The analysis of the two intermediary
subcorpora shows that the expert and student subcorpora are close to each other while the
lay subcorpus is close to the forum. As far as the method is concerned, the use of a
dependency parser seems to improve the results. However, a detailed evaluation of the
parsing quality is still to be done. We are also planning to carry out the analysis
exemplified here on more verbs.

References

Ted Briscoe and John Carroll. 1997. Automatic extraction of subcategorization from corpora.
   In Proceedings of the ACL, pages 356–363.
Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency
   parsing. In Proc. of CoNLL, pages 149–164.
Jolanta Chmielik and Natalia Grabar. 2011. Détection de la spécialisation scientifique et
   technique des documents biomédicaux grâce aux informations morphologiques. TAL,
   51(2):151–179.
Christopher G. Chute, SP Cohn, KE Campbell, DE Oliver, and JR Campbell. 1996. The content
   coverage of clinical classifications. for the computer-based patient record institute’s
   work group on codes & structures. J Am Med Inform Assoc, 3(3):224–33.
Anne Condamines and Didier Bourigault. 1999. Alternance nom/verbe : explorations en corpus
   spécialisés. In Cahiers de l’Elsap, pages 41–48, Caen, France.
Ruxandra Cosma and Stefan Engelberg, 2013. Subjektsätze als alternative Valenzen im
   Deutschen und Rumänischen.
Roger A. Côté, 1996. Répertoire d’anatomopathologie de la SNOMED internationale, v3.4.
   Université de Sherbrooke, Sherbrooke, Québec.
Louise Deléger and Pierre Zweigenbaum. 2008. Paraphrase acquisition from comparable medical
   corpora of specialized and lay texts. In AMIA 2008, pages 146–50.
Laurent Dominique, Sophie Nègre, and Patrick Séguéla. 2009. L’analyseur syntaxique Cordial
   dans Passage. Actes de TALN, 9.
Natalia Grabar and Thierry Hamon. 2014. Automatic extraction of layman names for technical
   medical terms. In ICHI 2014, Pavia, Italy.
Stefan Gries and Anatol Stefanowitsch. 2004. Extending collostructional analysis. a
   corpus-based perspective on ”alternation”. IJCL, 9(1):97–129.
Gerhard Helbig. 1985. Valenz und kommunikation (ein wort zur diskussion). Deutsch als
   Fremdsprache, 22:153–156.
Regina Jucks and R. Bromme. 2007. Choice of words in doctor-patient communication: an
   analysis of health-related internet sites. Health Commun, 21(3):267–77.
Hadi Kharrazi. 2009. Improving healthy behaviors in type 1 diabetic patients by interactive
   frameworks. In AMIA, pages 322–326.
Reinhard Köhler. 2005. Quantitative untersuchungen zur valenz deutscher verben.
   Glotometrics, 9:13–20.
Dimitrios Kokkinakis and M Toporowska Gronostaj. 2006. Comparing lay and professional
   language in cardiovascular disorders corpora. In James Cook University Pham T., editor,
   WSEAS Transactions on Biology and Biomedicine, pages 429–437.
Anna Korhonen, Yuval Krymolowski, and Nigel Collier. 2008. The choice of features for
   classification of verbs in biomedical texts. In Proc. of COLING, pages 449–456.
Vladimir Iosifovich Levenshtein. 1966. Binary codes capable of correcting deletions,
   insertions and reversals. Soviet physics. Doklady, 707(10).
Alexa McCray. 2005. Promoting health literacy. J of Am Med Infor Ass, 12:152–163.
Cédric Messiant, Kata Gábor, and Thierry Poibeau. 2010. Acquisition de connaissances
   lexicales à partir de corpus: la sous-catégorisation verbale en français. TAL,
   51(1):65–96.
Jennifer Pearson. 1998. Terms in Context. John Benjamins, Amsterdam/Philadelphia.
Judita Preiss, Ted Briscoe, and Anna Korhonen. 2007. A system for large-scale acquisition
   of verbal, nominal and adjectival subcategorization frames from corpora. In Proceedings
   of ACL, volume 45, page 912.
Magdalena Putz. 2008. Approaching linguistic complexity in medical care. International
   Journal of Anthropology, 23(3-4):275–284.
Douglas Roland and Daniel Jurafsky. 1998. How verb subcategorization frequencies are
   affected by corpus choice. In Proceedings of ACL, Montreal, Quebec, Canada.
Schulte im Walde. 2003. Experiments on the automatic induction of german semantic verb
   classes. Technical report, Universität Stuttgart.
Catherine Smith and PJ Wicks. 2008. PatientsLikeMe: Consumer health vocabulary as a
   folksonomy. In Proceedings of the AMIA 2008 Symposium, pages 682–686.
Thi Mai Tran, H Chekroud, P Thiery, and A Julienne. 2009. Internet et soins : un tiers
   invisible dans la relation médecine/patient ? Ethica Clinica, 53:34–43.
Ornella Wandji Tchami and Natalia Grabar. 2014. Towards automatic distinction between
   specialized and non-specialized occurrences of verbs in medical corpora. In Proceedings
   of Computerm, pages 114–124, Dublin, Ireland, August.
Ornella Wandji Tchami, MC L’Homme, and Natalia Grabar. 2013. Discovering semantic frames
   for a contrastive study of verbs in medical corpora. In TIA, Villetaneuse.
Qing Zeng-Treiler and T Tse. 2006. Exploring and developing consumer health vocabularies.
   JAMIA, 13:24–29.
Qing Zeng-Treiler, Tony Tse, Guy Divita, Alla Keselman, Jon Crowell, and Allen C Browne.
   2006. Exploring lexical forms: first-generation consumer health vocabularies. In AMIA
   2006, pages 1155–1155.