=Paper=
{{Paper
|id=Vol-1495/paper_20
|storemode=property
|title=Syntagmatic Behaviors of Verbs in Medical Texts: Expert Communication vs. Forums of Patients
|pdfUrl=https://ceur-ws.org/Vol-1495/paper_20.pdf
|volume=Vol-1495
|dblpUrl=https://dblp.org/rec/conf/tia/OrnellaGH15
}}
==Syntagmatic Behaviors of Verbs in Medical Texts: Expert Communication vs. Forums of Patients==
Proceedings of the conference Terminology and Artificial Intelligence 2015 (Granada, Spain)
Ornella Wandji Tchami, Natalia Grabar
STL UMR 8163 CNRS, Université Lille 3, 59653 Villeneuve d'Ascq, France
ornwandji@yahoo.fr, natalia.grabar@univ-lille3.fr

Ulrich Heid
IWIST, Universität Hildesheim, Germany
heidul@uni-hildesheim.de

Abstract

In this paper, we propose an automatic contrastive analysis of the behavior of verbs, with regard to the semantic features of their arguments (subject, direct object, indirect object), within and across medical subcorpora. We compare four medical subcorpora with texts whose authors and intended readership have different levels of expertise. The semantic annotation of the subcorpora is based on semantic information provided by a medical terminology. Our results indicate that the proposed procedures and tools could be used for the automatic detection of different ways of expressing medical concepts and conceptual relations, according to the types of texts.

1 Introduction

Research has shown that despite the growing body of literature available to patients, communication between medical practitioners and patients is not always easy and successful. This situation is to some extent due to linguistic complexity in medical care texts (Putz (2008)). Indeed, the availability of medical information does not guarantee its readability and correct understanding. Standard medical language contains specific terminology and specialised phraseology which is hard to understand for non-expert users (McCray (2005), Zeng-Treiler et al. (2007)), and which can therefore render the communication difficult (Jucks and Bromme (2007), Tran et al. (2009)). Research into this issue has been conducted in sociology (Kharrazi (2009), Chy et al. (2012)), in Medical Informatics (Kokkinakis and Toporowska Gronostaj (2006), Smith and Wicks (2008)) and in Natural Language Processing (Zeng-Treiler and Tse (2006), Chmielik and Grabar (2011)) in order to identify the specificities of this communication. As one could expect, these studies suggested the simplification of the medical doctors' vocabulary. Researchers in NLP went further, proposing the creation of lexicons which relate expert terminology with expressions used by lay people (Zeng-Treiler and Tse (2006), Deléger and Zweigenbaum (2008), Grabar and Hamon (2014)).

In line with the studies mentioned above, we are interested in the written communication between medical experts and non-experts. We propose a comparative analysis of the distributions of argument structures (and semantic patterns) in French medical texts which have been classified and grouped according to their discursive specificity (Pearson (1998)) and the respective level of expertise of the target public. More specifically, we compare verbal arguments in four types of subcorpora, focusing on lexical preference and making different hypotheses. We assume that medical experts use more specific and specialised verbal configurations (frames, co-occurrences, collocations (i.e. preferred co-occurrences)) in order to express medical concepts and the relations between them, while non-experts tend to use less specific configurations. We also examine to what extent the semantic categories of the Snomed terminology make it possible to distinguish these different configurations. Our study extends previous work in which we looked at the syntactic and semantic features of the elements surrounding the verbs in the expert and forum subcorpora, without taking into consideration the intermediary subcorpora and the dependency relationships between the verbs and their arguments. This work is intended to highlight the relationship between verbal argument structures and the different ways of expressing specialised concepts in texts written by people who have different levels of specialised medical knowledge. In fact, lexical preferences, collocations, semantic category preferences and verb frames share the ability to express concepts and/or relations between concepts.

2 Studies of argument structures in corpora

Investigations into the distribution of argument structures of verbs have helped describe and understand the relationship between the verbs, the argument structures they occur in and the semantic classes to which they belong. These studies have shown the tendency of particular verbs to select a particular type of arguments, and the attraction of certain argument structures for particular verbs (Gries and Stefanowitsch (2004), Gries and Stefanowitsch (2010)). Some studies focusing on verb valency patterns and their frequencies have revealed that verbs show certain preferences with respect to their valency schemes and alternations (Köhler (2005), Engelberg (2009), Cosma and Engelberg (2013)). Other researchers have automatically induced verb classes from data on the distribution of valency patterns (Schulte im Walde (2003), Schulte im Walde (2009)).

Quantitative data on argument structures are also used for the construction of lexical classes, or to build a lexical organisation which predicts much of the behaviour of a new word by associating it with an appropriate class. As far as English is concerned, several studies were conducted for the acquisition of subcategorisation information from raw corpora (Briscoe and Carroll (1997); Preiss et al. (2007)). Some of these studies, like Korhonen and Briscoe (2004), use subcategorisation frames for the extension of lexical-semantic classifications. Others use them as main features for the classification of verbs in specialised texts from the biomedical domain (Korhonen et al. (2008)). Only recently has French become the target of such research. Chesley and Salmon-Alt (2006) carried out an exploratory study of 104 common verbs that allowed them to identify 27 subcategorisation schemes. More recently, Messiant et al. (2010) have implemented a method to automatically acquire a syntactic lexicon of subcategorisation frames for French verbs from large corpora.

It has been shown that the neighborhood of a verb can be different according to the type of text in which the verb appears (Helbig (1985), Wandji Tchami et al. (2013), Wandji Tchami and Grabar (2014)). Roland and Jurafsky (1998) analyse how the frequency of verb subcategorisation schemes is affected by corpus choice. This study has revealed that verb senses are closely related to types of discourse, in such a way that both determine the frequency of the different subcategorisation schemes of the verbs in the corpora.

Although they all look at verbal argument structures within different types of texts, none of the above-mentioned studies proposes the kind of approach we are trying to develop. We propose a study of subcategorisation schemes in medical corpora that are differentiated according to their levels of specialization, and we use a medical terminology for the semantic annotation of the texts, to detect selectional restrictions and lexical preferences.

3 Material

The study is based on two types of material: corpora distinguished by the levels of expertise of their authors and intended readers (section 3.1) and a semantic resource (section 3.2), used for the semantic annotation of the corpora.

3.1 Corpora

The corpus is made up of a set of four medical subcorpora of written French, which are distinguished by their discursive specificities (Pearson, 1998) and the respective levels of expertise of their readership. The first three subcorpora come from the portal CISMeF [1], which indexes medical texts according to three different categories: texts for medical experts, texts for medical students, texts for patients or non-experts. The fourth subcorpus is made of texts written by non-experts. It contains discussions between patients and/or persons participating in a forum called Doctissimo, Hypertension, Problèmes Cardiaques (Doctissimo, Hypertension, heart problems) [2].

[1] http://www.cismef.org/
[2] http://forum.doctissimo.fr/sante/hypertension-problemes-cardiaques/liste sujet-1.htm

Corpus        Size       Verb occ.  Pron. occ.  Description
C1 / expert   1,285,665  52529      1349        scientific publications and reports
C2 / student  384,381    22092      920         didactic supports created for students
C3 / patient  253,968    19421      1176        documentation and brochures
C4 / forum    1,588,697  184843     8261        forum messages from participants

Table 1: Size of the subcorpora used

Table 1 indicates the size of the four subcorpora (number of tokens) and the number of verbal occurrences per subcorpus; the rightmost numeric column indicates how many verbal occurrences per subcorpus have pronominal arguments (which will not be resolved and thus not counted in this study). As can be seen, the expert and forum corpora are of comparable size, while the student and the lay persons' corpora are much smaller, but also similar in size to each other. We make the assumption that the authors of the four subcorpora represent actors of the medical domain, who have different levels of expertise as far as the use of specialised medical language is concerned.

3.2 Semantic resource

We use the Snomed International Terminology (Côté (1996)) which groups medical terms into eleven semantic categories, of which nine are considered in this study [3]. This terminology was chosen because it is one of the largest medical terminologies available for French.

T: Topography or anatomical locations (e.g., coeur (heart), cardiaque (cardiac), digestif (digestive), vaisseau (vessel));
S: Social status (e.g., mari (husband), soeur (sister), mère (mother), ancien fumeur (former smoker), donneur (donor));
P: Procedures (e.g., césarienne (caesarean), transducteur ultrasons (ultrasound transducer), télé-expertise (tele-expertise));
L: Living organisms, such as bacteria and viruses (e.g., Bacillus, Enterobacter, Klebsiella, Salmonella); plants (e.g., fougère (fern), pomme de terre (potato)), but also animals (e.g., singe (monkey), chien dalmatien (dalmatian dog));
J: Professional occupations (e.g., équipe de SAMU (ambulance team), anesthésiste (anesthesiologist), assureur (insurer), magasinier (storekeeper));
F: Functions and dysfunctions of the organism (e.g., pression artérielle (arterial pressure), métabolique (metabolic), protéinurie (proteinuria), détresse (distress), insuffisance (deficiency));
D: Disorders and pathologies (e.g., obésité (obesity), hypertension artérielle (arterial hypertension), cancer (cancer), maladie (disease));
C: Chemical products (e.g., médicament (medication), sodium, héparine (heparin), bleu de méthylène (methylene blue));
A: Physical agents and artefacts (e.g., cathéter (catheter), prothèse (prosthesis), tube (tube)).

[3] The two semantic classes containing modifiers are not taken into consideration in this study.

In our approach, the semantic categories of the Snomed International terminology are considered as ontological categories used for the characterisation of the verbal arguments. The version of Snomed we used contains 144 267 entries (mainly French nouns, noun phrases and adjectives). We used it for the semantic annotation of our corpus. The Snomed entries may not necessarily cover all domain notions in our texts (Chute et al., 1996). For this reason, in a previous study, we attempted to complete the coverage of the terminology in relation with the corpus used (Wandji Tchami and Grabar (2014)). We computed the plural forms of Snomed's single word terms, and we tried to detect misspellings of the terms by means of the string edit distance (Levenshtein, 1966). In both cases, the computed forms inherit the semantic type of the original Snomed terms. In this way, 14 035 entries were added to the terminology.

4 Method

The method applied in this study aims at describing and comparing the argument structures of verbs in different types of subcorpora, with a particular focus on selectional restrictions and lexical preferences. The tools and procedures used allow us to detect collocations and different ways of expressing concepts and conceptual relations. In order to achieve our aim, we follow 3 main steps: the corpus pre-processing and annotation (syntactic and semantic) (section 4.1), the extraction of verbal argument structures and co-occurrence data (section 4.2), both performed automatically and followed by a manual analysis (section 4.3) which aims at contrasting and interpreting the automatically extracted data.

4.1 Corpus pre-processing and annotation

The subcorpora have all been downloaded from the above-mentioned online sources, converted into plain text and recoded in UTF-8 format. The syntactic analysis of sentences is performed with the Cordial dependency parser (Dominique et al., 2009). Its output contains sentences in a tabulated format similar to the CoNLL format (Buchholz and Marsi, 2006). In this format, a sentence consists of one or more tokens, each one annotated with thirteen fields, separated by a tab character. Among these fields, the syntactic function and the pivot verb are the main pieces of information that allow us to extract the verbs and their arguments.

The syntactically annotated sentences are then processed with Perl programs that perform the semantic annotation by projecting the resource described in Section 3.2 onto the lemmatised sentences. The categories of the terminology add semantic information to the syntactic patterns of verbs. Hence, at the end of this stage, each verb argument appearing in the terminology is labeled with a semantic category, in addition to its syntactic function; such a pair constitutes what we call a specialised configuration or frame, while a pair whose argument has no Snomed category is considered a non-specialised configuration or frame.

4.2 Extraction of verbal argument structures and of verb+noun co-occurrence

The sets of sentences annotated at the previous step are processed with Perl programs that extract argument structures involving the Snomed categories of terms, when provided by Snomed, as in Table 2 (V+Su/Scat+DO/Scat, V+Su/Scat+DO/Scat+IO/Scat) and pairs of V+Su/Scat, V+DO/Scat and V+IO/Scat [4].

[4] V=verb, Su=subject, DO=direct object, IO=indirect object, Scat=Snomed category

For each verb, the most frequent cooccurring objects are automatically extracted and their corresponding frequencies are computed from all subcorpora. In Sections 5.1 and 5.2, we focus particularly on direct objects, except with the verb exposer, for which we have considered the subject (patient (S) + exposer) and the indirect object (exposer à un risque) (Table 2).

For a given verb A, after extracting its most frequent objects from the corpora, we automatically extract further verbs that frequently combine with A's objects, most particularly those which are semantically close to A, and we compute the frequency of all verb+object pairs (see Tables 3 and 4). These data function as indicators of the phenomena observed in the medical language of experts and non-experts. Indeed, this experiment helps to identify semantic groups of verbs expressing similar concepts and conceptual relations between the verb arguments.

After processing all the verbs found in the different subcorpora, 11 verbs were selected for a more detailed case study: augmenter (increase), évaluer (evaluate), exposer (expose), subir (undergo), prescrire (prescribe), provoquer (provoke), accompagner (accompany), suivre (follow), causer (cause), baisser (lower), and entraîner (lead to). These verbs were selected according to two main criteria:

• Frequency: the verbs should have at least 20 occurrences each, in at least two of the subcorpora;
• Types of verbs: we tried to choose not only verbs that intuitively tend to have specialised usages in specialised domain texts, but also general language verbs like accompagner, baisser, suivre, etc. The tendency to co-occur frequently with particular terms was also taken into consideration, since we focus on lexical preference and collocation.

4.3 Comparative analysis of verbal behaviors

The comparative analysis is done manually and aims at highlighting the differences and similarities of the subcorpora with regard to selectional restrictions and lexical preferences. We compare the frequency of verbal configurations (pairs of verb+argument or frames) across the subcorpora. This analysis addresses different aspects: the arguments (terms) cooccurring with verbs, the verbs cooccurring with those arguments, the different frames verbs frequently appear in, and argument structures expressing similar conceptual relations. The results are discussed in Section 5.

5 Results and Discussion

5.1 Terms cooccurring with verbs

The data provided in Table 2 lead to several observations. Some verbs frequently select terms from a particular Snomed category, mostly specific terms, in a particular subcorpus, while in the other subcorpora this co-occurrence never or only rarely happens. This phenomenon is particularly striking with verbs like prescrire and subir. In the forum and sometimes in the lay subcorpus, these verbs frequently combine with
terms belonging to category P (procedures); more specifically, prescrire seems to have an attraction for the terms traitement and examen, while subir has a strong attraction for intervention and ablation (which refers to a type of medical intervention (hyponym)). Prescrire also combines frequently with names of chemical products (C) and shows a particular attraction for the term médicament, while subir prefers terms referring to disorders and diseases (D), and more precisely the term AVC (stroke). These are preferred co-occurrences which are therefore seen as collocations.

Verb       Argument                         exp  stu  lay  for
prescrire  traitement (P)                     3    0    0    7
           examen (P)                         0    0    2    7
           médicament (C)                     0    0    7   26
subir      ablation (P)                       0    0    0   39
           intervention (P)                   6    0    1   30
           AVC (D)                            0    0    2   12
augmenter  tension (F)                        0    0    7   14
           risque/risque de (F)              26    8    5    7
baisser    tension (F)                        0    0    4   18
exposer    à+risque (F)                      14    8    0    3
           patient (S)                       23    5    1    0
suivre     apparition de symptômes (F)        5    0    0    0
           patient (S)                        6    0    0    0
           régime (F)                         1    0    0    5
           conseil                            0    0    4   10
           traitement (P)                     2    2    1   13
évaluer    patient (S)                        7    0    0    0
           indication                         6    0    0    0
           risque (F)                         9    2    0    1

Table 2: Most frequent verb/argument pairs (exp=expert, stu=student, lay=patient, for=forum subcorpus). A capital letter in parentheses is the Snomed category; no letter means no category is provided.

Such collocations may involve polysemous verbs and their different readings. For example, in the expert subcorpus (and sometimes in the student subcorpus), évaluer and suivre tend to appear frequently with terms referring to functions of the organism (F) or to social status (S). Évaluer seems to be attracted by risque, indication and patient. Évaluer+F means to measure, determine, calculate, gauge, quantify, while évaluer+S means to examine.

The differences in verb/argument pair frequencies can lead to different interpretations. First of all, when the frequency is much higher in the forum subcorpus than in the expert subcorpus, this may signal some specificities of the laypersons' language. Indeed, while health care specialists share foundational domain knowledge based on formal education and professional experience, the patients' or non-experts' medical language is characterised by the use of common expressions and collocations, sometimes involving technical medical terms (prescrire un médicament, subir une ablation, subir un AVC, suivre un régime) borrowed from the medical experts' language. According to researchers in Consumer Health Literature, such mixed phraseology is the result of social and cultural influence on language, and these expressions are acquired from formal and informal sources such as the internet (Zeng-Treiler et al. (2006), Zeng-Treiler and Tse (2006)). The frequent use of these expressions makes them progressively become part of everyday language. This could be a plausible explanation for the high frequency of expressions like prescrire un médicament, subir une ablation or subir un AVC, in the forum texts.

Secondly, comparing the results for the expert subcorpus with those for the forum subcorpus, we notice that sometimes the frequency difference is small. The explanation given above could once more apply here. Indeed, medical technical terms are quite often used by non-experts to describe medical concepts. On the other hand, when a verbal combination involving a particular Snomed category is very frequent in the expert subcorpus, like exposer + name of a medication (votre patiente est exposée au ramipril) or évaluer + function (évaluer un risque), while the verb is totally absent or very rare in the other subcorpora, we might be dealing with a highly specialised or expert language-specific usage of the verb.

5.2 Lexical preferences of the arguments for verbs

The results of Section 5.1 give an account of the lexical preferences of the verbs within and across the subcorpora. In this section, we investigate the lexical preferences of nominals in the expert and forum subcorpora. Tables 3 and 4 give the results of this experiment. These data were obtained as described in Section 4.2. In the original tables, the processed verb is shown in blue, the entries in the column Arguments are the most frequent arguments of the processed verb, and the semantic group of verbs frequently combining with the corresponding argument in the given corpus is shown in red. The numbers in brackets show the frequency of each verb+argument pair.

Depending on the corpus, certain terms frequently combine with particular verbs, in order to express a particular concept. For instance, as we can see in Table 2, the terms médicament and traitement are prescrire's favourite cooccurrents
in the forum and sometimes in the lay subcorpus, while in the expert subcorpus, the terms frequently co-occur with the verbs indiquer, recommander, proposer, and envisager, recommander, proposer, imposer, respectively.

1) Ces médicaments ne sont plus recommandés en première intention dans le traitement de l'hypertension (These drugs are no longer recommended as first-line in the treatment of hypertension)

Arguments     Verbal cooccurrents (Expert)                            Verb (Forum)
médicament    indiquer(3), recommander(2), proposer(2)                prescrire
traitement    proposer(8), envisager(7), recommander(3), imposer(3)   prescrire
examen        imposer(1), proposer(1), recommander(1), autoriser(1)   prescrire
intervention  -                                                       subir
ablation      faire(1)                                                subir
AVC           présenter(4), faire(2), avoir(2)                        subir
tension       -                                                       baisser
régime        -                                                       suivre
conseil       considérer(1)                                           suivre
traitement    recevoir(12), bénéficier(6), faire(6), poursuivre(3)    suivre
tension       -                                                       augmenter

Table 3: Lexical preferences of arguments in the expert subcorpus.

Arguments               Verbal cooccurrents (Forum)              Verb (Expert)
patient                 traiter(1), voir(1)
apparition de symptome  -                                        suivre
                        expliquer(5)
risque                  mesurer(1), juger(1), exposer(23)
patient                 -                                        évaluer
indication              apprécier(1)
risque                  accroître(3), multiplier(2), élever(1)   augmenter

Table 4: Lexical preferences of arguments in the forum subcorpus.

Although the two groups of verbs combine with the same terms, in the professional language these verbs are not semantically equivalent: they correspond to different levels of evidence. Indeed, they are used by medical experts to express the relevance of prescribing a given drug or treatment for a given disease. In contrast, patients just know about the drug or treatment they have been prescribed for their disease but do not necessarily know about these distinctions. These examples highlight a very relevant difference in the way experts and non-experts use verbal configurations: the former choose very specific and technical configurations while the latter use more general ones.

In the expert subcorpus, several sentences are in the passive voice with an omitted agent, as in Example 1. This applies to some of the above-mentioned verbs and is quite recurrent with other verbs.

The difference in lexical choice across subcorpora does not only concern terms. Verbs also select particular terms to combine with, depending on the subcorpora. For example, in the forum subcorpus, the verb suivre frequently co-occurs with the term conseil, while in the expert subcorpus, the term conseil does not combine with this verb. Instead, suivre combines with indication. The latter and mainly the term recommandation, which are semantically close to conseil, are very frequent in the expert subcorpus. They appear in positions where conseil could appear. For example, recommandation is combined with verbs like proposer (4), appliquer (4), actualiser (8), publier (4), élaborer (2) and faire (3). This seems to show that the experts prefer to talk about recommandations and indications, which have specific and technical meanings, while laypersons are more familiar with the term conseil, which is a common word.

Another observation was made based on the experiment carried out. In the forum subcorpus, baisser and augmenter frequently co-occur with the term tension (augmenter la tension (increase blood pressure), baisser la tension (reduce blood pressure); see Table 2), expressing different states of the blood pressure. In the expert subcorpus, none of these collocations were found. In addition, among the verbs combining with tension in the expert subcorpus, none is semantically related to the two verbs. However, we have noticed the presence of verb-based nominalisations, constructions requiring support verbs or relational adjectives, which are synonymous with the two above-mentioned collocations: élévation tensionnelle (4) and hausse de tension (1) correspond to augmenter la tension, while réduction tensionnelle (2), abaissement tensionnel (2) and baisse de tension (4) have the same meaning as baisser la tension.

This phenomenon is consistent with the results obtained in a previous study (Wandji Tchami and Grabar (2014)) and with the findings of Condamines and Bourigault (1999), which confirmed that nominal entities tend to be more frequent in expert texts than in non-expert texts. The above data suggest that the difference between the expert and forum texts does not lie in verbs alone, but mostly in the different types of constructions the verbs are involved in (support verb, paraphrase, verb-based nominalisation, etc.).

5.3 Verbal frames and conceptual relations

Table 5 shows frames which represent different ways of expressing the cause-effect conceptual relation. The data were extracted from the subcorpora, through the analysis of frames of accompagner, causer, provoquer, and entraîner, which are causative verbs. We are aware of the fact that some of the numbers presented in this table are not high enough to draw conclusions. However, we found it important to report them because they might highlight phenomena that could be further analysed in future work, with more data.

          accompagner  causer    provoquer  entraîner
Frame     pro  for     pro  for  pro  for   pro  for
C s D do    1    0       2    1    0   11     3    1
C s F do    1    0       0    1    1    5     3    0
D s D io    5    3       0    0    0    0     0    0
C s D do    3    0       3    5    0   10     6    0
D s F do    5    1       1    1    3    8     3    0
F s F do    4    3       4    5    0   32     3    2
F s D do    1    0       1    0    0   12     5    1
F s P do    0    2       1    0    3    0     2    1
P s F io    5    0       0    0    0    0     0    0
P s D do    2    0       0    0    2    2     4    7
P s F do    0    0       0    0    1    0     5    0
P s P do    0    1       0    1    0    0     5    0
F s F io    6    3       0    0    0    1     0    0

Table 5: Frames: s=subject, do=direct object, io=indirect object; capital letters=Snomed semantic categories; pro=expert subcorpus, for=forum subcorpus.

Many frames were identified; Table 5 shows the most frequent ones, which are: F D, P F, F F, P D, D F, F P, F D, P P, C D, D D. These frames are all found in the four subcorpora but they tend to choose specific verbs depending on the subcorpus. The difference mostly lies at the lexical level, in the choice of verbs. In the above-mentioned frames, the left-side semantic class provokes or entails an effect or consequence that is expressed by the right-side category. Let us take for example the relation Functions-Functions (F F), where a function of the organism has an effect on another function of the organism.

2) Exp: la prise de poids (F) s'accompagne d'une élévation de la pression artérielle (F) (weight gain (F) is followed by a rise in blood pressure (F))
3) For: une diaphorèse (F) intense accompagne souvent la douleur (F) (the pain (F) is often followed by an intense diaphoresis (F))
4) For: le stress (F) provoque des spasmes vasculaires (F) (stress (F) causes vascular spasms (F))

As we can see from the data provided in Table 5, in the expert subcorpus, this conceptual relation is frequently expressed with the verbs accompagner and entraîner, while in the forum texts the verbs provoquer and causer are the most used. This remark also applies to the other above-mentioned frames. Collocational differences between expert and forum verb use also involve differences in valency and syntactic construction. In Example 2, the verb accompagner is in a pronominal form with a reflexive pronoun se/s'; this construction is the most used one in the expert subcorpus, and in the table, it is represented by the presence of the indirect object in the frame.

Another tendency observed in the expert subcorpus is the frequent use of the passive voice with a syntactically omitted agent, while in the forum subcorpus, the active voice is the most used. This observation was already made in Section 5.2 with recommander, indiquer and proposer.

6 Conclusion and Perspectives

In this study, we have proposed a method for the comparative analysis of verbal argument structures in medical subcorpora whose authors and intended readership have different levels of expertise, with a focus on lexical preference. The main difference observed is that medical experts tend to choose verbal configurations with very specific and technical meanings which apply to specific situations, while non-experts use more generic and common verbal configurations. Lexical choice differences often come with differences in the syntactic constructions used. Indeed, medical expert writings are characterized by the frequent use of a passive form with an omitted agent. The analysis of the two intermediary subcorpora shows that the expert and student subcorpora are close to each other while the lay subcorpus is close to the forum. As far as the method is concerned, the use of a dependency parser seems to improve the results. However, a detailed evaluation of the parsing quality is still to be done. We are also planning to carry out the analysis exemplified here on more verbs.

References

Ted Briscoe and John Carroll. 1997. Automatic extraction of subcategorization from corpora. In Proceedings of the ACL, pages 356–363.
Sabine Buchholz and Erwin Marsi. 2006. CoNLL-X shared task on multilingual dependency parsing. In Proc. of CoNLL, pages 149–164.
Jolanta Chmielik and Natalia Grabar. 2011. Détection de la spécialisation scientifique et technique des documents biomédicaux grâce aux informations morphologiques. TAL, 51(2):151–179.
Christopher G. Chute, SP Cohn, KE Campbell, DE Oliver, and JR Campbell. 1996. The content coverage of clinical classifications. for the computer-based patient record institute's work group on codes & structures. J Am Med Inform Assoc, 3(3):224–33.
Anne Condamines and Didier Bourigault. 1999. Alternance nom/verbe : explorations en corpus spécialisés. In Cahiers de l'Elsap, pages 41–48, Caen, France.
Ruxandra Cosma and Stefan Engelberg. 2013. Subjektsätze als alternative Valenzen im Deutschen und Rumänischen.
Roger A. Côté. 1996. Répertoire d'anatomopathologie de la SNOMED internationale, v3.4. Université de Sherbrooke, Sherbrooke, Québec.
Louise Deléger and Pierre Zweigenbaum. 2008. Paraphrase acquisition from comparable medical corpora of specialized and lay texts. In AMIA 2008, pages 146–50.
of verbs in biomedical texts. In Proc. of COLING, pages 449–456.
Vladimir Iosifovich Levenshtein. 1966. Binary codes capable of correcting deletions, insertions and reversals. Soviet physics. Doklady, 707(10).
Alexa McCray. 2005. Promoting health literacy. J of Am Med Infor Ass, 12:152–163.
Cédric Messiant, Kata Gábor, and Thierry Poibeau. 2010. Acquisition de connaissances lexicales à partir de corpus: la sous-catégorisation verbale en français. TAL, 51(1):65–96.
Jennifer Pearson. 1998. Terms in Context. John Benjamins, Amsterdam/Philadelphia.
Judita Preiss, Ted Briscoe, and Anna Korhonen. 2007. A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora. In Proceedings of ACL, volume 45, page 912.
Magdalena Putz. 2008. Approaching linguistic complexity in medical care. International Journal of Anthropology, 23(3-4):275–284.
Douglas Roland and Daniel Jurafsky. 1998. How verb subcategorization frequencies are affected by corpus choice. In Proceedings of ACL, Montreal, Quebec, Canada.
Schulte im Walde. 2003. Experiments on the automatic induction of German semantic verb classes. Technical report, Universität Stuttgart.
Laurent Dominique, Sophie Nègre, and Patrick
Séguéla. 2009. L’ analyseur syntaxique Cordial Catherine Smith and PJ Wicks. 2008. PatientsLikeMe:
dans Passage. Actes de TALN, 9. Consumer health vocabulary as a folksonomy. In
Natalia Grabar and Thierry Hamon. 2014. Automatic Proceedings of the AMIA 2008 Symposium, pages
extraction of layman names for technical medical 682–686.
terms. In ICHI 2014, Pavia, Italy. Thi Mai Tran, H Chekroud, P Thiery, and A Julienne.
Stefan Gries and Anatol Stefanowitsch. 2004. Extend- 2009. Internet et soins : un tiers invisible dans la
ing collostructional analysis. a corpus-based per- relation médecine/patient ? Ethica Clinica, 53:34–
spective on ”alternation”. IJCL, 9(1):97–129. 43.
Gerhard Helbig. 1985. Valenz und kommunika- Ornella Wandji Tchami and Natalia Grabar. 2014. To-
tion (ein wort zur diskussion). Deutsch als Fremd- wards automatic distinction between specialized and
sprache, 22:153–156. non-specialized occurrences of verbs in medical cor-
Regina Jucks and R. Bromme. 2007. Choice of pora. In Proceedings of Computerm, pages 114–
words in doctor-patient communication: an analy- 124, Dublin, Ireland, August.
sis of health-related internet sites. Health Commun, Ornella Wandji Tchami, MC L’Homme, and Natalia
21(3):267–77. Grabar. 2013. Discovering semantic frames for a
Hadi Kharrazi. 2009. Improving healthy behaviors in contrastive study of verbs in medical corpora. In
type 1 diabetic patients by interactive frameworks. TIA, Villetaneuse.
In AMIA, pages 322–326. Qing Zeng-Treiler and T Tse. 2006. Exploring and
Reinhard Köhler. 2005. Quantitative untersuchungen developing consumer health vocabularies. JAMIA,
zur valenz deutscher verben. Glotometrics, 9:13– 13:24–29.
20. Qing Zeng-Treiler, Tony Tse, Guy Divita, Alla Ke-
Dimitrios Kokkinakis and M Toporowska Gronos- selman, Jon Crowell, and Allen C Browne. 2006.
taj. 2006. Comparing lay and professional lan- Exploring lexical forms: first-generation consumer
guage in cardiovascular disorders corpora. In James health vocabularies. In AMIA 2006, pages 1155–
Cook University Pham T., editor, WSEAS Transac- 1155.
tions on Biology and Biomedicine, pages 429–437.
Anna Korhonen, Yuval Krymolowski, and Nigel Col-
lier. 2008. The choice of features for classification