<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Syntagmatic Behaviors of Verbs in Medical Texts: Expert Communication vs. Forums of Patients</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Ornella Wandji Tchami, Natalia Grabar</string-name>
          <email>natalia.grabar@univ-lille3.fr</email>
          <email>ornwandji@yahoo.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ulrich Heid</string-name>
          <email>heidul@uni-hildesheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>IWIST, Universität Hildesheim</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>STL UMR 8163 CNRS, Université Lille 3</institution>
          ,
          <addr-line>59653 Villeneuve d'Ascq</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <fpage>99</fpage>
      <lpage>106</lpage>
      <abstract>
        <p>In this paper, we propose an automatic contrastive analysis of the behavior of verbs, with regard to the semantic features of their arguments (subject, direct object, indirect object), within and across medical subcorpora. We compare four medical subcorpora with texts whose authors and intended readership have different levels of expertise. The semantic annotation of the subcorpora is based on semantic information provided by a medical terminology. Our results indicate that the proposed procedures and tools could be used for the automatic detection of different ways of expressing medical concepts and conceptual relations, according to the types of texts.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>Research has shown that despite the growing body
of literature available to patients, communication
between medical practitioners and patients is not
always easy and successful. This situation is to
some extent due to linguistic complexity in
medical care texts (Putz (2008)). Indeed, the
availability of medical information does not guarantee
its readability and correct understanding.
Standard medical language contains specific
terminology and specialised phraseology which is hard to
understand for non-expert users (McCray (2005),
Zeng-Treiler et al. (2007)), and which can
therefore render the communication difficult (Jucks and
Bromme (2007), Tran et al. (2009)). Research
into this issue has been conducted in sociology
(Kharrazi (2009), Chy et al. (2012)), in Medical
Informatics (Kokkinakis and Toporowska
Gronostaj (2006), Smith and Wicks (2008)) and in
Natural Language Processing (Zeng-Treiler and Tse
(2006), Chmielik and Grabar (2011)) in order to
identify the specificities of this communication.
As one could expect, these studies suggested the
simplification of the medical doctors’ vocabulary.
Researchers in NLP went further, proposing the
creation of lexicons which relate expert
terminology with expressions used by lay people
(Zeng-Treiler and Tse (2006), Deléger and
Zweigenbaum (2008), Grabar and Hamon (2014)).</p>
      <p>In line with the studies mentioned above, we
are interested in the written communication
between medical experts and non-experts. We
propose a comparative analysis of the distributions
of argument structures (and semantic patterns) in
French medical texts which have been classified
and grouped according to their discursive
specificity (Pearson (1998)) and the respective level of
expertise of the target public. More specifically,
we compare verbal arguments in four types of
subcorpora, focusing on lexical preference and
making different hypotheses. We assume that
medical experts use more specific and specialised
verbal configurations (frames, co-occurrences,
collocations (i.e., preferred co-occurrences)) in order
to express medical concepts and the relations
between them, while non-experts tend to use less
specific configurations. We also examine to what
extent the semantic categories of the Snomed
terminology make it possible to distinguish these different
configurations. Our study extends
previous work in which we looked at the syntactic and
semantic features of the elements surrounding the
verbs in the expert and forum subcorpora,
without taking into consideration the intermediary
subcorpora and the dependency relationships between
the verbs and their arguments. This work is
intended to highlight the relationship between
verbal argument structures and the different ways of
expressing specialised concepts in texts written by
people who have different levels of specialised
medical knowledge. In fact, lexical preferences,
collocations, semantic category preferences and
verb frames share the ability to express concepts
and/or relations between concepts.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Studies of argument structures in corpora</title>
      <p>
        Investigations into the distribution of argument
structures of verbs have helped describe and
understand the relationship between the verbs, the
argument structures they occur in and the
semantic classes to which they belong. These studies
have shown the tendency of particular verbs to
select particular types of arguments, and the
attraction of certain argument structures for
particular verbs (Gries and
        <xref ref-type="bibr" rid="ref11">Stefanowitsch (2004</xref>
        ), Gries
and Stefanowitsch (2010)). Some studies
focusing on verb valency patterns and their frequencies
have revealed that verbs show certain preferences
with respect to their valency schemes and
alternations (Köhler (2005), Engelberg (2009), Cosma
and Engelberg (2013)). Other researchers have
automatically induced verb classes from data on the
distribution of valency patterns (
        <xref ref-type="bibr" rid="ref25">Schulte im Walde
(2003</xref>
        ), Schulte im Walde (2009)).
      </p>
      <p>
        Quantitative data on argument structures are
also used for the construction of lexical classes,
or to build a lexical organisation which predicts
much of the behaviour of a new word by
associating it with an appropriate class. As far as English
is concerned, several studies were conducted for
the acquisition of subcategorisation information
from raw corpora (Briscoe and Carroll (1997);
Preiss et al. (2007)). Some of these studies like
Korhonen and Briscoe (2004) use
subcategorisation frames for the extension of lexical-semantic
classifications. Others use them as main
features for the classification of verbs in specialised
texts from the biomedical domain (Korhonen et
al. (2008)). French has only recently become
the target of such research. Chesley and
Salmon-Alt (2006) carried out an exploratory study of 104
common verbs that allowed them to identify 27
subcategorisation schemes. More recently,
        <xref ref-type="bibr" rid="ref20">Messiant et al. (2010)</xref>
        have implemented a method to
automatically acquire a syntactic lexicon of
subcategorisation frames for French verbs from large
corpora.
      </p>
      <p>It has been shown that the neighborhood of a
verb can be different according to the type of text
in which the verb appears (Helbig (1985), Wandji
Tchami et al. (2013), Wandji Tchami and Grabar
(2014)). Roland and Jurafsky (1998) analyse how
the frequency of verb subcategorisation schemes is
affected by corpus choice. Their study revealed
that verb senses are closely related to types of
discourse, in such a way that both determine the
frequency of the different subcategorisation schemes
of the verbs in the corpora.</p>
      <p>Although they all look at verbal argument
structures within different types of texts, none of the
above-mentioned studies proposes the kind of
approach we are trying to develop. We propose
a study of subcategorisation schemes in medical
corpora that are differentiated according to their
levels of specialization, and we use a medical
terminology for the semantic annotation of the texts,
to detect selectional restrictions and lexical
preferences.</p>
    </sec>
    <sec id="sec-3">
      <title>3 Material</title>
      <p>The study is based on two types of material:
corpora distinguished by the levels of expertise of
their authors and intended readers (section 3.1)
and a semantic resource (section 3.2), used for the
semantic annotation of the corpora.</p>
      <sec id="sec-3-1">
        <title>3.1 Corpora</title>
        <p>
          The corpus is made up of a set of four medical
subcorpora of written French, which are
distinguished by their discursive specificities
          <xref ref-type="bibr" rid="ref21">(Pearson,
1998)</xref>
          and the respective levels of expertise of their
readership. The first three subcorpora come from
the portal CISMeF<sup>1</sup>, which indexes medical texts
according to three different categories: texts for
medical experts, texts for medical students, and texts
for patients or non-experts. The fourth subcorpus
is made of texts written by non-experts. It
contains discussions between patients and/or persons
participating in a forum called Doctissimo,
Hypertension, Problèmes Cardiaques (Doctissimo,
Hypertension, heart problems)<sup>2</sup>.
        </p>
        <table-wrap>
          <label>Table 1</label>
          <caption>
            <p>[...] occurrences per subcorpus; the rightmost column indicates how many verbal occurrences per subcorpus have pronominal arguments (which will not be resolved and thus not counted in this study).</p>
          </caption>
          <table>
            <thead>
              <tr><th>Corpus</th><th>Size</th></tr>
            </thead>
            <tbody>
              <tr><td>C1 / expert</td><td>1,285,665</td></tr>
              <tr><td>C2 / student</td><td>384,381</td></tr>
              <tr><td>C3 / patient</td><td>253,968</td></tr>
              <tr><td>C4 / forum</td><td>1,588,697</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>As can be seen, the expert and forum corpora are
almost equal in size, while the student and the lay
persons’ corpora are much smaller, but also
similar in size. We make the assumption that the
authors of the four subcorpora represent actors of the
medical domain who have different levels of
expertise as far as the use of specialised medical
language is concerned.</p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2 Semantic resource</title>
        <p>We use the Snomed International Terminology
(Côté (1996)) which groups medical terms into
eleven semantic categories, of which nine are
considered in this study<fn><label>3</label><p>The two semantic classes containing modifiers are not taken into consideration in this study.</p></fn>. This terminology was
chosen because it is one of the largest medical
terminologies available for French.</p>
        <list list-type="simple">
          <list-item><p>T: Topography or anatomical locations (e.g., coeur (heart), cardiaque (cardiac), digestif (digestive), vaisseau (vessel));</p></list-item>
          <list-item><p>S: Social status (e.g., mari (husband), soeur (sister), mère (mother), ancien fumeur (former smoker), donneur (donor));</p></list-item>
          <list-item><p>P: Procedures (e.g., césarienne (caesarean), transducteur ultrasons (ultrasound transducer), téléexpertise (tele-expertise));</p></list-item>
          <list-item><p>L: Living organisms, such as bacteria and viruses (e.g., Bacillus, Enterobacter, Klebsiella, Salmonella); plants (e.g., fougère (fern), pomme de terre (potato)), but also animals (e.g., singe (monkey), chien dalmatien (dalmatian dog));</p></list-item>
          <list-item><p>J: Professional occupations (e.g., équipe de SAMU (ambulance team), anesthésiste (anesthesiologist), assureur (insurer), magasinier (storekeeper));</p></list-item>
          <list-item><p>F: Functions and dysfunctions of the organism (e.g., pression artérielle (arterial pressure), métabolique (metabolic), protéinurie (proteinuria), détresse (distress), insuffisance (deficiency));</p></list-item>
          <list-item><p>D: Disorders and pathologies (e.g., obésité (obesity), hypertension artérielle (arterial hypertension), cancer (cancer), maladie (disease));</p></list-item>
          <list-item><p>C: Chemical products (e.g., médicament (medication), sodium, héparine (heparin), bleu de méthylène (methylene blue));</p></list-item>
          <list-item><p>A: Physical agents and artefacts (e.g., cathéter (catheter), prothèse (prosthesis), tube (tube)).</p></list-item>
        </list>
        <p>
          In our approach, the semantic categories of the
Snomed International terminology are considered
as ontological categories used for the
characterisation of the verbal arguments. The version of
Snomed used contains 144,267 entries (mainly French
nouns, noun phrases and adjectives). We used it
for the semantic annotation of our corpus. The
Snomed entries may not necessarily cover all
domain notions in our texts
          <xref ref-type="bibr" rid="ref4">(Chute et al., 1996)</xref>. For
this reason, in a previous study, we attempted to
extend the coverage of the terminology with respect
to the corpus used (Wandji Tchami and
Grabar (2014)). We computed the plural forms
of Snomed’s single-word terms, and we tried to
detect misspellings of the terms by means of the
string edit distance
          <xref ref-type="bibr" rid="ref18">(Levenshtein, 1966)</xref>. In both
cases, the computed forms inherit the semantic
type of the corresponding Snomed terms. In this way,
14,035 entries were added to the terminology.
        </p>
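        <p>The expansion of the terminology described above can be sketched as follows. This is a minimal illustration in Python, not the programs used in the study: the pluralisation rule, the distance threshold of 1, the minimum word length of 6 and the sample entries are all assumptions made for the example.</p>
        <preformat><![CDATA[
```python
def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance: insertions, deletions and substitutions cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def naive_plural(term: str) -> str:
    """Toy rule (assumption): add 's' unless the term ends in s, x or z."""
    return term if term.endswith(("s", "x", "z")) else term + "s"


def expand_terminology(terminology, corpus_words, max_distance=1, min_length=6):
    """Add plurals and near-misspellings; new forms inherit the category."""
    expanded = dict(terminology)
    for term, category in terminology.items():
        if " " not in term:  # single-word terms only
            expanded.setdefault(naive_plural(term), category)
    for word in corpus_words:
        # Short words are skipped: at distance 1 they match too many terms.
        if word in expanded or len(word) < min_length:
            continue
        for term, category in terminology.items():
            if " " not in term and edit_distance(word, term) <= max_distance:
                expanded[word] = category
                break
    return expanded


# Hypothetical sample entries and corpus words, for illustration only.
snomed_sample = {"cathéter": "A", "héparine": "C", "hypertension artérielle": "D"}
expanded_sample = expand_terminology(snomed_sample, ["catheter", "héparines", "tube"])
print(expanded_sample)
```
]]></preformat>
        <p>In this sketch the unaccented form catheter is attached to category A because it lies at distance 1 from cathéter, while tube is ignored because it is shorter than the assumed minimum length.</p>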
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4 Method</title>
      <p>The method applied in this study aims at
describing and comparing the argument structures of
verbs in different types of subcorpora, with a
particular focus on selectional restrictions and lexical
preferences. The tools and procedures used allow
us to detect collocations and different ways of
expressing concepts and conceptual relations. In
order to achieve our aim, we follow three main steps:
the corpus pre-processing and annotation
(syntactic and semantic) (section 4.1), the extraction of
verbal argument structures and co-occurrence data
(section 4.2), both performed automatically, and
a manual analysis (section 4.3) which
aims at contrasting and interpreting the
automatically extracted data.</p>
      <sec id="sec-4-1">
        <title>4.1 Corpus pre-processing and annotation</title>
        <p>
          The subcorpora have all been downloaded from
the above-mentioned online sources, converted
into plain text and recoded in UTF-8 format. The
syntactic analysis of sentences is performed with
the Cordial dependency parser
          <xref ref-type="bibr" rid="ref9">(Dominique et al.,
2009)</xref>
          . Its output contains sentences in a tabulated
format similar to the CoNLL format
          <xref ref-type="bibr" rid="ref16 ref2 ref30">(Buchholz
and Marsi, 2006)</xref>. In this format, a sentence
consists of one or more tokens, each one annotated
with thirteen fields, separated by a tab character.
Among these fields, the syntactic function and the
pivot verb are the main pieces of information that allow us
to extract the verbs and their arguments.
        </p>
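        <p>Reading such a tabulated output and grouping arguments under their pivot verb could look like the sketch below. The column positions of the lemma, the syntactic function and the pivot verb, the function labels (SUBJ, DO, IO) and the use of a blank line as sentence separator are assumptions made for illustration; the actual layout of the parser output may differ.</p>
        <preformat><![CDATA[
```python
from collections import defaultdict

# Assumed positions within the thirteen tab-separated fields (hypothetical).
LEMMA, FUNCTION, PIVOT = 2, 10, 11
# Assumed function labels, mapped to the abbreviations used in the paper.
ARGUMENT_FUNCTIONS = {"SUBJ": "Su", "DO": "DO", "IO": "IO"}


def read_sentences(lines):
    """Yield sentences as lists of field lists; a blank line ends a sentence."""
    sentence = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            if sentence:
                yield sentence
            sentence = []
        else:
            sentence.append(line.split("\t"))
    if sentence:
        yield sentence


def verb_arguments(sentence):
    """Group argument lemmas by pivot verb: {verb: [(function, lemma), ...]}."""
    arguments = defaultdict(list)
    for fields in sentence:
        function = ARGUMENT_FUNCTIONS.get(fields[FUNCTION])
        if function is not None:
            arguments[fields[PIVOT]].append((function, fields[LEMMA]))
    return dict(arguments)


def make_row(form, lemma, function, pivot):
    """Build a thirteen-field toy row; unused fields are filled with '_'."""
    fields = ["_"] * 13
    fields[1], fields[LEMMA] = form, lemma
    fields[FUNCTION], fields[PIVOT] = function, pivot
    return "\t".join(fields)


# Invented toy sentence: "(le) médecin prescrit (un) médicament".
sample = [
    make_row("médecin", "médecin", "SUBJ", "prescrire"),
    make_row("prescrit", "prescrire", "VERB", "_"),
    make_row("médicament", "médicament", "DO", "prescrire"),
    "",
]
parsed = [verb_arguments(sentence) for sentence in read_sentences(sample)]
print(parsed)
```
]]></preformat>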
        <p>The syntactically annotated sentences are then
processed with Perl programs that perform the
semantic annotation by projecting the resource
described in Section 3.2 onto the lemmatised
sentences. The categories of the terminology add
semantic information to the syntactic patterns of
verbs. Hence, at the end of this stage, each
verb argument appearing in the terminology is
labeled with a semantic category, in addition to its
syntactic function; such a pair constitutes what we
call a specialised configuration or frame, while a
pair whose argument has no Snomed category is
considered a non-specialised configuration or
frame.</p>
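        <p>The projection of the terminology onto lemmatised sentences can be illustrated with a greedy longest-match lookup. This is only a sketch under stated assumptions: the terminology is assumed to be keyed by lemmatised, lower-cased terms, the maximum term length of four words is arbitrary, and the function names and sample data are hypothetical.</p>
        <preformat><![CDATA[
```python
def project_terms(lemmas, terminology, max_words=4):
    """Greedy longest-match projection of terminology entries onto lemmas.

    Returns a list of (start, end, term, category) spans, end exclusive.
    """
    spans, i = [], 0
    while i < len(lemmas):
        for size in range(min(max_words, len(lemmas) - i), 0, -1):
            candidate = " ".join(lemmas[i:i + size]).lower()
            if candidate in terminology:
                spans.append((i, i + size, candidate, terminology[candidate]))
                i += size
                break
        else:
            i += 1  # no term starts at this position
    return spans


def configuration(verb, function, category):
    """Label a verb+argument pair as specialised when it has a category."""
    kind = "specialised" if category is not None else "non-specialised"
    return (verb, function, category, kind)


# Invented toy terminology and lemmatised sentence.
toy_terminology = {"patient": "S", "hypertension": "D", "hypertension artériel": "D"}
toy_lemmas = ["le", "patient", "présenter", "un", "hypertension", "artériel"]
spans = project_terms(toy_lemmas, toy_terminology)
print(spans)
print(configuration("présenter", "DO", "D"))
print(configuration("présenter", "Su", None))
```
]]></preformat>
        <p>Preferring the longest match keeps a multi-word term such as hypertension artériel from being split into a shorter entry plus an unlabelled remainder.</p>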
      </sec>
      <sec id="sec-4-2">
        <title>4.2 Extraction of verbal argument structures and of verb+noun co-occurrence</title>
        <p>The sets of sentences annotated at the
previous step are processed with Perl programs
that extract argument structures involving the
Snomed categories of terms, when provided by
Snomed, as in Table 2 (V+Su/Scat+DO/Scat,
V+Su/Scat+DO/Scat+IO/Scat) and pairs of
V+Su/Scat, V+DO/Scat and V+IO/Scat<fn><label>4</label><p>V = verb, Su = subject, DO = direct object, IO = indirect object, Scat = Snomed category.</p></fn>.</p>
        <p>For each verb, the most frequent cooccurring
objects are automatically extracted and their
corresponding frequencies are computed from all
subcorpora. In Sections 5.1 and 5.2, we focus
particularly on direct objects, except with the verb
exposer, for which we have considered the subject
(patient<sub>S</sub>+exposer) and the indirect object
(exposer un risque) (Table 2).</p>
        <p>For a given verb A, after extracting its most
frequent objects from the corpora, we automatically
extract further verbs that frequently combine with
A’s objects, most particularly those which are
semantically close to A, and we compute the
frequency of all verb+object pairs (see Tables 3 and
4). These data function as indicators of the
phenomena observed in the medical language of
experts and non-experts. Indeed, this experiment
helps to identify semantic groups of verbs
expressing similar concepts and conceptual relations
between the verb arguments.</p>
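        <p>The two extraction steps described in this section, building frame patterns and finding other verbs that share a verb's frequent objects, can be sketched as follows. The data structure, the function names and the toy occurrences are assumptions made for illustration; the filtering of verbs by semantic closeness, which the study also applies, is not modelled here.</p>
        <preformat><![CDATA[
```python
from collections import Counter

# Invented toy data: (verb, {function: (lemma, Snomed category or None)}).
occurrences = [
    ("prescrire", {"Su": ("médecin", "J"), "DO": ("médicament", "C")}),
    ("prescrire", {"DO": ("médicament", "C")}),
    ("prescrire", {"DO": ("traitement", "P")}),
    ("recommander", {"DO": ("médicament", "C")}),
    ("proposer", {"DO": ("traitement", "P")}),
    ("proposer", {"DO": ("traitement", "P")}),
]


def frame(arguments):
    """Build a pattern such as 'V+Su/J+DO/C' from the categorised arguments."""
    parts = [f"{function}/{arguments[function][1]}"
             for function in ("Su", "DO", "IO")
             if function in arguments and arguments[function][1]]
    return "+".join(["V"] + parts)


# Frequency of each (verb, frame) combination.
frame_counts = Counter((verb, frame(args)) for verb, args in occurrences)

# Frequency of each verb + direct object pair.
pair_counts = Counter()
for verb, args in occurrences:
    if "DO" in args:
        pair_counts[(verb, args["DO"][0])] += 1


def top_objects(verb, n=2):
    """Return the n most frequent direct objects of a verb."""
    counts = Counter({obj: c for (v, obj), c in pair_counts.items() if v == verb})
    return [obj for obj, _ in counts.most_common(n)]


def other_verbs(verb):
    """For each frequent object of `verb`, list the other verbs taking it."""
    result = {}
    for obj in top_objects(verb):
        result[obj] = sorted(((v, c) for (v, o), c in pair_counts.items()
                              if o == obj and v != verb),
                             key=lambda item: (-item[1], item[0]))
    return result


print(frame_counts)
print(other_verbs("prescrire"))
```
]]></preformat>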
        <p>After processing all the verbs found in the
different subcorpora, 11 verbs were selected for
a more detailed case study: augmenter (increase),
évaluer (evaluate), exposer (expose), subir
(undergo), prescrire (prescribe), provoquer (provoke),
accompagner (accompany), suivre (follow), causer
(cause), baisser (lower), and entraîner (lead to).
These verbs were selected according to two main
criteria:</p>
        <list list-type="simple">
          <list-item><p>Frequency: the verbs should have at least 20 occurrences each, in at least two of the subcorpora;</p></list-item>
          <list-item><p>Types of verbs: we tried to choose not only verbs that intuitively tend to have specialised usages in specialised domain texts, but also general language verbs like accompagner, baisser, suivre, etc. The tendency to co-occur frequently with particular terms was also taken into consideration, since we focus on lexical preference and collocation.</p></list-item>
        </list>
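        <p>The frequency criterion can be expressed as a simple filter. The sketch below is an illustration only: the function name is hypothetical and the frequency figures are invented, not taken from the corpora. The second criterion (type of verb) relies on the analyst's judgement and is not modelled.</p>
        <preformat><![CDATA[
```python
def select_verbs(frequencies, min_occurrences=20, min_subcorpora=2):
    """Keep verbs with at least min_occurrences in at least min_subcorpora."""
    selected = []
    for verb, per_subcorpus in frequencies.items():
        qualifying = sum(1 for count in per_subcorpus.values()
                         if count >= min_occurrences)
        if qualifying >= min_subcorpora:
            selected.append(verb)
    return sorted(selected)


# Invented frequencies, for illustration only.
toy_frequencies = {
    "prescrire": {"expert": 45, "student": 12, "patient": 25, "forum": 310},
    "subir": {"expert": 8, "student": 3, "patient": 21, "forum": 150},
    "exciser": {"expert": 30, "student": 5, "patient": 0, "forum": 1},
}
selected = select_verbs(toy_frequencies)
print(selected)
```
]]></preformat>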
      </sec>
      <sec id="sec-4-3">
        <title>4.3 Comparative analysis of verbal behaviors</title>
        <p>The comparative analysis is done manually and
aims at highlighting the differences and
similarities of the subcorpora with regard to selectional
restrictions and lexical preferences. We compare
the frequency of verbal configurations (pairs of
verb+argument or frames) across the subcorpora.
This analysis addresses different aspects: the
arguments (terms) cooccurring with verbs, the verbs
cooccurring with those arguments, the different
frames verbs frequently appear in, and argument
structures expressing similar conceptual relations.
The results are discussed in Section 5.1.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5 Results and Discussion</title>
      <sec id="sec-5-1">
        <title>5.1 Terms cooccurring with verbs</title>
        <p>The data provided in Table 2 lead to several
observations. Some verbs frequently select terms
from a particular Snomed category, mostly
specific terms, in a particular subcorpus, while in
the other subcorpora this co-occurrence never
happens or happens only rarely. This phenomenon
is particularly striking with verbs like prescrire and
subir. In the forum and sometimes in the lay
subcorpus, these verbs frequently combine with
terms belonging to category P (procedures); more
specifically, prescrire seems to have an attraction
for the terms traitement and examen, while subir
has a strong attraction for intervention and
ablation (a hyponym which refers to a type of medical
intervention). Prescrire also combines frequently
with names of chemical products (C) and shows
a particular attraction for the term médicament,
while subir prefers terms referring to disorders
and diseases (D), and more precisely the term
AVC (stroke). These are preferred co-occurrences
which are therefore seen as collocations.</p>
        <p>[Table 2: rows for the verbs prescrire, subir, augmenter, baisser, exposer, suivre and évaluer; the cell values are not recoverable.]</p>
        <p>Such collocations may involve polysemous
verbs and their different readings. For
example, in the expert subcorpus (and sometimes in
the student subcorpus), évaluer and suivre tend
to appear frequently with terms referring to
functions of the organism (F) or to social status (S).
Évaluer seems to be attracted by risque, indication
and patient. Évaluer+F means to measure,
determine, calculate, gauge, quantify, while évaluer+S
means to examine.</p>
        <p>The differences in verb/arg pair frequencies can
lead to different interpretations. First of all, when
the frequency difference between the forum subcorpus
and the expert subcorpus is very large, this
may signal some specificities of the laypersons’
language. Indeed, while health care specialists
share foundational domain knowledge based on
formal education and professional experience, the
patients’ or non-experts’ medical language is
characterised by the use of common expressions and
collocations, sometimes involving technical
medical terms (prescrire un médicament, subir une
ablation, subir un AVC, suivre un régime) borrowed
from the medical experts’ language. According to
researchers in Consumer Health Literature, such
mixed phraseology is the result of social and
cultural influence on language and is acquired
from formal and informal sources such as the
internet (Zeng-Treiler et al. (2006), Zeng-Treiler and
Tse (2006)). The frequent use of these expressions
makes them progressively become part of
everyday language. This could be a plausible
explanation for the high frequency of expressions like
prescrire un médicament, subir une ablation or subir
un AVC in the forum texts.</p>
        <p>Secondly, looking at the results from the expert
subcorpus to the forum subcorpus, we notice that
the frequency difference is sometimes not very
large. The explanation given above could once
more apply here. Indeed, medical technical terms
are quite often used by non-experts to describe
medical concepts. On the other hand, when a
verbal combination involving a particular Snomed
category is very frequent in the expert subcorpus,
like exposer + name of a medication (votre
patiente est exposée au ramipril) or évaluer + function
(évaluer un risque), while the verb is totally
absent or very rare in the other subcorpora, we may
be dealing with a highly specialised or expert
language-specific usage of the verb.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2 Lexical preferences of the arguments for verbs</title>
        <p>The results of Section 5.1 give an account of the
lexical preferences of the verbs within and across
the subcorpora. In this section, we investigate the
lexical preferences of nominals in the expert and
forum subcorpora. Tables 3 and 4 give the results
of this experiment. These data were obtained as
described in Section 4.2. The blue color
represents the processed verb, the entries in the
column Arguments are the most frequent arguments
of the processed verb, and the red color represents
a semantic group of verbs frequently combining
with the corresponding argument in the given
corpus. The numbers in brackets show the frequency
of each verb+arg pair.</p>
        <p>Depending on the corpus, certain terms
frequently combine with particular verbs, in order
to express a particular concept. For instance, as
we can see in Table 2, the terms médicament and
traitement are prescrire’s favourite cooccurrents
in the forum and sometimes in the lay subcorpus,
while in the expert subcorpus, the terms frequently
co-occur with the verbs indiquer, recommander,
proposer, and envisager, recommander, proposer,
imposer, respectively.</p>
        <p>1) Ces médicaments ne sont plus recommandés
en première intention dans le traitement de
l’hypertension (These drugs are no longer
recommended as first-line in the treatment of
hypertension)</p>
        <table-wrap>
          <caption>
            <p>Verbal cooccurrents of the arguments in the expert subcorpus (frequencies in brackets). The column of processed verbs (prescrire, subir, baisser, suivre, augmenter) and the forum data are not recoverable.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Arguments</th><th>Verbal cooccurrents (Expert)</th></tr>
            </thead>
            <tbody>
              <tr><td>médicament</td><td>indiquer(3), recommander(2), proposer(2)</td></tr>
              <tr><td>traitement</td><td>proposer(8), envisager(7), recommander(3), imposer(3)</td></tr>
              <tr><td>examen</td><td>imposer(1), proposer(1), recommander(1), autoriser(1)</td></tr>
              <tr><td>intervention</td><td></td></tr>
              <tr><td>ablation</td><td>faire(1)</td></tr>
              <tr><td>AVC</td><td>présenter(4), faire(2), avoir(2)</td></tr>
              <tr><td>tension</td><td></td></tr>
              <tr><td>régime</td><td></td></tr>
              <tr><td>conseil</td><td>considérer(1)</td></tr>
              <tr><td>traitement</td><td>recevoir(12), bénéficier(6), faire(6), poursuivre(3)</td></tr>
              <tr><td>tension</td><td></td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>Although the two groups of verbs combine with
the same terms, in the professional language
these verbs are not semantically equivalent: they
correspond to different levels of evidence.
Indeed, they are used by medical experts to express
the relevance of prescribing a given drug or
treatment for a given disease. In contrast, patients just
know about the drug or treatment they have been
prescribed for their disease but do not
necessarily know about these distinctions. These examples
highlight a very relevant difference in the way
experts and non-experts use verbal configurations:
the former choose very specific and technical
configurations while the latter use more general ones.</p>
        <p>In the expert subcorpus, several sentences are
in the passive voice with an omitted agent, as in
Example 1. This applies to some of the
above-mentioned verbs and is quite recurrent with other
verbs.</p>
        <p>The lexical choice difference within subcorpora
does not only concern terms. Verbs also select
particular terms to combine with, depending on
the subcorpora. For example, in the forum
subcorpus, the verb suivre frequently co-occurs with
the term conseil, while in the expert subcorpus,
the term conseil does not combine with this verb.
Instead, suivre combines with indication. The
latter and mainly the term recommandation, which
are semantically close to conseil, are very
frequent in the expert subcorpus. They appear in
positions where conseil could appear. For
example, recommandation is combined with verbs like
proposer (4), appliquer (4), actualiser (8), publier
(4), élaborer (2) and faire (3). This seems to show
that the experts prefer to talk about
recommandations and indications which have specific and
technical meanings, while laypersons are more
familiar with the term conseil which is a common word.</p>
        <p>A further observation concerns the verbs baisser
and augmenter. In the forum subcorpus,
they frequently co-occur with
the term tension (augmenter la tension (increase
blood pressure), baisser la tension (reduce blood
pressure) (see Table 2)), expressing different states
of the blood pressure. In the expert subcorpus,
none of these collocations were found. In
addition, among the verbs combining with tension
in the expert subcorpus, none is semantically
related to the two verbs. However, we have
noticed the presence of verb-based nominalisations,
constructions requiring support verbs or relational
adjectives, which are synonymous with the two
above-mentioned collocations: élévation
tensionnelle (4) and hausse de tension (1) correspond
to augmenter la tension, while réduction
tensionnelle (2), abaissement tensionnel (2) and baisse de
tension (4) have the same meaning as baisser la
tension.</p>
        <p>This phenomenon is consistent with the results
obtained in a previous study (Wandji Tchami and
Grabar (2014)) and with the findings of Condamines and
Bourigault (1999), which confirmed
that nominal entities tend to be more frequent in
expert texts than in non-expert texts. The above
data show that the difference between the
expert and forum texts does not lie in verbs alone,
but mostly in the different types of constructions
the verbs are involved in (support verb,
paraphrase, verb-based nominalisation, etc.).</p>
      </sec>
      <sec id="sec-5-3">
        <title>5.3 Verbal frames and conceptual relations</title>
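        <p>Table 5 shows frames which represent different
ways of expressing the cause-effect conceptual
relation. The data were extracted from the
subcorpora, through the analysis of frames of
accompagner, causer, provoquer, and entraîner, which are
causative verbs. We are aware of the fact that
some of the numbers presented in this table are
not high enough to draw conclusions. However,
we found it important to report them because they
might highlight phenomena that could be further
analysed in future work, with more data.</p>
        <table-wrap>
          <label>Table 5</label>
          <caption>
            <p>Frame frequencies for accompagner (columns pro and for); the columns for the other verbs are not recoverable.</p>
          </caption>
          <table>
            <thead>
              <tr><th>frames</th><th>accompagner (pro)</th><th>accompagner (for)</th></tr>
            </thead>
            <tbody>
              <tr><td>C s D do</td><td>1</td><td>0</td></tr>
              <tr><td>C s F do</td><td>1</td><td>0</td></tr>
              <tr><td>D s D io</td><td>5</td><td>3</td></tr>
              <tr><td>C s D do</td><td>3</td><td>0</td></tr>
              <tr><td>D s F do</td><td>5</td><td>1</td></tr>
              <tr><td>F s F do</td><td>4</td><td>3</td></tr>
              <tr><td>F s D do</td><td>1</td><td>0</td></tr>
              <tr><td>F s P do</td><td>0</td><td>2</td></tr>
              <tr><td>P s F io</td><td>5</td><td>0</td></tr>
              <tr><td>P s D do</td><td>2</td><td>0</td></tr>
              <tr><td>P s F do</td><td>0</td><td>0</td></tr>
              <tr><td>P s P do</td><td>0</td><td>1</td></tr>
              <tr><td>F s F io</td><td>6</td><td>3</td></tr>
            </tbody>
          </table>
        </table-wrap>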
        <p>Many frames were identified; Table 5 shows
the most frequent ones, which are: F D, P F,
F F, P D, D F, F P, F D, P P, C D, D D.
These frames are all found in the four subcorpora
but they tend to choose specific verbs depending
on the subcorpus. The difference mostly lies at
the lexical level, in the choice of verbs. In the
above-mentioned frames, the left side semantic
class provokes or entails an effect or consequence
that is expressed by the right side category. Let us
take for example the relation Functions-Functions
(F F), where a function of the organism has an
effect on another function of the organism.</p>
        <p>2) Exp: la prise de poids<sub>F</sub> s’accompagne d’une
élévation de la pression artérielle<sub>F</sub> (weight gain
is followed by a rise in blood pressure<sub>F</sub>)</p>
        <p>3) For: une diaphorèse<sub>F</sub> intense accompagne
souvent la douleur<sub>F</sub> (the pain<sub>F</sub> is often followed
by an intense diaphoresis<sub>F</sub>)</p>
        <p>4) For: le stress<sub>F</sub> provoque des
spasmes vasculaires<sub>F</sub> (stress<sub>F</sub> causes
vascular spasms<sub>F</sub>)</p>
        <p>As we can see from the data provided in Table 5,
in the expert subcorpus, this conceptual relation is
frequently expressed with the verbs accompagner
and entraîner, while in the forum texts the verbs
provoquer and causer are the most used. This
remark also applies to the other above-mentioned
frames. Collocational differences between expert
and forum verb use also involve differences in
valency and syntactic construction. In Example 2,
the verb accompagner is in a pronominal form
with a reflexive pronoun se/s’; this construction is
the most used one in the expert subcorpus, and in
the table, it is represented by the presence of the
indirect object in the frame.</p>
        <p>Another tendency observed in the expert
subcorpus is the frequent use of the passive voice with
a syntactically omitted agent, while in the forum
subcorpus, the active voice is the most used. This
observation was already underlined in Section 5.2
with recommander, indiquer and proposer.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6 Conclusion and Perspectives</title>
      <p>In this study, we have proposed a method for the
comparative analysis of verbal argument
structures in medical subcorpora whose authors and
intended readership have different levels of
expertise, with a focus on lexical preference. The main
difference observed is that medical experts tend
to choose verbal configurations with very specific
and technical meanings which apply to specific
situations, while non-experts use more generic and
common verbal configurations. Lexical choice
differences often come with differences in the
syntactic constructions used. Indeed, medical expert
writings are characterized by the frequent use of
a passive form with an omitted agent. The
analysis of the two intermediary subcorpora shows that
the expert and student subcorpora are close to each
other while the lay subcorpus is close to the
forum. As far as the method is concerned, the use of
a dependency parser seems to improve the results.
However, a detailed evaluation of the parsing
quality is still to be done. We are also planning to carry
out the analysis exemplified here on more verbs.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          <string-name>
            <given-names>Ted</given-names>
            <surname>Briscoe</surname>
          </string-name>
          and
          <string-name>
            <given-names>John</given-names>
            <surname>Carroll</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Automatic extraction of subcategorization from corpora</article-title>
          .
          <source>In In Proceedings of the ACL</source>
          , pages
          <fpage>356</fpage>
          -
          <lpage>363</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          <string-name>
            <given-names>Sabine</given-names>
            <surname>Buchholz</surname>
          </string-name>
          and
          <string-name>
            <given-names>Erwin</given-names>
            <surname>Marsi</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>CoNLL-X shared task on multilingual dependency parsing</article-title>
          .
          <source>In Proc. of CoNLL</source>
          , pages
          <fpage>149</fpage>
          -
          <lpage>164</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          <string-name>
            <given-names>Jolanta</given-names>
            <surname>Chmielik</surname>
          </string-name>
          and
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Grabar</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Détection de la spécialisation scientifique et technique des documents biomédicaux grâce aux informations morphologiques</article-title>
          .
          <source>TAL</source>
          ,
          <volume>51</volume>
          (
          <issue>2</issue>
          ):
          <fpage>151</fpage>
          -
          <lpage>179</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          <string-name>
            <given-names>Christopher G.</given-names>
            <surname>Chute</surname>
          </string-name>
          , SP Cohn, KE Campbell, DE Oliver, and JR Campbell.
          <year>1996</year>
          .
          <article-title>The content coverage of clinical classifications. For the Computer-Based Patient Record Institute's Work Group on Codes &amp; Structures</article-title>
          .
          <source>J Am Med Inform Assoc</source>
          ,
          <volume>3</volume>
          (
          <issue>3</issue>
          ):
          <fpage>224</fpage>
          -
          <lpage>33</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          <string-name>
            <given-names>Anne</given-names>
            <surname>Condamines</surname>
          </string-name>
          and
          <string-name>
            <given-names>Didier</given-names>
            <surname>Bourigault</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>Alternance nom/verbe : explorations en corpus spécialisés</article-title>
          . In Cahiers de l'Elsap, pages
          <fpage>41</fpage>
          -
          <lpage>48</lpage>
          , Caen, France.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          <string-name>
            <given-names>Ruxandra</given-names>
            <surname>Cosma</surname>
          </string-name>
          and
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Engelberg</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Subjektsätze als alternative Valenzen im Deutschen und Rumänischen.</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          <string-name>
            <given-names>Roger A.</given-names>
            <surname>Côté</surname>
          </string-name>
          .
          <year>1996</year>
          .
          <source>Répertoire d'anatomopathologie de la SNOMED internationale, v3.4</source>
          . Université de Sherbrooke, Sherbrooke, Québec.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
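          <string-name>
            <given-names>Louise</given-names>
            <surname>Deléger</surname>
          </string-name>
          and
          <string-name>
            <given-names>Pierre</given-names>
            <surname>Zweigenbaum</surname>
          </string-name>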
          .
          <year>2008</year>
          .
          <article-title>Paraphrase acquisition from comparable medical corpora of specialized and lay texts</article-title>
          .
          <source>In AMIA 2008</source>
          , pages
          <fpage>146</fpage>
          -
          <lpage>50</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Dominique</surname>
          </string-name>
          , Sophie Nègre, and Patrick Séguéla.
          <year>2009</year>
          .
          <article-title>L'analyseur syntaxique Cordial dans Passage</article-title>
          . Actes de TALN,
          <volume>9</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Grabar</surname>
          </string-name>
          and
          <string-name>
            <given-names>Thierry</given-names>
            <surname>Hamon</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Automatic extraction of layman names for technical medical terms</article-title>
          .
          <source>In ICHI</source>
          <year>2014</year>
          , Pavia, Italy.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Gries</surname>
          </string-name>
          and
          <string-name>
            <given-names>Anatol</given-names>
            <surname>Stefanowitsch</surname>
          </string-name>
          .
          <year>2004</year>
          .
          <article-title>Extending collostructional analysis: A corpus-based perspective on “alternations”</article-title>
          .
          <source>IJCL</source>
          ,
          <volume>9</volume>
          (
          <issue>1</issue>
          ):
          <fpage>97</fpage>
          -
          <lpage>129</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          <string-name>
            <given-names>Gerhard</given-names>
            <surname>Helbig</surname>
          </string-name>
          .
          <year>1985</year>
          .
          <article-title>Valenz und Kommunikation (ein Wort zur Diskussion)</article-title>
          .
          <source>Deutsch als Fremdsprache</source>
          ,
          <volume>22</volume>
          :
          <fpage>153</fpage>
          -
          <lpage>156</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          <string-name>
            <given-names>Regina</given-names>
            <surname>Jucks</surname>
          </string-name>
          and
          <string-name>
            <given-names>R.</given-names>
            <surname>Bromme</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>Choice of words in doctor-patient communication: an analysis of health-related internet sites</article-title>
          .
          <source>Health Commun</source>
          ,
          <volume>21</volume>
          (
          <issue>3</issue>
          ):
          <fpage>267</fpage>
          -
          <lpage>77</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          <string-name>
            <given-names>Hadi</given-names>
            <surname>Kharrazi</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Improving healthy behaviors in type 1 diabetic patients by interactive frameworks</article-title>
          .
          <source>In AMIA</source>
          , pages
          <fpage>322</fpage>
          -
          <lpage>326</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          <string-name>
            <given-names>Reinhard</given-names>
            <surname>Köhler</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Quantitative Untersuchungen zur Valenz deutscher Verben</article-title>
          .
          <source>Glottometrics</source>
          ,
          <volume>9</volume>
          :
          <fpage>13</fpage>
          -
          <lpage>20</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          <string-name>
            <given-names>Dimitrios</given-names>
            <surname>Kokkinakis</surname>
          </string-name>
          and
          <string-name>
            <given-names>M Toporowska</given-names>
            <surname>Gronostaj</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Comparing lay and professional language in cardiovascular disorders corpora</article-title>
          . In James Cook University Pham T., editor,
          <source>WSEAS Transactions on Biology and Biomedicine</source>
          , pages
          <fpage>429</fpage>
          -
          <lpage>437</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          <string-name>
            <given-names>Anna</given-names>
            <surname>Korhonen</surname>
          </string-name>
          , Yuval Krymolowski, and
          <string-name>
            <given-names>Nigel</given-names>
            <surname>Collier</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>The choice of features for classification of verbs in biomedical texts</article-title>
          .
          <source>In Proc. of COLING</source>
          , pages
          <fpage>449</fpage>
          -
          <lpage>456</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          <string-name>
            <given-names>Vladimir Iosifovich</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <year>1966</year>
          .
          <article-title>Binary codes capable of correcting deletions, insertions and reversals</article-title>
          .
          <source>Soviet physics. Doklady</source>
          ,
          <volume>707</volume>
          (
          <issue>10</issue>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          <string-name>
            <given-names>Alexa</given-names>
            <surname>McCray</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Promoting health literacy</article-title>
          .
          <source>J of Am Med Infor Ass</source>
          ,
          <volume>12</volume>
          :
          <fpage>152</fpage>
          -
          <lpage>163</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          Cédric Messiant, Kata Gábor, and
          <string-name>
            <given-names>Thierry</given-names>
            <surname>Poibeau</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Acquisition de connaissances lexicales à partir de corpus: la sous-catégorisation verbale en français</article-title>
          .
          <source>TAL</source>
          ,
          <volume>51</volume>
          (
          <issue>1</issue>
          ):
          <fpage>65</fpage>
          -
          <lpage>96</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          <string-name>
            <given-names>Jennifer</given-names>
            <surname>Pearson</surname>
          </string-name>
          .
          <year>1998</year>
          . Terms in Context. John Benjamins, Amsterdam/Philadelphia.
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          <string-name>
            <given-names>Judita</given-names>
            <surname>Preiss</surname>
          </string-name>
          , Ted Briscoe, and
          <string-name>
            <given-names>Anna</given-names>
            <surname>Korhonen</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>A system for large-scale acquisition of verbal, nominal and adjectival subcategorization frames from corpora</article-title>
          .
          <source>In Proceedings of ACL</source>
          , volume
          <volume>45</volume>
          , page 912.
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          <string-name>
            <given-names>Magdalena</given-names>
            <surname>Putz</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>Approaching linguistic complexity in medical care</article-title>
          .
          <source>International Journal of Anthropology</source>
          ,
          <volume>23</volume>
          (
          <issue>3-4</issue>
          ):
          <fpage>275</fpage>
          -
          <lpage>284</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Roland</surname>
          </string-name>
          and
          <string-name>
            <given-names>Daniel</given-names>
            <surname>Jurafsky</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>How verb subcategorization frequencies are affected by corpus choice</article-title>
          .
          <source>In Proceedings of ACL</source>
          , Montreal, Quebec, Canada.
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          <string-name>
            <surname>Schulte im Walde</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Experiments on the automatic induction of German semantic verb classes</article-title>
          .
          <source>Technical report</source>
          , Universität Stuttgart.
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          <string-name>
            <given-names>Catherine</given-names>
            <surname>Smith</surname>
          </string-name>
          and
          <string-name>
            <given-names>PJ</given-names>
            <surname>Wicks</surname>
          </string-name>
          .
          <year>2008</year>
          .
          <article-title>PatientsLikeMe: Consumer health vocabulary as a folksonomy</article-title>
          .
          <source>In Proceedings of the AMIA 2008 Symposium</source>
          , pages
          <fpage>682</fpage>
          -
          <lpage>686</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          <string-name>
            <given-names>Thi</given-names>
            <surname>Mai Tran</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H</given-names>
            <surname>Chekroud</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P</given-names>
            <surname>Thiery</surname>
          </string-name>
          , and
          <string-name>
            <given-names>A</given-names>
            <surname>Julienne</surname>
          </string-name>
          .
          <year>2009</year>
          .
          <article-title>Internet et soins : un tiers invisible dans la relation médecine/patient ?</article-title>
          <source>Ethica Clinica</source>
          ,
          <volume>53</volume>
          :
          <fpage>34</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          <string-name>
            <given-names>Ornella</given-names>
            <surname>Wandji Tchami</surname>
          </string-name>
          and
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Grabar</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Towards automatic distinction between specialized and non-specialized occurrences of verbs in medical corpora</article-title>
          .
          <source>In Proceedings of Computerm</source>
          , pages
          <fpage>114</fpage>
          -
          <lpage>124</lpage>
          , Dublin, Ireland,
          August
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          <string-name>
            <given-names>Ornella</given-names>
            <surname>Wandji Tchami</surname>
          </string-name>
          ,
          <string-name>
            <given-names>MC</given-names>
            <surname>L'Homme</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Natalia</given-names>
            <surname>Grabar</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Discovering semantic frames for a contrastive study of verbs in medical corpora</article-title>
          .
          <source>In TIA, Villetaneuse.</source>
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <string-name>
            <given-names>Qing</given-names>
            <surname>Zeng-Treitler</surname>
          </string-name>
          and
          <string-name>
            <given-names>T</given-names>
            <surname>Tse</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Exploring and developing consumer health vocabularies</article-title>
          .
          <source>JAMIA</source>
          ,
          <volume>13</volume>
          :
          <fpage>24</fpage>
          -
          <lpage>29</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          <string-name>
            <given-names>Qing</given-names>
            <surname>Zeng-Treitler</surname>
          </string-name>
          , Tony Tse, Guy Divita, Alla Keselman, Jon Crowell, and Allen C Browne.
          <year>2006</year>
          .
          <article-title>Exploring lexical forms: first-generation consumer health vocabularies</article-title>
          .
          <source>In AMIA 2006</source>
          , pages
          <fpage>1155</fpage>
          -
          <lpage>1155</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>