<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>“Confortation”: about a new qualitative category for analyzing biomedical texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Delphine Battistelli</string-name>
          <email>delphine.battistelli@paris-sorbonne.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Antonietta Folino</string-name>
          <email>antonietta.folino@unical.it</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Patricia Geretto</string-name>
          <email>pgeretto@hotmail.fr</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Ludivine Kuznik</string-name>
          <email>ludivine.kuznik@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jean-Luc Minel</string-name>
          <email>jean-luc.minel@u-paris10.fr</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Florence Amardeilh</string-name>
          <email>florence.amardeilh@mondeca.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>MoDyCo-UMR 7114, Université Paris X -CNRS</institution>
          ,
          <addr-line>F92001 Nanterre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Université Paris X</institution>
          ,
          <addr-line>F92001 Nanterre</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In this paper we present a new approach to the expression of certainty and uncertainty in scientific experimental articles. This will permit to ascertain the validity of knowledge extracted from biological literature and used to automatically populate a domain ontology. We argue that lexical terms such as show, find, observe… express a semantic category different from the one characterized by markers such as demonstrate, validate, support… We name the latter category “confortation” as it conveys a notion of strengthening and we propose five other semantic categories: lack of knowledge, objects of study, hypothesis, observations, and general knowledge. This last category and the linguistic phenomenon of reported speech are respectively examined as consensual truth and as knowledge reported from identified scientific sources.</p>
      </abstract>
      <kwd-group>
        <kwd>epistemic modality</kwd>
        <kwd>semantic annotation</kwd>
        <kwd>certainty concept</kwd>
        <kwd>uncertainty concept</kwd>
        <kwd>biomedical articles</kwd>
        <kwd>scientific discourse</kwd>
        <kwd>“confortation”</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>-</title>
      <p>Biologists can no longer keep abreast of developments in miRNA research, making
it impossible for them to identify and monitor information.</p>
      <p>
        The difficulty in finding relevant information because of the large amount of
available articles is also described in [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. There the role of semantic annotation is to
highlight key concepts and to facilitate skimming documents, in order to evaluate
whether it is worth reading them fully.
      </p>
      <p>
        In our context, the aim of semantic annotation consists in designing a tool to
support biologists in their daily tasks for finding relevant textual parts of scientific
articles. For that reason a collaboration between biologists (from the Pasteur Institute
in Montevideo) and computer scientists and linguists (from MoDyCo) has been set
up. An initial domain ontology about miRNAs was modeled by taking as input
various existing Web resources (such as the databases, the Sequence Ontology…) and
a set of expert interviews with the biologists. An information extraction tool aiming at
automatically populating this miRNA ontology was configured to identify and
semantically annotate sentences which contain the name of a miRNA and at least a
Gene or a Mutation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. The biologists pointed out however that this tool was
insufficient to ascertain the validity of the new knowledge
      </p>
      <p>
        One difficulty in the search3 for useful knowledge is that information is scattered
throughout the text. In fact, as explained in [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], temporal, modal and/or enunciative
features indicative of authorial commitment to the identified information in biological
text need to be explicitly annotated and included in the knowledge base. Besides, the
issue of qualifying the epistemic nature of the extracted knowledge concerns various
domains, not only the biomedical one. It can be seen as related to the trustworthiness
and the confidence given to any information found on the Web by humans. Within the
Semantic Web community, content-based trust mechanisms still need to be addressed
[
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. The categories presented in this paper are mainly discursive categories resulting
from scientific discourse analysis in the field of biology. They rely on the more
general linguistic (grammatical) categories of enunciation, temporality and modality
described and modelled in an ontology [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], meaning that the discursive categories
refer to markers from these linguistic categories and thus demonstrate their close
interaction.
      </p>
      <p>In this context we4 worked on the concepts of certainty and uncertainty. Our
analysis was performed on three corpora containing respectively ten, twenty and
thirteen scientific papers about miRNAs, coming from the specialized database
PUBMED. Each of them allowed us to study the biomedical domain, the linguistic
phenomenon and to perform manual annotations. From the linguistic analysis, the
concept of consolidation emerged and led us to define the articulation of different
semantic categories. We then performed a quantitative analysis and experimented an
automatic annotation using the GATE platform. We here present the linguistic
analysis and the results of the limited but validating evaluation of the semantic
categories. Most of the papers in our corpora describe the experiments performed by
the authors, others simply present a state of the art in the domain or about a particular
3 Biologists generally search for information in databases such as PUBMED and MEDLINE
4 Most of this study was carried out as a Master’s Project (DEFI Université Paris X) by A.</p>
      <p>Folino, P. Geretto, L. Kuznik and M. Younes-Michiels.
phenomenon while others concern only methods. Since the analysis of the third group
would require a specific study of their role in the scientific and biomedical area, we
decided not to include them in our initial analysis.We focused on papers presenting
authors’ experiments. The objectives, means, results and phases of the research
process are explained in detail in these articles, even if differences appear in the way
they are formally organized: not all papers follow the same regular structure of
introduction, results, discussion (or results and discussion), materials and methods,
and sometimes conclusion.</p>
      <p>In the following section, we report on the linguistic phenomenon of epistemic
modality that led us to define six semantic categories that are expressed in the
experimental biomedical articles of our corpus. In section 3, we present a semantic
map organizing these categories. Finally, some perspectives for future research are
presented.</p>
    </sec>
    <sec id="sec-2">
      <title>2 State of the Art and Methodology</title>
      <sec id="sec-2-1">
        <title>2.1 Background</title>
        <p>There is an abundant literature on the use of the linguistic phenomenon of
modality, particularly epistemic modality in scientific texts.</p>
        <p>
          Our overview of this phenomenon is based on [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] which analyses the concepts of
hedging and modality in relevant previous research such as Hyland’s and
Markkanen’s.
        </p>
        <p>
          The theoretical basis for this study of epistemic modality and its use in academic
discourse is as described in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]: epistemic modality markers are “linguistic items that
explicitly qualify the truth value of a proposition”. As for the classification of
markers, we will specify the differences and similarities between existing approaches
and ours in this section. Concerning the importance of interpreting modality in
biomedical texts, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] affirm that: “detecting uncertain and negative assertions is
essential in most Text Mining tasks, where in general the aim is to derive factual
knowledge for textual data. [...] these language forms [...] are intended to express
impressions, hypothesised explanations of experimental results or negative findings”.
        </p>
        <p>
          Most studies aim at the automatic recognition and extraction of modal expressions
and their classification according to particular classes. In [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] the authors propose to
classify such expressions according to the type of information they convey: level of
certainty, indicating the degree of certainty expressed by the author and including
classes such as absolute, high, medium and low; point of view, which distinguishes
between the author’s and others’ ideas; knowledge type, which distinguishes
speculations and statements based on experimental evidence, by means of the
following classes: speculative, deductive, demonstrative and sensory. The
demonstrative class contains verbs such as demonstrate, find, show, confirm, etc. As
for epistemic modality, [
          <xref ref-type="bibr" rid="ref8">8</xref>
          ] identify an axis whose extremes are truly factual and
counterfactual events. Between the two extremes there are different modal types:
degrees of possibility, belief, evidentiality, expectation, attempting and command. In
[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] the term unhedgers is introduced for verbs such as demonstrate, show, prove, etc.
which are considered as conveying a strong degree of certainty in positive sentences
and of hedging in negative ones.
        </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2 Linguistic Analysis</title>
        <p>The analysis of uncertainty in our corpus has required handling the issue of the
degree of truth value in certainty. We consider that lexical terms such as demonstrate,
validate, support 5… convey a different meaning from show, find, result…, hence it
was necessary to separate them into two distinct semantic categories. The first one
conveys a notion of data strengthening, and the second one conveys a notion of data
observation. Since this distinction is one that has not been made by previous linguistic
studies, we have to study the concepts of certainty and uncertainty from a new point
of view. Our categories are linguistically marked6 (even if some ambiguities still
remain) and semantically linked to different states of scientific knowledge. In order to
reduce the lack of knowledge and increase general knowledge, members of the
scientific community perform series of experiments from which they draw results that
can sometimes confirm one another. Therefore, the notions we propose are:
observations, “consolidation”7, objects of study, hypothesis, lack of knowledge and
general knowledge. We have also analyzed reported speech, which introduces other
authors’ knowledge into the articles from referenced sources.</p>
        <p>
          Observations. We have detected within our corpus: reveal, show, find, report,
result, observe, determine, that are considered as suggesting a high degree of
certainty in the above-mentioned studies. Indeed, it is argued in [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] and in
several other studies that statements without certainty markers (such as
certainly…) are more assertive than the same ones containing them. When a
writer wants to place the statement beyond doubt he puts a marker. An epistemic
marker tends therefore to question the truth-value of the expressed content. But the
markers listed above do not convey a weight either of doubt or of certainty. The
sentences in which they appear refer to observed phenomena from experiments. The
fact expressed with or without any term of the list belongs to direct observation:
hencec it is equally as certain, whether introduced by markers or not. Consequently, it
is not possible to decide if the author wishes to nuance the truth value of the
statement. In our corpus, these markers appear to be simply a stylistic device used to
vary the way of expressing facts. Detecting the sentences marked with this kind of
specific marker gives a partial view of what is truth-valued. However, the detection of
5 For ease of reading, the simple forms of the lexical markers are listed.
6 Table 1 (cf. Appendix) shows the markers found in our corpus.
7 Consolidation is the translation of the French neologism “Confortation”: we think that there is
no strictly equivalent term in English, and to our knowledge this semantic notion is
introduced here for the first time.
sentences containing these markers is difficult because of the absence of any explicit
markers. Hence, not all sentences without markers can be categorized as
“observations” and this leads to ambiguity Although it is considered useful to detect
these statements, a point confirmed by [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], it will be necessary to further find out a
possible way of extracting them.
“Confortation”. Terms such as demonstrate, prove, confirm… are present in
biomedical texts. These markers express a heightened degree of confidence in the
biological statements that are made. Given their role in strengthening the claim, we
have designated them by the term “consolidation”. The markers of observations are
related to experiments, whereas these are related to an idea of validation or
confirmation. None of the previously mentioned approaches have attempted this
distinction between the two categories. [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ] gives the same level of certainty to both.
[
          <xref ref-type="bibr" rid="ref7">7</xref>
          ] attributes a level of demonstration to what we name “consolidation” markers but
includes reveal, show or find at that level. “Consolidation” clearly marks
observations (“confirming our results”) as well as uncertainty-marked sentences (“is
consistent with the hypothesis”). “Consolidation” can be internal or external,
marking the authors’ or others’ data.
        </p>
        <p>
          Objects of Studies and Hypothesis. [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] includes whether within the list of lexical
uncertainty markers. This word is present in our corpus and expresses the
undetermined truth value of a fact. Authors use it to present the questions which they
or others set out to answer. It is clearly related to the scope of the investigation: “To
address whether …, we first tested …”. We have detected that the main ways of
expressing the objects of the experiments in our corpus are whether and if. However
another syntactic construction conveys the same meaning: ”to verify that …, we
cloned …”. Further investigation is necessary. Nevertheless, this uncertainty differs
semantically from that conveyed by the other uncertainty markers: suggest, believe,
hypothesis… or modal auxiliaries such as may, might, would…These markers put
forward a speculative idea derived from experiments: “these two findings suggest that
miRs should be ...”. Authors interpret and give a possible explanation for their
observation. Sometimes, observation markers such as findings in the above example
explicitly indicate that the suggestion comes from experimental results. Even in the
absence of these markers, these sentences still convey the same meaning. We
therefore propose two semantic categories, namely “objects of study”, which
represents the aim of the research reported by the authors in the article, and
“hypothesis”, which corresponds to the reasoning derived from the authors
observations during their experiments.
        </p>
        <p>
          Lack of Knowledge. Expressions such as “it is still unknown...”, “it remains
unclear...” etc. are considered as statements conveying lack of knowledge in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
These authors attribute to the sentences containing such expressions the lowest level
of certainty, i.e. complete uncertainty. Our linguistic analyses confirm that such
expressions express lack of knowledge. We do not, however, connect it to
uncertainty. The following sentence is a good example: “however, their biological
functions… remain largely undefined and experimentally untested”.
        </p>
        <p>General Knowledge. There are some sentences expressing facts, but not related to
the study performed by the author nor introduced as another author’s point of view.
We consider that such sentences convey general knowledge, a truth presented as
shared by the scientific community: “It is currently estimated that the expression of…
is…”. Some expressions like “it is well known”, “currently”… are easy to identify.
Moreover, we have observed that, even in the absence of such markers, these
sentences are characterized by the use of the present tense: ‘RNA silencing (RNAi) is a
new gene regulatory mechanism”. Nevertheless, that criterion is not sufficient to
determine unambiguously that an expression belongs to the category in question. The
following sequence illustrates the ambiguity:</p>
        <p>“Vascular endothelial growth factor A (VEGFA or VEGF) is an essential growth
and survival factor for endothelial cells. It plays a major role in physiological and
pathological angiogenesis through its ability to stimulate growth of new blood vessels
from nearby capillaries (Ferrara 2005). Through alternative splicing, the highly
conserved VEGF gene can produce various protein isoforms, with the three principal
forms consisting of 121, 165, and 189 amino acids. VEGF is translated from two start
codons, each of which is regulated by an independent IRES (Huez et al. 2001).”</p>
        <p>The first sentence of the above paragraph contains a verb in the present tense and
conveys a meaning that can be considered generic, according to the meaning we have
given to general knowledge. The second sentence, however, that presents the same
structure and expresses a meaning as general as the first one, contains a bibliographic
reference. So, it cannot be certainly affirmed that the first sentence belongs to the
dimension of general knowledge, because the authors may have reported both
sentences from the cited source, including a reference only after the second one.</p>
        <p>
          Reported Speech. Authors frequently introduce knowledge that refers to
experiments from identified research articles. In reported speech, there can be markers
of hypothesis, observation or “consolidation”: “It has been suggested that … may
involve … [30].”; “VEGF translation has been shown to be… (Akiri et al. 1998); “A
recent report posited that…[
          <xref ref-type="bibr" rid="ref5">5</xref>
          ]. That study demonstrated a region common to the
3'UTRs of...”. This phenomenon has also been categorized as “point of view” in [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ]
which points out the difficulty in determining in some contexts whether authors are
fully committed to the statements. This happens particularly with reported speech
sentences containing markers such as probably, may… In ‘the heterogeneity of
miR34a… was probably due to… as recently reported (Landgraf et al. 2007…)’, the
hypothesis marked by probably, is in reported speech. But this does not ascertain
whether the authors exactly reported the idea contained in the cited article: the
hypothesis could be their own. In the presence of an impersonal subject, as in “It is
suggested that...”, [
          <xref ref-type="bibr" rid="ref6">6</xref>
          ] finds it difficult to detect the paternity of the source and
considers that further contextual evidence is required. On the contrary, we tend to
consider that these constructions are more likely to convey the source’s point of view.
But we agree that the fidelity to the reported facts cannot be certified. In [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ], the
authors consider citations as a common practice in scientific discourse to “indicate a
network of mutually supportive or contrastive works”, and show that hedging cues are
frequent in this context where they have a rhetorical function. According to [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ],
epistemic modality markers in academic discourse are used in an “interactive” and
“persuasive” way. The reported speech is introduced by various means such as:
bibliographic references; expressions like other studies, recent report; authors’ names
followed by verbs like report; direct citations.
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3 Semantic Notions Map</title>
        <p>The articulation of these notions can be visualized with the example of one possible
(inter and intra) discursive situation, as shown in Fig. 1. Notions operate inside an
author’s study (S0 in Fig. 1) and outside it, relating it to other previous (or later)
publications (S-1, S1 in Fig. 1). Therefore, they are organized according to a temporal
publication axis. Authors express the objects of study they are experimenting on. As
explained in the introduction, experiments (the black boxes in the map) are not
developed here. Sometimes it is possible to group and compare two or more
observations (or hypotheses) from different authors. Authors, in order to support their
theories or observations, refer to other hypotheses or to observations from their own
experiments, or from others’ (arrows of reported speech in Fig.1). We can see in Fig.
1 that in study S1, author A1 refers to some observations introduced by
“consolidation” markers. Marking with “consolidation” observations that have
already been marked with “consolidation” conveys a stronger sense of validation than
marking with “consolidation” observations that are not already marked by
“consolidation”. Graphically, this is shown by the different dimensions of the balance.
The more observations or hypotheses are compared, the more “consolidation” of
observations (or hypotheses) becomes important and acquires weight (balance in
Fig.1). Authors can also report parts of general knowledge, which is represented as a
cloud-form at the bottom right of the map. Each of these semantic categories qualifies
the truth value of a proposition differently. Consolidation markers can give a truth
value weight to observations as well as to hypotheses, to the authors' as well as to
others' data, and to any other consolidation act, thus making it tricky to estimate the
qualification of this sort of truth value weight as a validation. Moreover, we think that
the final validation of the information and considerations given in each article
depends on extra data such as the authors' and laboratories' fame, and the knowledge
of the facts described. Final validation rests, in the end, with those who can appreciate
the weight of the hypotheses, observations, and consolidations given in the articles..
In the use of a semantic tool processing, we estimate at this stage of our analysis, that
the final validation would be given by the curators.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3 Towards automatic semantic annotation of texts</title>
      <p>Our automatic annotation tool is still in progress. We have already partially used
some categories as keys for extraction patterns in order to automatically annotate texts
within the platform GATE. The main limitation of our present approach concerns
difficulties in detecting the right scope for each annotation: for sentences with show,
detection of the proposition is needed; for reported speech, the whole sentence is
necessary; in some cases such as for example anaphoric expressions in which this or
these is used, the context before the sentence is essential. We give an example (Fig. 2)
of annotations using our semantic categories. We are engaged in the process of testing
these lists and enhancing them. However, the difficulty pointed out in section 2
(reported speech) will be to determine whether a hypothesis has been formulated by
the cited author or whether on the contrary, it deals with an idea of the citing author,
in which case it would therefore be a reported hypothesis. The table 1 illustrates the
classification of markers.
4</p>
    </sec>
    <sec id="sec-4">
      <title>Conclusion and Perspectives</title>
      <p>In biomedical articles, authors give their interpretation of all pieces of knowledge,
assigning a truth value weight to each item. In reported speech, we are confronted
with the question of determining whom the opinion belongs to. However, the whole
article has finally a direction given by its authors. “Consolidation” introduces the
notion of reinforcement given to the observations or hypotheses made on biological
facts. Deriving from that, the final weight of truth value the authors attribute to each
result or speculation is under question. We have observed that “consolidation” can
mark the observations and the hypothesis of the authors’ own study. This can
therefore change the truth value weight assigned to results in the course of their
explanations in the paper.</p>
      <p>
        How can the final truth value weight of observations, hypotheses, objects of study,
or lack of knowledge be determined? Do the authors alter their initial observations?
Do they reduce the lack of knowledge? Do they transform their hypothesis into
observed results? Do they answer their objects of study? These points need further
investigation. We consider that different states of knowledge can be present in
research articles: biological facts can be marked with “lack of knowledge”, “object of
study”, “general knowledge”, “hypothesis”, “observation” or “consolidation”. Among
the last three categories, some come from the authors’ experiments and some from
others’ (reported speech). But many points remain to be examined. The main one is
the search in sentences without markers for possible linguistic clues that would enable
general knowledge to be distinguished from observations. We will also have to
explore the methods in biomedical articles. As reported by [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], database curators are
often interested in experimental evidence and methods. A monitored evaluation
performed with an automatic extraction tool could provide interesting information,
giving new directions; furthermore the position in the section of the text (introduction,
results, discussion etc.) is a point that will have to be included in further investigation.
The use of verbs must also be examined: in particular we have observed that the use
of different verb tenses, of negative modal verbs and of the passive voice could have
an important role that should be further analyzed.
      </p>
      <p>
        Lastly, we need to integrate this work with the ontology population process already
set up for biologists [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Shotton</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Portwin</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Klyne</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          <string-name>
            <surname>Miles</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <article-title>Adventures in semantic publishing : exemplar semantic enhancement of a research articles. (submitted for publication),</article-title>
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Jilani</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amardeilh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Enrichissement automatique d'une base de connaissance biologique à l'aide des outils du Web sémantique</article-title>
          .
          <source>In: Proceedings of the 20th French Conference on Knowledge Engineering</source>
          , pp.
          <fpage>169</fpage>
          --
          <lpage>180</lpage>
          . Hammamet, Tunisia, (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Battistelli</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Amardeilh</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Knowledge Claims in Scientific Literature, Uncertainty and Semantic Annotation: A Case Study in the Biological Domain</article-title>
          .
          <source>(accepted at KCAP - SAAKM 09)</source>
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bizer</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Oldakowski</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          <article-title>Using context- and content-based trust policies on the semantic web</article-title>
          .
          <source>In: Proceedings of Alternate Track Papers &amp; Posters of WWW</source>
          <year>2004</year>
          , pp.
          <fpage>228</fpage>
          --
          <lpage>229</lpage>
          (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Vold</surname>
          </string-name>
          , E. T.:
          <article-title>Modalité épistémique et discours scientifique : une étude contrastive des modalisateurs épistémiques dans des articles de recherche français, norvégiens et anglais, en linguistique et médecine</article-title>
          .
          <source>Doctoral Thesis</source>
          , University of Bergen, (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Svarzas</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vincze</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Csirik</surname>
            ,
            <given-names>J.:</given-names>
          </string-name>
          <article-title>The BioScope Corpus: annotation for negation, uncertainty, and their scope in biomedical texts</article-title>
          .
          <source>In: Proceedings of BioNLP 2008: Current Trends in Biomedical Natural Language Processing</source>
          , pp.
          <fpage>38</fpage>
          --
          <lpage>45</lpage>
          . Columbus, Ohio, USA, (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Thompson</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Venturi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McNaught</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montemagni</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ananiadou</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          : Categorizing Modality in Biomedical Texts.
          <source>In: Proceedings of the LREC 2008 Workshop on Building and Evaluating Resources for Biomedical Text Mining. Marrakech</source>
          , Morocco, (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Sauri</surname>
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhagen</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pustejovsky</surname>
            <given-names>J</given-names>
          </string-name>
          .:
          <article-title>Annotating and recognizing event modality in text</article-title>
          .
          <source>In: Proceedings of the 19th International FLAIRS Conference, FLAIRS</source>
          <year>2006</year>
          , Melbourne Beach, Florida, (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Kilicoglu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bergler</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Recognizing speculative language in biomedical research articles: a linguistically motivated perspective</article-title>
          ,
          <source>BMC Bioinformatics</source>
          ,
          <volume>9</volume>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Shatkay</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rzhetsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wilbur</surname>
          </string-name>
          , W.J.
          <article-title>: Multi-dimensional classification of biomedical text: Toward automated, practical provision of high-utility text to diverse users</article-title>
          .
          <source>In: Bioinformatics</source>
          , 24,
          <fpage>2086</fpage>
          --
          <lpage>2093</lpage>
          , (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Wilbur</surname>
            ,
            <given-names>W.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rzhetsky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shatkay</surname>
          </string-name>
          , H.:
          <article-title>New directions in biomedical text annotation: definitions, guidelines, and corpus construction</article-title>
          .
          <source>BMC Bioinformatics 7</source>
          , (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>R.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Di Marco</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kroon</surname>
            ,
            <given-names>F.W.:</given-names>
          </string-name>
          <article-title>The frequency of hedging cues in citation contexts in scientific writings</article-title>
          . In: Tawfik,
          <string-name>
            <given-names>A.Y</given-names>
            ,
            <surname>Goodwin</surname>
          </string-name>
          , S.D. (eds.).
          <source>Canadian AI</source>
          <year>2004</year>
          .
          <article-title>LNCS</article-title>
          , vol.
          <volume>3060</volume>
          , pp.
          <fpage>75</fpage>
          --
          <lpage>88</lpage>
          . Springer, Berlin Heidelberg (
          <year>2004</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>