<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>GDWDS: First Insights from a Student-based Key Phrase Annotation Process of Medical Information Needs on a Novel German Diabetes Web Data Set</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Julia Romberg</string-name>
          <email>romberg@cs.uni-duesseldorf.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Institute of Computer Science Heinrich Heine University Düsseldorf D-40225 Düsseldorf</institution>
          ,
          <country country="DE">Germany</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2018</year>
      </pub-date>
      <abstract>
        <p>The information needs of individuals play a central role in many contexts. One platform people use to address their needs is Internet forums. Medical forums in particular are strongly shaped by questions and articulated needs. As part of our research, we examine the information needs expressed in web forums specifically in the context of diabetes. For this purpose, we introduce GDWDS, a novel German diabetes web data set. Assuming that information needs can be captured as key phrases, the data set was annotated by student annotators. Three tasks were addressed: first, the recognition of key phrases in a document; second, the grouping of key phrases with the same content; third, the selection of the most meaningful key phrase as the representative of each group. The main annotation task of identifying the text units that express information needs led to an average Krippendorff's unitized alpha of 0.439, which is promising. The tasks of grouping the key phrases and selecting a representative could only be evaluated to a limited extent because they depend on the outcome of the key phrase detection task.</p>
      </abstract>
      <kwd-group>
        <kwd>Information Retrieval</kwd>
        <kwd>Information Needs</kwd>
        <kwd>Keyphrase Extraction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Categories and Subject Descriptors</title>
      <p>H.2.8 [Database Management]: Database Applications:
data mining; I.2.7 [Artificial Intelligence]: Natural
Language Processing: language parsing and understanding, text
analysis; H.3 [Information Storage and Retrieval]:
Information Search and Retrieval</p>
    </sec>
    <sec id="sec-2">
      <title>1. INTRODUCTION AND MOTIVATION</title>
      <p>Nowadays, social media has taken an important place in
most people's lives. Platforms such as Twitter, Facebook
and Instagram are widely used to communicate feelings and
opinions. Besides these prominent examples, a variety of other
media serve as mouthpieces. Blogs and forums in particular are
used to share information about specific topics and to discuss them.</p>
      <p>
        Health-related topics are increasingly becoming one such
central theme. In [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] Sokolova
et al. identified multiple reasons for the use of medical
forums across several studies from 1990 to 2009. On
the one hand, people who either suffer from a disease themselves
or whose loved ones do may search for information that goes
beyond what an attending doctor provides. These information
needs range from psychological, physical, and social aspects
of treatments to alternative treatments. On the other hand,
forums offer a point of contact for people who seek emotional
support, especially from fellow sufferers. Furthermore, forums
often give members a feeling of anonymity, which helps
them to communicate more openly about their experiences.
      </p>
      <p>Diabetes mellitus is a widespread metabolic disease.
According to the International Diabetes Federation
(http://www.idf.org/), approximately 425 million adults
worldwide suffered from diabetes in 2017
(www.diabetesatlas.org), which is more than 5% of the world
population.</p>
      <p>Diabetes appears mainly in two different forms, Type 1
and Type 2. While genetic and environmental factors are
mostly held responsible for Type 1, Type 2 is additionally
associated with lifestyle factors. Diabetes often accompanies
affected persons for their entire lives. To enable them to
live a normal life nevertheless, good insulin adjustment and an
appropriate exercise and nutrition routine may be needed.
Institutions such as the Deutsches Diabetes-Zentrum
(http://ddz.uni-duesseldorf.de/en/, German Diabetes Center)
aim to improve patients' quality of life, among other things,
by focusing on patients' information needs and preferences.
Patient data on these points are collected using questionnaires.
Unfortunately, this approach has some weaknesses: (i) the number
of questions is limited; (ii) only a limited number of people can
take part in a survey; (iii) the evaluation is time-consuming and
difficult, especially for free-text fields, which currently
require a (manual) qualitative analysis; (iv) the physicians
and researchers who develop the questionnaires usually have a
different perspective on a disease, and on how a treatment
affects the patients, than those affected. Therefore,
patient-relevant questions could be omitted.</p>
      <p>
        An alternative approach for analyzing the information
needs and preferences of diabetes patients is the use of
information retrieval techniques. A first idea would be to
apply natural language processing techniques to the free-text
fields of the questionnaires. However, to address all
the problems listed above, the whole approach should
be changed: instead of manually posing questions and
analyzing the answers in tedious manual work, existing resources
can be used, namely medical online forums. At this point,
it is necessary to discuss whether and to what extent an online
forum community can represent the general population of
people affected by diabetes. In [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] online social networking for diabetes
is examined. The authors found that online
groups on diabetes, using the example of Facebook, cover
a broad spectrum of involved persons, such as patients and
their families. Another interesting aspect is a particular
technical affinity of diabetes patients, which is due to current
treatment methods such as app-based monitoring of
diabetes. This suggests that the information needs of the
overall patient population are reflected to a large extent in online
health media. At the same time, however, it is important to
remember that older patients, or patients who have been in
treatment for a very long time, are unlikely to use these
channels. It must also be taken into account that the data corpus
of this work reflects the information needs in industrialized
countries, using Germany as an example.
      </p>
      <p>In this paper, we focus on the annotation process of a data
corpus based on forums of this kind. Our long-term research
goal is the automated recognition and extraction of
information needs. To approach this task on a sound basis,
an appropriately annotated corpus is needed first, since
evaluation is an essential point to keep in mind. The remainder
of this paper is structured as follows: First, the data set is
introduced. Then, the implementation and nature of the annotation
and the different steps of the annotation process are explained.
Subsequently, the quality of the resulting annotations is measured
and discussed by means of inter-annotator agreement. We then
conclude and describe how this study informs a further annotation
process.</p>
    </sec>
    <sec id="sec-3">
      <title>2. RELATED WORK</title>
      <p>In recent years, several studies have addressed health and
diabetes in social media.</p>
      <p>
        Multiple publications have focused on content analysis of
medical social media texts. Denecke and Nejdl [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] compared
different social media health data sources by first extracting
medical concepts and then pointing out content differences.
They also addressed the binary classification problem of
informative versus affective statements. In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] medical
support group texts were clustered into topics, whereas in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]
clustering was used to analyze users' preferences regarding
information sources and their general
posting behavior. Ravert et al. [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] analyzed the content of online
forum messages from adolescents with Type 1 diabetes. To
this end, a corpus of 340 posts was annotated
with respect to age, gender, date and duration of illness.
They found that people affected by diabetes visit online
forums mainly for social support, information and
advice, as well as shared experiences. These findings
support our motivation to investigate information needs
in diabetes online forums. Although the corpus is interesting,
its annotations unfortunately do not address information
needs or key phrases.
      </p>
      <p>
        A large body of research addresses sentiment
analysis and opinion mining [
        <xref ref-type="bibr" rid="ref1 ref15 ref16 ref2 ref6">16, 15, 2, 6, 1</xref>
        ]. The corpora used
range from medical forum data on in vitro fertilization and
hearing loss to drug reviews, Twitter messages, and
message boards. Both reader-based and author-centric
annotation models were applied. Furthermore, domain-specific
lexicons were developed: in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] a lexicon was built from
drug reviews, and Sokolova et al. [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ] introduced HealthAffect, a
domain-specific affective lexicon. Both papers conclude that
general sentiment and affective lexicons cannot adequately
serve social media health texts because of the specific
terms and language used in this area.
      </p>
      <p>
        Further research has addressed the detection and
extraction of adverse drug events in social media texts. Karimi
et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] developed CADEC, an annotated corpus of adverse
drug events. Liu et al. [
        <xref ref-type="bibr" rid="ref10 ref11">11, 10</xref>
        ] investigated the identification of
adverse drug events and implemented an information
extraction system for adverse drug events, both on a data set
focused on diabetes. These corpora contain information
specific to adverse drug events, which at best covers a subset
of the general need for information.
      </p>
    </sec>
    <sec id="sec-4">
      <title>3. THE CORPUS</title>
      <p>This section describes the creation of an appropriate data
corpus for our later research. To the best of our
knowledge, no existing corpus of diabetes forum messages
has been annotated in a way we could use for our analysis.</p>
      <p>Our objective is the recognition and extraction of the
information needs of forum users. We observed the following
pattern in forum posts: in most cases, a thread is opened to ask
the community for information on a certain topic or for an answer
to a concrete question. To this end, the author first explains the
circumstances in more detail and then formulates the corresponding
questions. Subsequently, other users respond to the post with
answers and descriptions of their own experiences.</p>
      <p>The GDWDS data corpus was built from a freely accessible
German diabetes forum. The data set for this initial
study was built by extracting 150 forum contributions from
the corpus. Assuming that users state their information
needs when creating a thread, only the initial contributions
were retained, while the replies were discarded. Often the
thread title also contains important information. To
keep this information, the thread title was added to the
document as a heading.</p>
    </sec>
    <sec id="sec-5">
      <title>3.1 Annotation Setup</title>
      <p>We treat the problem of recognizing information needs as an
information extraction problem. Key words and key phrases
expressing these needs should be extracted to allow a
summary of the information needs. To form a gold standard for
the evaluation of such techniques, the data set must be annotated.
A text sequence is divided into annotation units, which are then
assigned to a class. For our task there are two classes: key phrase
and no key phrase.</p>
      <p>[Figure 1: The three annotation phases for an example forum post titled "Alternative medication for medicament X?". PHASE 1: key phrase recognition in the post; PHASE 2: grouping of the key phrases into Group 1 and Group 2; PHASE 3: selection of the best key phrase per group.]</p>
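      <p>To make the unitizing step concrete, the following Python sketch turns one annotator's key phrases into a complete segmentation of a document, in which every character belongs either to a key phrase unit or to a no-key-phrase unit. It assumes that key phrases are stored as character offsets (start, end) with an exclusive end; this format and the function name segment are illustrative assumptions, not the storage format of the annotation tool.</p>
      <preformat preformat-type="code">
```python
from typing import List, Tuple

# Hypothetical unit format: (start, end, label) with an exclusive end offset.
Unit = Tuple[int, int, str]


def segment(text: str, spans: List[Tuple[int, int]]) -> List[Unit]:
    """Split text into 'key phrase' units and the 'no key phrase' gaps between them.

    spans: non-overlapping (start, end) character offsets of marked key phrases.
    """
    units: List[Unit] = []
    cursor = 0
    for start, end in sorted(spans):
        if cursor > start:
            raise ValueError("key phrases must not overlap")
        if start > cursor:
            # Unmarked text before this key phrase becomes a gap unit.
            units.append((cursor, start, "no key phrase"))
        units.append((start, end, "key phrase"))
        cursor = end
    if len(text) > cursor:
        # Trailing unmarked text after the last key phrase.
        units.append((cursor, len(text), "no key phrase"))
    return units


if __name__ == "__main__":
    post = "Alternative medication for medicament X? Any help appreciated!"
    for start, end, label in segment(post, [(0, 40)]):
        print(label, repr(post[start:end]))
```
      </preformat>
      <p>Making the gaps explicit mirrors the two-class setting: a unitizing agreement measure compares not only where annotators place key phrases but also which stretches of text they leave unmarked.</p>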
      <p>
        Twenty-five student annotators were divided into five
annotation groups of four persons and one group of five persons.
The 150 GDWDS documents were divided among the six
groups, so that each group had to handle a workload of 25
documents. The annotators were instructed to carry out the
annotations independently. The annotations were
performed using MDSWriter [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. This tool, originally developed for
creating multi-document summarization corpora, was used
in a modified form. We only use the first three phases of the
tool: recognizing key phrases, grouping key phrases with the
same content, and identifying the best key phrase within each
group.
      </p>
      <p>
        In an introductory session, the guidelines to be followed
were explained to the annotators. These guidelines were
developed based on the guidelines of [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ].
      </p>
      <p>It should be noted that the annotation process presented
here is rather unusual. In most qualitative annotation settings
with extensive rules, only a subset of the data is processed by
several annotators in order to estimate the annotation quality,
and the remaining corpus is divided among the individual
annotators. In the study presented here, the focus is on testing
the applicability and completeness of the guidelines that have
already been developed. The results contribute to the final
annotation of the entire corpus.</p>
      <p>3.1.1 Phase 1 - Key Phrase Recognition</p>
      <p>1. The participants were first asked to read a
document, i.e. a forum contribution, completely before
starting the annotation. Unknown words should be looked
up or clarified in advance to ensure comprehension.
2. Subsequently, the key phrases should be marked according
to the following guidelines:</p>
      <sec id="sec-5-1">
        <title>A key phrase should consist of at least a predicate plus a subject or a predicate plus an object.</title>
      </sec>
      <sec id="sec-5-2">
        <title>A key phrase must not exceed a sentence boundary.</title>
      </sec>
      <sec id="sec-5-3">
        <title>A key phrase is intended to contain important content related to the information need expressed in the document.</title>
        <p>This may refer to an explicitly formulated question but also to
contextual information that is important for an accurate
description of the information need.</p>
      </sec>
      <sec id="sec-5-4">
        <title>If a key idea is described several times in the document, all occurrences must be marked.</title>
        <p>3. Finally, the recognized key phrases should be reviewed and checked for clarity, accuracy and content.</p>
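        <p>Two of these guidelines can be checked mechanically after annotation: a key phrase must not exceed a sentence boundary, and a key idea that is repeated must be marked at every occurrence. The Python sketch below assumes key phrases stored as (start, end) character offsets with an exclusive end. Its sentence splitter is a naive, hypothetical regular expression, and its repeat check only catches verbatim repetitions, so paraphrased re-mentions still need a human reviewer. The predicate-based guideline would require a syntactic parser and is not covered.</p>
        <preformat preformat-type="code">
```python
import re
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive

# Naive, hypothetical sentence splitter: a sentence ends with ".", "?" or "!"
# followed by whitespace or the end of the text. Abbreviations such as "z.B."
# would need extra handling in real forum posts.
SENTENCE_END = re.compile(r"[.?!]+(?=\s|$)")


def sentence_spans(text: str) -> List[Span]:
    """Return the (start, end) offsets of the sentences in text."""
    spans: List[Span] = []
    start = 0
    for match in SENTENCE_END.finditer(text):
        spans.append((start, match.end()))
        start = match.end()
    if text[start:].strip():
        # Final sentence without terminal punctuation.
        spans.append((start, len(text)))
    return spans


def crosses_sentence_boundary(text: str, span: Span) -> bool:
    """True if the key phrase is not contained in a single sentence."""
    start, end = span
    return not any(
        start >= s_start and s_end >= end for s_start, s_end in sentence_spans(text)
    )


def unmarked_repeats(text: str, spans: List[Span]) -> List[int]:
    """Offsets where a marked phrase reappears verbatim without being marked.

    A repetition counts as marked only if a span with exactly the same
    offsets exists; paraphrased re-mentions are not detected.
    """
    marked = set(spans)
    hits = set()
    for start, end in spans:
        phrase = text[start:end]
        if not phrase:
            continue
        pos = text.find(phrase)
        while pos != -1:
            if (pos, pos + len(phrase)) not in marked:
                hits.add(pos)
            pos = text.find(phrase, pos + 1)
    return sorted(hits)


if __name__ == "__main__":
    post = "Need help with X. I take X daily. Need help with X."
    marked = [(0, 16)]
    print([crosses_sentence_boundary(post, s) for s in marked])
    print(unmarked_repeats(post, marked))
```
        </preformat>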
        <p>The clarification of unknown words in step 1 is of particular
importance, since the medical context involves many
technical terms and abbreviations. In addition, there are a few
abbreviations that differ from conventional vocabulary.
These terms seem to have evolved within the forum
community.</p>
        <sec id="sec-5-4-1">
          <title>3.1.2 Phase 2 - Key Phrase Grouping</title>
          <p>Following the identification of the key phrases, the
participants were asked to group phrases with the same content.
Although the texts to be annotated are on average
only 1187 characters long, repeated mentions occur, caused among
other things by the addition of the thread title.</p>
        </sec>
        <sec id="sec-5-4-2">
          <title>3.1.3 Phase 3 - Best Key Phrase Identification</title>
          <p>In the final annotation phase, participants should select a
representative in each group of key phrases. The representative
should contain the largest possible information content.</p>
          <p>The three annotation phases are illustrated in Figure 1. In
phase 1, the annotator sees the document to be unitized. The
selected key phrases are underlined in green. Subsequently,
the annotator groups the key phrases according to
their content. In this example, two key phrases refer to the
need for information about an alternative to the current
medication. Hence, they are grouped together. The third
key phrase provides context information, which clarifies the
expressed information need and is equally important. This
phrase forms a second group of key phrases. Finally, in phase
3, the annotator must decide on a best key phrase inside
each group created in the previous phase. The best key
phrase is shown in bold. For group 2, which contains only one
key phrase, no decision is needed. The representative in group 1
is selected based on the requirement of the largest possible
information content, as it states not only the question about an
alternative but also the name of the currently used medicament.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>3.2 Inter-Annotator Agreement</title>
      <p>Following the annotation task itself, the resulting
annotations need to be evaluated. For this purpose, we calculate the
inter-annotator agreement among the members of each annotation group.</p>
      <sec id="sec-6-1">
        <title>3.2.1 Phase 1 - Key Phrase Recognition</title>
        <p>
          Since annotation phase 1 is a unitizing task with one
category, we use Krippendorff's unitized alpha α<sub>U</sub> (introduced
in [
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]) as a measure. α<sub>U</sub> ∈ [−1, 1] describes the
correspondence of different annotators' coding units on the same text
document: 1 expresses maximum agreement, 0 indicates agreement no
better than chance, and −1 indicates systematic disagreement. The
calculations were carried out with DKPro Agreement [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
        <p>First, for each of the six annotation groups described
in Section 3.1, the group's agreement over all 25 documents
was considered. Table 1 shows the agreement within the
annotation groups. Annotation groups 2 and 3 obtain the
best agreement, with α<sub>U</sub> above 0.5. Groups 1, 5 and 6
agree with a value greater than 0.4. Group 4, however,
performs considerably worse, achieving an α<sub>U</sub> of only 0.210. One
possible explanation might be the text length of the
documents. The average length of a text document in group 4
was 1645.36 characters, whereas the texts of the other groups were
on average at least 400 characters shorter. Longer texts tend
to describe the information need in more detail, but they may also
contain more digressive content. This makes unitizing key phrases
harder. Nevertheless, the remaining groups achieve encouraging
inter-annotator agreements.</p>
        <p>To evaluate the annotations more accurately and in more
detail, the inter-annotator agreements are further
examined at the document level. The agreement for the
individual documents is illustrated in Figure 2. The box plot of
each group shows the lowest and the highest α<sub>U</sub> value
achieved for a document assigned to this group. The boxes
illustrate the quartiles and the median, and the mean value shown
in Table 1 is marked with a cross.
As can be seen, the agreement within the groups is highly
variable. The box plots again reflect that group 4 performs
worse than the other groups. However, in every group the values
span an interval of length 0.8 to 1.2,
which corresponds to approximately half the value range of
Krippendorff's unitized alpha. Although the mean
agreement of 0.439 across the six groups appears acceptable,
the large variability of the data indicates that the
annotation quality must be interpreted with caution.</p>
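        <p>The per-group statistics shown in such a box plot (lowest and highest value, quartiles, median, mean, and the share of the value range covered) could be computed as in the following Python sketch. It is only an illustration: the α<sub>U</sub> values in the usage example are invented placeholders rather than results of this study, the value range [−1, 1] is taken from the definition of α<sub>U</sub> given above, and the choice of the "inclusive" quartile method is an assumption.</p>
        <preformat preformat-type="code">
```python
import statistics
from typing import Dict, List

ALPHA_RANGE = 2.0  # length of the value range [-1, 1] of alpha_U


def summarize(alphas: List[float]) -> Dict[str, float]:
    """Box-plot statistics for the per-document alpha_U values of one group."""
    # method="inclusive" interpolates linearly between the sorted values and
    # treats the minimum and maximum as the 0th and 100th percentiles.
    q1, median, q3 = statistics.quantiles(alphas, n=4, method="inclusive")
    low, high = min(alphas), max(alphas)
    return {
        "min": low,
        "q1": q1,
        "median": median,
        "q3": q3,
        "max": high,
        "mean": statistics.fmean(alphas),  # the value marked with a cross
        # Share of the theoretical value range spanned by this group.
        "range_share": (high - low) / ALPHA_RANGE,
    }


if __name__ == "__main__":
    # Invented placeholder values, not results of this study.
    group = [-0.2, 0.3, 0.45, 0.6, 0.8]
    for name, value in summarize(group).items():
        print(f"{name}: {value:.3f}")
```
        </preformat>
        <p>Applied to the 25 per-document values of one group, a range_share between 0.4 and 0.6 corresponds to interval lengths of 0.8 to 1.2, i.e. roughly half of the value range.</p>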
        <p>Figure 3 illustrates the relation between document
length and agreement. Unexpectedly, the assumption
that longer documents lead to worse agreement is not
confirmed here. Although a slight tendency is visible, the
documents at both ends of the distribution, those with the lowest
and those with the highest agreement, are short. Accordingly, the
content of these marginal documents was analyzed. In documents with
poor agreement, we noted that the annotators often agreed on the
important information. However, they distributed this
information across key phrases very differently. In particular,
the annotators were divided on how to handle conjunctions such as
"and" and "or": some annotators split the text into more granular
units than others. The importance of context information was also
assessed differently. For example, in one document a patient
described an information need against the background of his type 1
diabetes and also stated since when he had been affected by the
disease. Here, the annotators were divided over whether this
temporal context is important for the formulation of the information
need. At this juncture, it should be remembered that the students had
no domain knowledge in medicine or diabetes, which made it
difficult to make a reasoned decision. Furthermore,
annotation errors were observed. In some annotations, a
repeated mention of a key phrase's content was not marked again.
Individual annotators tended to split the key phrases so
finely that the selected key phrases could not individually
express a key content of the text.</p>
      </sec>
      <sec id="sec-6-2">
        <title>3.2.2 Phase 2 - Key Phrase Grouping and Phase 3 - Best Key Phrase Identification</title>
        <p>Following the recognition of key phrases, the subsequent
grouping needs to be evaluated. Due to the dependency on
phase 1, it is difficult to assess whether the same groupings have
been made. In an optimal scenario, starting from an identical
set of key phrases, an inter-annotator agreement could be
calculated for the same coding units and a set of classes
corresponding to the key phrase groups.</p>
        <p>
          To be able to analyze at least partially how much
the annotators agree in their phase 2 decisions, only
documents with an α<sub>U</sub> greater than 0.439 (the
mean agreement of phase 1) are considered.
Furthermore, documents whose annotators disagree on the number of
units are excluded. With these
restrictions, we want to ensure that the same phrases were
detected in phase 1, allowing a small tolerance for variation, in
order to obtain a suitable initial data situation for phase 2.
This allows us to measure the inter-annotator agreement for
these documents. As measures, we use simple
percentage agreement PA on the one hand and Fleiss'
kappa κ [
          <xref ref-type="bibr" rid="ref5">5</xref>
          ] on the other hand. We again use DKPro Agreement for
the calculation. Unfortunately, only three documents
meet the required criteria. For the first of them, all
annotators assigned only one key phrase unit. Accordingly, there
is only one group of key phrases, and thus both PA and κ
are 1. The second document that fulfills the criteria contains
two key phrase units according to phase 1. Every annotator
put them into the same group, which leads to
perfect agreement in terms of both measures. Regarding phase
3, all annotators also chose the same best key phrase.
The last document consists of three units. While
three of four annotators completely agreed in dividing the
units into two groups and making the same assignments, the
fourth annotator settled on three groups, which led to
a PA of 0.66 and κ = 0.38. However, the three annotators who built
the same groupings did not agree on the best key
phrase per group.
        </p>
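        <p>The document filter and the two phase 2 measures described above can be sketched in Python as follows. This is a minimal illustration, not the DKPro Agreement implementation used in the study. Percentage agreement is assumed here to be the mean share of agreeing annotator pairs per unit, which coincides with the observed agreement term of Fleiss' kappa, and group labels are compared by name, so the labels of different annotators are assumed to be aligned. The labels in the usage example are a hypothetical assignment consistent with the description of the third document, not the actual annotations; with them, the sketch yields about 0.67 and 0.38, close to the reported PA of 0.66 and κ = 0.38.</p>
        <preformat preformat-type="code">
```python
from collections import Counter
from typing import List, Sequence

# Threshold from the text: the mean alpha_U of phase 1.
ALPHA_THRESHOLD = 0.439


def eligible(alpha_u: float, unit_counts: Sequence[int]) -> bool:
    """Phase 2 filter: alpha_U above the threshold and equal unit counts."""
    return alpha_u > ALPHA_THRESHOLD and len(set(unit_counts)) == 1


def observed_agreement(ratings: List[List[str]]) -> float:
    """Mean share of agreeing annotator pairs per unit (Fleiss' P-bar).

    ratings[u][a] is the group label that annotator a gave to key phrase unit u;
    every unit must have the same number (at least two) of annotators.
    """
    n = len(ratings[0])
    total = 0.0
    for labels in ratings:
        counts = Counter(labels)
        total += sum(c * (c - 1) for c in counts.values()) / (n * (n - 1))
    return total / len(ratings)


def fleiss_kappa(ratings: List[List[str]]) -> float:
    """Fleiss' kappa: chance-corrected agreement over all units."""
    p_bar = observed_agreement(ratings)
    overall = Counter(label for labels in ratings for label in labels)
    n_total = sum(overall.values())
    # Expected agreement from the pooled label distribution.
    p_e = sum((c / n_total) ** 2 for c in overall.values())
    if p_e == 1.0:
        # Only one label occurs, so kappa is 0/0. Returning 1.0 is an assumed
        # convention that matches the perfect agreement reported for such cases.
        return 1.0
    return (p_bar - p_e) / (1 - p_e)


if __name__ == "__main__":
    # Hypothetical labels: three units, four annotators. Annotators 1-3 form
    # two groups; annotator 4 forms three.
    doc = [
        ["G1", "G1", "G1", "G1"],
        ["G1", "G1", "G1", "G2"],
        ["G2", "G2", "G2", "G3"],
    ]
    print(round(observed_agreement(doc), 2), round(fleiss_kappa(doc), 2))
```
        </preformat>
        <p>The special case for a single label reflects the first two documents above, where every unit ends up in one group and Fleiss' formula alone would be undefined.</p>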
        <p>Since the quality of these dependent annotations can only be
analyzed to a limited extent, and such an analysis is therefore not
very meaningful, we do not examine phases 2 and 3 further here.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>4. CONCLUSION AND FURTHER WORK</title>
      <p>In this work, we presented an annotation study of
medical information needs on a German diabetes data set.
Student annotators were instructed to detect key phrases, group
them according to similar content, and then find a
representative key phrase for every group. For this, the students
had to follow the guidelines presented in Section 3.1.</p>
      <p>Subsequently, the obtained annotations were evaluated.
Since no gold standard exists, the evaluation focused
on inter-annotator agreement. The results for phase 1 are
promising. Although there is considerable variance in the data,
for almost half of the documents the annotators reached an
α<sub>U</sub> of at least 0.5, excluding annotation group 4.
We observed different types of problems. The lack
of subject-specific knowledge was one of the main problems
the annotators faced. A second problem was the differing
views on a key phrase's level of granularity. Finally, we
detected some cases in which the annotators did not adhere
to the given guidelines, producing poor annotations.
Phases 2 and 3 could not be investigated meaningfully, as phase
1 directly determines the initial data situation of the other
phases.</p>
      <p>These findings lead us to assume that, in order
to further increase the quality of the data corpus, experts
need to be involved. Healthcare professionals are
important, but social media experts are likewise of interest
because of the particular vocabulary used in forums.
Our data set showed very specific expressions that are
only used in the online context of diabetes. The guidelines
must also be addressed: they need to be revised
with regard to the annotators' uncertainties concerning key
phrase granularity, the rules for key phrase grouping, and
the choice of a representative key phrase. The annotators also
reported that the highly subjective nature of the texts was
another difficulty.</p>
      <p>In summary, this first annotation approach on
GDWDS achieved promising results, especially in the main
phase, phase 1. As the students had only a short introductory
session on the annotation task, this approach can be seen
as a crowd-sourcing attempt. However, to further increase
annotation quality, we consider an expert-based approach
preferable. This issue will be addressed in future work. Phases
2 and 3 should be set aside until phase 1 produces results
of sufficient quality for the further phases. Nonetheless,
the development of appropriate measures for dependent
annotation tasks may be an interesting area of research.</p>
      <p>
        If the annotation quality of GDWDS is ensured, the
actual process of key phrase extraction can start. As a
first step, we plan to apply and evaluate state-of-the-art
algorithms for key phrase extraction on GDWDS. Machine
learning algorithms and deep learning approaches are prevalent
in this field. For example, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] presents an interesting approach to
keyword extraction from Twitter using recurrent neural
networks. Alternatively, rule-based or graph-based
approaches should be considered. Based on the
results, it can be evaluated whether these existing techniques
can be applied to our problem. Depending on the outcome,
new algorithms may then be developed.
      </p>
    </sec>
    <sec id="sec-8">
      <title>5. ACKNOWLEDGMENTS</title>
      <p>We want to thank the student annotators, namely Deniz
Ates, Bashkim Berzati, Christian Born, Markus Brenneis,
Nurhan Chahrour, Bjorn Ebbinghaus, Julia Fischer,
Andreas Funke, Philipp Grawe, Frederik Grieshaber, Tobias
Alexander Hogrebe, Michael Janschek, Moritz Kanzler, Sergej
Korlakov, Daniel Laps, Johannes Muller, Alexander
Oberstraß, Karsten Packeiser, Kevin Robert Pochwyt, Regina
Stodden, Emil Warkentin, Dennis Weber, Susanna Welzel,
Julian Zenz and Milos Lukas Ziolkowski.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Ali</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Schramm</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sokolova</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Inkpen</surname>
          </string-name>
          .
          <article-title>Can I hear you? Sentiment analysis on medical forums</article-title>
          .
          <source>In IJCNLP</source>
          , pages
          <fpage>667</fpage>
          –
          <lpage>673</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>V.</given-names>
            <surname>Bobicev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sokolova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Jafer</surname>
          </string-name>
          , and
          <string-name>
            <given-names>D.</given-names>
            <surname>Schramm</surname>
          </string-name>
          .
          <article-title>Learning sentiments from tweets with personal health information</article-title>
          .
          <source>In Canadian Conference on Artificial Intelligence</source>
          , pages
          <fpage>37</fpage>
          –
          <lpage>48</lpage>
          . Springer,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A. T.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Exploring online support spaces: using cluster analysis to examine breast cancer, diabetes and fibromyalgia support groups</article-title>
          .
          <source>Patient Education and Counseling</source>
          ,
          <volume>87</volume>
          (
          <issue>2</issue>
          ):
          <fpage>250</fpage>
          –
          <lpage>257</lpage>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K.</given-names>
            <surname>Denecke</surname>
          </string-name>
          and
          <string-name>
            <given-names>W.</given-names>
            <surname>Nejdl</surname>
          </string-name>
          .
          <article-title>How valuable is medical social media data? Content analysis of the medical web</article-title>
          .
          <source>Information Sciences</source>
          ,
          <volume>179</volume>
          (
          <issue>12</issue>
          ):
          <fpage>1870</fpage>
          –
          <lpage>1880</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J. L.</given-names>
            <surname>Fleiss</surname>
          </string-name>
          .
          <article-title>Measuring nominal scale agreement among many raters</article-title>
          .
          <source>Psychological Bulletin</source>
          ,
          <volume>76</volume>
          (
          <issue>5</issue>
          ):
          <fpage>378</fpage>
          ,
          <year>1971</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>L.</given-names>
            <surname>Goeuriot</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.-C.</given-names>
            <surname>Na</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. Y.</given-names>
            <surname>Min Kyaing</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Khoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-K.</given-names>
            <surname>Chang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.-L.</given-names>
            <surname>Theng</surname>
          </string-name>
          , and
          <string-name>
            <given-names>J.-J.</given-names>
            <surname>Kim</surname>
          </string-name>
          .
          <article-title>Sentiment lexicons for health-related opinion mining</article-title>
          .
          <source>In Proceedings of the 2nd ACM SIGHIT International Health Informatics Symposium</source>
          , pages
          <fpage>219</fpage>
          –
          <lpage>226</lpage>
          . ACM,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Greene</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. K.</given-names>
            <surname>Choudhry</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Kilabuk</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W. H.</given-names>
            <surname>Shrank</surname>
          </string-name>
          .
          <article-title>Online social networking by patients with diabetes: A qualitative evaluation of communication with Facebook</article-title>
          .
          <source>Journal of General Internal Medicine</source>
          ,
          <volume>26</volume>
          (
          <issue>3</issue>
          ):
          <fpage>287</fpage>
          –
          <lpage>292</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Karimi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Metke-Jimenez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Kemp</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Wang</surname>
          </string-name>
          .
          <article-title>Cadec: A corpus of adverse drug event annotations</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          ,
          <volume>55</volume>
          :
          <fpage>73</fpage>
          –
          <lpage>81</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>K.</given-names>
            <surname>Krippendorff</surname>
          </string-name>
          .
          <article-title>On the reliability of unitizing continuous data</article-title>
          .
          <source>Sociological Methodology</source>
          , pages
          <fpage>47</fpage>
          –
          <lpage>76</lpage>
          ,
          <year>1995</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>AZDrugMiner: An information extraction system for mining patient-reported adverse drug events in online patient forums</article-title>
          .
          <source>In International Conference on Smart Health</source>
          , pages
          <fpage>134</fpage>
          –
          <lpage>150</lpage>
          . Springer,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>X.</given-names>
            <surname>Liu</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <article-title>Identifying adverse drug events from patient social media: A case study for diabetes</article-title>
          .
          <source>IEEE Intelligent Systems</source>
          ,
          <volume>30</volume>
          (
          <issue>3</issue>
          ):
          <fpage>44</fpage>
          –
          <lpage>51</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Benikova</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mieskes</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <article-title>MDSWriter: Annotation tool for creating high-quality multi-document summarization corpora</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (ACL): System Demonstrations</source>
          , pages
          <fpage>97</fpage>
          –
          <lpage>102</lpage>
          , Berlin, Germany,
          <year>August 2016</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Meyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Mieskes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Stab</surname>
          </string-name>
          , and
          <string-name>
            <given-names>I.</given-names>
            <surname>Gurevych</surname>
          </string-name>
          .
          <article-title>DKPro Agreement: An open-source Java library for measuring inter-rater agreement</article-title>
          .
          <source>In Proceedings of the 25th International Conference on Computational Linguistics: System Demonstrations (COLING)</source>
          , pages
          <fpage>105</fpage>
          –
          <lpage>109</lpage>
          , Dublin, Ireland,
          <year>August 2014</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>R. D.</given-names>
            <surname>Ravert</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. D.</given-names>
            <surname>Hancock</surname>
          </string-name>
          , and
          <string-name>
            <given-names>G. M.</given-names>
            <surname>Ingersoll</surname>
          </string-name>
          .
          <article-title>Online forum messages posted by adolescents with type 1 diabetes</article-title>
          .
          <source>The Diabetes Educator</source>
          ,
          <volume>30</volume>
          (
          <issue>5</issue>
          ):
          <fpage>827</fpage>
          –
          <lpage>834</lpage>
          ,
          <year>2004</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sokolova</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Bobicev</surname>
          </string-name>
          .
          <article-title>Sentiments and opinions in health-related web messages</article-title>
          .
          <source>In RANLP</source>
          , pages
          <fpage>132</fpage>
          –
          <lpage>139</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>M.</given-names>
            <surname>Sokolova</surname>
          </string-name>
          and
          <string-name>
            <given-names>V.</given-names>
            <surname>Bobicev</surname>
          </string-name>
          .
          <article-title>What sentiments can be found in medical forums?</article-title>
          <source>In RANLP</source>
          , volume
          <volume>2013</volume>
          , pages
          <fpage>633</fpage>
          –
          <lpage>639</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>F.</given-names>
            <surname>Sudau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Friede</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Grabowski</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Koschack</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Makedonski</surname>
          </string-name>
          , and
          <string-name>
            <given-names>W.</given-names>
            <surname>Himmel</surname>
          </string-name>
          .
          <article-title>Sources of information and behavioral patterns in online health forums: observational study</article-title>
          .
          <source>Journal of Medical Internet Research</source>
          ,
          <volume>16</volume>
          (
          <issue>1</issue>
          ):e10,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>X.</given-names>
            <surname>Huang</surname>
          </string-name>
          .
          <article-title>Keyphrase extraction using deep recurrent neural networks on Twitter</article-title>
          .
          <source>In Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing</source>
          , pages
          <fpage>836</fpage>
          –
          <lpage>845</lpage>
          , Austin, Texas,
          <year>November 2016</year>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>