GDWDS: First Insights from a Student-based Key Phrase
   Annotation Process of Medical Information Needs on a
           Novel German Diabetes Web Data Set

                                                           Julia Romberg
                                                  Institute of Computer Science
                                               Heinrich Heine University Düsseldorf
                                                 D-40225 Düsseldorf, Germany
                                              romberg@cs.uni-duesseldorf.de

ABSTRACT                                                              1.   INTRODUCTION AND MOTIVATION
The information needs of individuals are at the forefront of             Nowadays social media has taken an important place in
various issues. One platform users turn to with their needs           most people’s lives. Platforms such as Twitter, Facebook
is the Internet forum. Medical forums in particular are               and Instagram are widely used to communicate feelings and
very much shaped by questions and articulated needs.                  opinions. Besides these prominent examples, there is a
   As part of our research, we examine information needs              variety of other media that serve as a mouthpiece. Blogs
expressed in web forums, specifically in the context of               and forums in particular are used to share information on
diabetes. For this purpose we introduce GDWDS,                        specific topics and to discuss them.
a novel German diabetes web data set. Assuming that                      One subject area that is increasingly being taken up is
information needs can be understood as key phrases, the               health. In [16], Sokolova et al. identify multiple reasons
data set was annotated by student annotators. Three tasks             for the use of medical forums in several studies from the
were addressed: First, key phrases were to be recognized in           years 1990 to 2009: On the one hand, persons who suffer
a document. Second, the annotators were asked to collect              from a disease themselves, or whose loved ones do, may
key phrases with the same content in one group. Third,                search for information that goes beyond the information
every group was to be represented by the most meaningful              provided by an attending doctor.
key phrase contained in this group.                                   The information need ranges from
   The main annotation task of identifying the text units             psychological, physical, and social aspects of treatments to
that express information needs led to an average Krippendorff’s       alternative treatments. On the other hand, forums offer a
unitized alpha of 0.439, which is promising. The tasks of             point of contact for people who seek emotional support,
grouping the key phrases and selecting a representative could         especially from fellow sufferers. Furthermore, forums
only be evaluated to a limited extent due to their subjective         often provide a feeling of anonymity to members, which helps
dependence on the key phrase detection task.                          them to communicate more openly about their experiences.
                                                                         A widespread disease is the metabolic disease diabetes
Categories and Subject Descriptors                                    mellitus. In 2017, according to the International Diabetes
                                                                      Federation¹, approximately 425 million adults worldwide
H.2.8 [Database Management]: Database Applications:                   suffered from diabetes², which is more than 5% of the world
data mining; I.2.7 [Artificial Intelligence]: Natural                 population.
Language Processing: language parsing and understanding, text            Diabetes appears mainly in two different forms, Type 1
analysis; H.3 [Information Storage and Retrieval]:                    and Type 2. While genetics and environmental factors are
Information Search and Retrieval                                      mostly held responsible for Type 1, Type 2 is additionally
                                                                      associated with lifestyle factors. Diabetes is a disease that
Keywords                                                              often accompanies affected persons for their entire lives.
                                                                      To enable these persons to live a normal life nevertheless, a
Information Retrieval, Information Needs, Keyphrase                   good insulin adjustment and an appropriate routine in
Extraction                                                            exercise and nutrition may be needed. Institutions such as
                                                                      the Deutsches Diabetes-Zentrum³ (German Diabetes Center)
                                                                      aim to improve patients’ quality of life, among other things,
                                                                      by focusing on the patients’ information needs and
                                                                      preferences. Patient statistics on these points are collected
                                                                      using questionnaires. Unfortunately, this course of action
                                                                      shows some weaknesses: (i) The number of questions
                                                                      is limited. (ii) Only a limited number of people can take part
                                                                      in a survey. (iii) The evaluation is time-consuming and diffi-
                                                                      ¹ http://www.idf.org/
                                                                      ² www.diabetesatlas.org
                                                                      ³ http://ddz.uni-duesseldorf.de/en/
30th GI-Workshop on Foundations of Databases (Grundlagen von Datenbanken), 22.05.2018 - 25.05.2018, Wuppertal, Germany.
Copyright is held by the author/owner(s).
cult, especially when having free-text fields, which currently    forum messages from adolescents with Type 1 diabetes. In
require a (manual) qualitative analysis. (iv) The physicians      doing so, a corpus consisting of 340 posts was annotated
and researchers who develop the questionnaires usually have       with respect to age, gender, date and duration of illness.
a different perspective on a disease, and on how a treatment      They found that persons affected by diabetes visit online
affects the patients, than those affected. Therefore,             forums mainly for social support, information and advice,
patient-relevant questions could be omitted.                      along with shared experiences. These findings support our
                                                                  motivation for investigating the information needs in
                                                                  diabetes online forums. Although the corpus is interesting,
                                                                  the annotations unfortunately lack reference to information
   An alternative approach for the analysis of information        needs and key phrases.
needs and preferences of diabetes patients is the use of             A lot of research has been conducted in sentiment
information retrieval techniques. A first intuition would be      analysis and opinion mining [16, 15, 2, 6, 1]. The corpora
to apply natural language processing techniques to the            used range from medical forum data for In Vitro Fertilization
free-text fields of the questionnaires. However, to address       and Hearing Loss through drug reviews to Twitter messages
all the problems listed above, the whole course of action         and message boards. Reader-based as well as author-centric
should be changed: Instead of manually posing questions and       annotation models were applied. Furthermore, domain-specific
tediously analyzing the answers by hand, existing resources       lexicons were developed: In [6], a lexicon was built from
can be used, namely medical online forums. At this point, it      drug reviews. Sokolova et al. [16] introduced HealthAffect, a
is necessary to discuss whether and to what extent an online      domain-specific affective lexicon. Both papers conclude that
forum community can represent the general population of           general sentiment and affective lexicons cannot adequately
people affected by diabetes. In [7], online social networking     serve social media health texts because of the specific
for diabetes is examined. The authors found that online           terms and language used in this area.
groups on diabetes, using the example of Facebook, cover a               Further research was conducted on the detection and
broad spectrum of involved persons, such as patients and their        extraction of adverse drug events in social media texts.
families. Another interesting fact is the particular technical        Karimi et al. [8] developed CADEC, an annotated corpus of
affinity of diabetes patients, which is due to current                adverse drug events. Liu et al. [11, 10] investigated the
treatment methods such as app-based monitoring of diabetes.           identification of adverse drug events and implemented an
This suggests that the existing information needs of the              information extraction system for adverse drug events, both
total population are reflected to a large extent in online            on a data set focused on diabetes. These corpora contain
health media. At the same time, however, it is important to           information specific to adverse drug events, which at best
remember that older patients or patients who have been in             expresses a subset of the general need for information.
treatment for a very long time are unlikely to use these
channels. It must also be taken into account that the data corpus     3.    THE CORPUS
of this work refers to the information needs in industrialized       In this section the creation of an appropriate data corpus,
countries using the example of Germany.                           needed for later research, is discussed. To the best of our
   In this paper, we focus on the annotation process of a data    knowledge, there is no existing data corpus consisting of
corpus based on forums of this kind. Our long-term research       diabetes forum messages that has been annotated in a way
goal is the automated recognition and extraction of               we could use for our analysis.
information needs. To put this task on a solid footing, an           Our objective is the recognition and extraction of the
appropriate corpus annotation is needed first, since              information needs of forum users. The following pattern was
evaluation is an essential point to keep in mind. The             recognized in forum posts. In the majority of cases, a thread
remainder of this paper is structured as follows: First, the      is opened in order to ask the community for information on a
data set is introduced. The implementation and nature of the      certain topic or for an answer to a concrete question. For
annotation and the different steps of the annotation process      this purpose, the circumstances are first explained in more
are explained. Subsequently, the quality of the resulting         detail and then the corresponding questions are formulated.
data set is calculated and discussed by means of an               Subsequently, other users respond to the post with answers
Inter-Annotator Agreement. We then conclude and describe the      and descriptions of their own experiences.
use of this study for a further annotation process.                  The data corpus GDWDS was built from a freely accessible
                                                                  German diabetes forum. The data set for this initial study
                                                                  was built by extracting 150 forum contributions from
2.   RELATED WORK                                                 the corpus. Assuming that a user announces his information
   There has been previous research in the field of social        needs when creating a thread, only the initial contributions
media health and diabetes in recent years.                        were retained while the replies were discarded. Often the
   Multiple publications have focused on content analysis of      title of a thread also contains important information. In
medical social media texts. Denecke et al. [4] compared           order to keep this information, the thread title was added to
different social media health data sources by first extracting    the document as a heading.
medical concepts and then pointing out content differences.
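
The document construction described in Section 3 (keep only the thread-opening contribution, discard the replies, and add the thread title as a heading) can be sketched as follows. This is a minimal illustration under assumptions, not the tooling used to build GDWDS: the `Thread` class, its fields, and the `build_document` helper are hypothetical names introduced here, and the example text is taken from Figure 1.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Thread:
    """Hypothetical container for one forum thread (not part of GDWDS)."""
    title: str
    posts: List[str] = field(default_factory=list)  # posts[0] opens the thread


def build_document(thread: Thread) -> str:
    """Keep only the initial post and prepend the thread title as a heading."""
    if not thread.posts:
        raise ValueError("thread has no posts")
    initial_post = thread.posts[0]  # replies (posts[1:]) are discarded
    return f"{thread.title}\n\n{initial_post}"


example = Thread(
    title="Alternative medication for medicament X?",
    posts=[
        "Hello everybody, are there any alternative medications now?",
        "A reply from another user, which is not kept.",
    ],
)
document = build_document(example)
```

Keeping the title as the first line means that a question stated only in the title is still available to the annotators, at the cost of repeated mentions when the post restates it.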
They also focused on the binary classification problem of         3.1   Annotation Setup
informative versus affective statements. In [3], medical            We see the problem of recognizing information needs as an
support group texts were clustered into topics, whereas in        information extraction problem. Key words and key phrases
[17] clustering was used to analyze user preferences for the      expressing these needs should be extracted to allow a
use of information sources and the users’ general posting         summary of the information needs. To form a gold standard for
behavior. Ravert et al. [14] analyzed the content of online       the evaluation of such techniques, an annotation of the da-
PHASE 1 (example document, thread title as heading)

    Alternative medication for medicament X?

    Hello everybody,
    right now I’m treated with medicament X 3x20 mg.
    I have been taking this drug for many years...
    Are there any alternative medications now?
    Any help appreciated!
    Thank you,
    Anonymous

PHASE 2

    Group 1
      • Alternative medication for medicament X?
      • Are there any alternative medications now?
    Group 2
      • right now I’m treated with medicament X 3x20 mg

PHASE 3

    Group 1
      • Alternative medication for medicament X?
      • Are there any alternative medications now?
    Group 2
      • right now I’m treated with medicament X 3x20 mg


Figure 1: The three annotation phases (phase 1 - key phrase recognition, phase 2 - key phrase grouping, phase
3 - best key phrase identification) are illustrated by means of an example document.
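
The outcome of the three phases shown in Figure 1 can be represented as character spans (phase 1), groups of spans (phase 2), and one representative per group (phase 3). The following is a minimal sketch under assumptions: the class and function names are hypothetical, this is not the MDSWriter data format, and the representative chosen for group 1 follows the explanation given in the text (the phrase that also names the medicament).

```python
from dataclasses import dataclass
from typing import List

# Example document from Figure 1 (thread title followed by the post).
DOCUMENT = (
    "Alternative medication for medicament X?\n"
    "Hello everybody,\n"
    "right now I'm treated with medicament X 3x20 mg. "
    "I have been taking this drug for many years...\n"
    "Are there any alternative medications now?\n"
)


@dataclass(frozen=True)
class KeyPhrase:
    """Phase 1: a unit marked in the text, stored as character offsets."""
    start: int
    end: int

    def text(self, document: str) -> str:
        return document[self.start:self.end]


@dataclass
class Group:
    """Phase 2 collects members; phase 3 selects one of them as best."""
    members: List[KeyPhrase]
    best: KeyPhrase


def span_of(document: str, phrase: str) -> KeyPhrase:
    """Locate the first occurrence of a phrase and return it as a span."""
    start = document.index(phrase)
    return KeyPhrase(start, start + len(phrase))


title = span_of(DOCUMENT, "Alternative medication for medicament X?")
question = span_of(DOCUMENT, "Are there any alternative medications now?")
context = span_of(DOCUMENT, "right now I'm treated with medicament X 3x20 mg")

groups = [
    Group(members=[title, question], best=title),
    Group(members=[context], best=context),
]
```

Storing offsets rather than strings keeps repeated phrases distinguishable, which matters because the guidelines ask annotators to mark every occurrence of a key idea.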


ta set must be made. A text sequence is to be divided into                   • A key phrase must not exceed a sentence
annotation units, which are then assigned to a class.                          boundary.
According to our task there are two classes: key phrase and                  • A key phrase is intended to contain important
no key phrase.                                                                 content related to the information need
  Twenty-five student annotators were divided into five                        expressed in the document. This may refer to an
annotation groups of 4 persons and one group of 5 persons.                     explicit formulation as a question but also to
The 150 documents of GDWDS were divided among the six                          contextual information that is important for an
groups so that each group had to handle a workload of 25                       accurate description of the information need.
documents. The annotators were instructed to carry out the
annotations independently. The annotations were made                         • If a key idea is described several times in the
using MDSWriter [12]. This tool, originally developed for                      document, all occurrences must be marked.
creating multi-document summarization corpora, was used                 3. Finally, the recognized key phrases should be reviewed
in a modified form. We only use the first three phases of the              and checked for clarity, accuracy and content.
tool: recognizing key phrases, grouping key phrases with the
same content, and identifying a best key phrase within each            The clarification of unknown words in (1.) is of particular
group.                                                               importance, since the medical context requires many
  In an introductory phase, the guidelines to be followed were       technical terms and abbreviations. In addition, there are a
explained to the annotators. These guidelines were developed         few abbreviations that differ from the conventional
based on the guidelines of [12].                                     vocabulary. These terms seem to have evolved within the forum
  It should be noted that the annotation process presented           community.
here is rather unusual. In most cases, in a qualitative
annotation setting with extensive rules, only a subset of the         3.1.2     Phase 2 - Key Phrase Grouping
data is processed by several annotators in order to be able             Following the identification of the key phrases, the
to estimate the annotation quality. The remaining corpus is          participants were asked to group phrases of the same content
divided among the individual annotators. In the study                together. Although the texts to be annotated are on average
presented here, the focus is on testing the admissibility and        only 1187 characters long, repeated mentions occur, caused
completeness of the guidelines that have already been                among other things by the addition of the thread title.
developed. The results contribute to the final annotation of
the entire corpus.                                                    3.1.3     Phase 3 - Best Key Phrase Identification
                                                                       In the final annotation phase, participants were to select
3.1.1    Phase 1 - Key Phrase Recognition                            a representative for each group of key phrases. The
                                                                     representative should contain the largest possible
  1. The participants were first requested to read a                 information content.
     document, i.e. a forum contribution, completely before            The three annotation phases are illustrated in Figure 1. In
     starting the annotation. Unknown words should be looked         phase 1, the annotator sees the document to be unitized. The
     up or asked about in advance to ensure comprehension.           selected key phrases are underlined in green. Subsequently,
                                                                     the annotator is asked to group the key phrases according to
  2. Subsequently, the key phrases should be marked. The             their content. In this example, two key phrases refer to the
     following guidelines should be followed:                        need for information about an alternative medication to the
                                                                     current one. Hence, they are grouped together. The third key
        • A key phrase should at least consist of a predicate        phrase relates to content information, which clarifies the
          plus subject or a predicate plus an object.                expressed information needs and is equally important. This
[Figure 2: box plots; x-axis: Uα (−1 to 1); y-axis: Annotation Group (1 to 6)]
[Figure 3: plot; x-axis: Uα (−0.4 to 1); y-axis: Document Length (0 to 8,000); legend: Group 1 to Group 6]

Figure 2: Box plots depicting the distribution of the achieved values (Uα) within the individual annotation groups.

Figure 3: Plot showing the relation of Uα and the document length.


phrase forms a second group of key phrases. Finally, in phase         To evaluate the annotations more accurately and in more
3 the annotator must decide on a best key phrase within           detail, the Inter-Annotator Agreements are further examined
each group created in the previous phase. The best key            at the document level. The quality of the annotation
phrase is shown in bold. For group 2 no discussion is needed.     results of the individual documents is illustrated in Figure
The representative in group 1 is selected based on the            2. The box plot of each group shows the worst as well as
request for the largest possible information content, as it       the best Uα value achieved for a document assigned to this
states not only the question about an alternative but also        group. The boxes illustrate the quartiles and the median.
the name of the currently used medicament.                        The mean value shown in Table 1 is illustrated with a cross.
                                                                  As can be seen, the agreement within the groups is very
3.2   Inter-Annotator Agreement                                   variable. The box plots again reflect that group 4 performs
  Following the annotation task itself, the resulting             worse than the other groups. However, the values achieved
annotations need to be evaluated. For this, the                   in every group extend over an interval of length 0.8 to 1.2,
Inter-Annotator Agreement among the persons of the same           which corresponds to approximately half the value range of
annotation group is calculated.                                   Krippendorff’s unitized alpha. Although the mean agreement
                                                                  of 0.439 across the six groups appears acceptable, the large
3.2.1   Phase 1 - Key Phrase Recognition                          variability of the data indicates that annotation quality
  Since annotation phase 1 is a unitizing task with one           must be considered with caution.
category, we use Krippendorff’s unitized alpha Uα (introduced         Figure 3 illustrates the relation between the document
in [9]) as a measure. Uα ∈ [−1, 1] describes the                  length and the agreement. Unexpectedly, the assumption
correspondence of different annotators’ coding units on the       that longer documents lead to worse agreement is not
same text document. A value of 1 expresses maximum agreement,     confirmed here. Although a slight tendency is visible, both
0 shows that no correlation exists between the units and the      the left and right tail of the distribution represent short
classes, and −1 symbolizes a uniform disagreement. The            documents. Accordingly, the content of the marginal documents
calculations were carried out with DKPro Agreement [13].          was analyzed. In documents with poor agreement, it was
  First, for each of the six groups of annotators described       noted that the annotators often agreed on the important
in Section 3.1, the group’s agreement over all 25 documents       information. However, the way this information was divided
was considered. Table 1 shows the agreement within the            into the different key phrases varied considerably. The
annotation groups. Annotation groups 2 and 3 obtain the           annotators were divided especially over conjunctions like
best agreement, with a Uα above 0.5. Groups 1, 5 and 6            “and, or, ...”. Some annotators split the text into more
agree with a value greater than 0.4. Group 4, however,            granular units than the others. The importance of context
performs significantly worse, achieving a Uα of only 0.210.       information was also assessed differently. For example,
One possible explanation might be the text length of the          in one document a patient described a need for information
documents. The average length of a text document in group 4       against the background of his type 1 illness. He also stated
was 1645.36 characters. The texts of the other groups were        how long he had been affected by the disease. Here, the
on average at least 400 characters shorter. The longer a          annotators were divided over whether the temporal context is
text is, the more elaborately the information need is             important for the formulation of the information need. At
described. Likewise, increasingly digressive content may          this juncture, it should be remembered that the students had
occur. This makes unitizing key phrases harder. Nevertheless,     no domain knowledge in the field of medicine or diabetes,
the remaining groups achieve encouraging Inter-Annotator          making it difficult to make a reasoned decision. Furthermore,
Agreements.                                                       annotation errors were observed. In some annotations, the
                              Annotation Group    1        2        3        4        5        6
                              Uα                  0.409    0.505    0.523    0.210    0.443    0.429

                   Table 1: Inter-Annotator Agreement (Uα) within each annotation group.


repeated mention of a key phrase’s content was not marked.           on part we focused on the Inter-Annotator Agreement. The
Individual annotators tended to split the key phrases so             results for phase 1 are promising. Although there is an obvious
finely that a selected key phrase on its own could not               variance in the data, for almost half of the documents the
express any key content of the text.                                 annotators reached an agreement of at least 0.5, annotation group
                                                                     4 excluded. We observed different types of problems. The lack
3.2.2    Phase 2 - Key Phrase Grouping and Phase 3                   of subject-specific knowledge was one of the main problems
         - Best Key Phrase Identification                            annotators had to face. A second problem was the differing
   Following the recognition of key phrases, the subsequent          view of a key phrase’s granularity level. Finally, we detected
grouping needs to be reviewed. Due to the dependency on              some cases in which the annotators did not adhere to the
phase 1, it is difficult to assess whether the same groupings        given guidelines, producing poor annotations. Phases 2 and 3
have been made. In an optimal scenario, starting from an             could not be investigated meaningfully, as phase 1 directly
identical set of key phrases, an Inter-Annotator Agreement           determines the initial data situation of the other phases.
could be calculated for the same coding units and a set of
classes corresponding to the number of key phrase groups.               These findings lead us to the assumption that, in order
   In order to analyze at least partially how much the               to further increase the quality of the data corpus, experts
annotators in phase 2 agree in their decisions, only                 need to be involved. Healthcare professionals are important,
documents with an alpha greater than 0.439 (corresponding            but social media experts are likewise of interest because
to the mean of phase 1) are considered. Furthermore,                 of the particular vocabulary that is used in forums. Our
documents whose annotators disagree on the number of                 data set showed very specific expressions that are only
units are excluded from consideration. With these                    used in the online context of diabetes. Another issue we
restrictions, we want to make sure that the same phrases             must address is the guidelines. These need to be revised
were detected in phase 1, allowing a small tolerance, in             with regard to the annotators’ uncertainty concerning key
order to obtain a suitable initial data situation for phase 2.       phrase granularity, the rules for key phrase grouping and for
Thus, we can measure the Inter-Annotator Agreement for               choosing a representative key phrase. The annotators also
these documents. As measures we use simple percentage                reported that the very subjective nature of the texts was
agreement PA on the one hand and Fleiss’ kappa κ [5] on the          another difficulty.
other hand. We again use DKPro Agreement for the                        In summary, this first annotation approach on GDWDS
calculation. Unfortunately, only three documents meet the            achieved promising results, especially in the main phase,
required criteria. For the first of them, all annotators             phase 1. As the students had only a short introductory
assigned only one key phrase unit. Accordingly, there                class on the annotation task, this approach can be seen
is only one group of key phrases and thus, both P A and κ            as a crowd-sourcing attempt. However, to further increase
are 1. The second document that fulfills the criteria contains       annotation quality, we see an expert-based approach at an
according to phase 1 two key phrase units. Every annotator           advantage. In future work this issue will be addressed. Phase
summarized them into the same group which leads to a per-            2 and 3 should be neglected until phase 1 produces results
fect agreement in terms of both measures. Prevailing phase           with a quality sufficient for the further phases. Nonetheless
3, it is to be noted that also the best nuggets were equal-          the development of appropriate measures for dependent an-
ly chosen. The last document consists of three units. While          notation tasks may be an interesting area of research.
three of four annotators completely agreed in dividing the              If the annotation quality of the GDWDS is ensured, the
units into two groups and making the same assignments, the           actual process of key phrase extraction can be started. As a
fourth annotator settled on three groups, which led to a             first step we plan to apply and evaluate state-of-the-art
PA of 0.66 and κ = 0.38. The three annotators who built              algorithms for key phrase extraction on GDWDS. Machine
the same groupings did not, however, agree on the best key           learning algorithms and deep learning approaches are prevalent
phrase per group.                                                    in this field. For example, in [18] an interesting approach to
   Since the quality of the dependent annotations can only           keyword extraction from Twitter using recurrent neural
be analyzed to a limited extent, and such an analysis is             networks is presented. Alternatively, rule-based or graph-based
therefore not very meaningful, we will not go further into           approaches should be considered. Based on the results,
phases 2 and 3 here.                                                 we can assess whether these existing techniques can be
                                                                     applied to our problem. Depending on the outcome, new
                                                                     algorithms may then be developed.
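
The two agreement measures used for phase 2 can be illustrated on the three-unit document described above. The following is a minimal Python sketch and not the DKPro Agreement implementation. The group labels (g1, g2, g3) and the assignment of units to groups are hypothetical: they are one labeling that is consistent with the description (three annotators form the same two groups, the fourth forms three groups). Treating each unit as an item and each group as a category is likewise an assumption.

```python
from collections import Counter
from itertools import combinations


def percentage_agreement(items):
    """Mean fraction of agreeing annotator pairs, averaged over items."""
    per_item = []
    for labels in items:
        pairs = list(combinations(labels, 2))
        per_item.append(sum(a == b for a, b in pairs) / len(pairs))
    return sum(per_item) / len(per_item)


def fleiss_kappa(items):
    """Fleiss' kappa [5]; every item must have the same number of ratings."""
    n_ratings = sum(len(labels) for labels in items)
    observed = percentage_agreement(items)
    # Expected agreement from the overall share of each category.
    totals = Counter(label for labels in items for label in labels)
    expected = sum((count / n_ratings) ** 2 for count in totals.values())
    if expected == 1.0:
        # Degenerate case: only one category was ever used. The text
        # reports kappa = 1 for such a document, so we return 1 here.
        return 1.0
    return (observed - expected) / (1 - expected)


# Hypothetical labels: one row per key phrase unit, one column per annotator.
ratings = [
    ["g1", "g1", "g1", "g1"],
    ["g1", "g1", "g1", "g2"],
    ["g2", "g2", "g2", "g3"],
]
pa = percentage_agreement(ratings)
kappa = fleiss_kappa(ratings)
print(f"PA = {pa:.3f}, kappa = {kappa:.3f}")
```

Under these assumptions the sketch yields PA = 2/3 and κ = 5/13 (about 0.385), which agrees with the reported 0.66 and 0.38 up to rounding. Other labelings of the same situation can give different values, so the match only shows that the reported numbers are plausible for this reading.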
4.   CONCLUSION AND FURTHER WORK
  In this work we presented an annotation study of medical
information needs on a German diabetes data set. Student             5.   ACKNOWLEDGMENTS
annotators were instructed to detect key phrases, group                We want to thank the student annotators, namely Deniz
them according to similar content and then to find a repre-          Ates, Bashkim Berzati, Christian Born, Markus Brenneis,
sentative key phrase for every group. For this, the students         Nurhan Chahrour, Björn Ebbinghaus, Julia Fischer, Andre-
had to follow guidelines, presented in Section 3.1.                  as Funke, Philipp Grawe, Frederik Grieshaber, Tobias Alex-
  Subsequently, the obtained annotations were evaluated.             ander Hogrebe, Michael Janschek, Moritz Kanzler, Sergej
Since there obviously is no gold standard, in the evaluati-          Korlakov, Daniel Laps, Johannes Müller, Alexander Ober-
straß, Karsten Packeiser, Kevin Robert Pochwyt, Regina         [14] R. D. Ravert, M. D. Hancock, and G. M. Ingersoll.
Stodden, Emil Warkentin, Dennis Weber, Susanna Welzel,              Online forum messages posted by adolescents with
Julian Zenz and Milos Lukas Ziolkowski.                             type 1 diabetes. The Diabetes Educator,
                                                                    30(5):827–834, 2004.
                                                               [15] M. Sokolova and V. Bobicev. Sentiments and opinions
6.   REFERENCES                                                     in health-related web messages. In RANLP, pages
 [1] T. Ali, D. Schramm, M. Sokolova, and D. Inkpen. Can            132–139, 2011.
     I hear you? Sentiment analysis on medical forums. In      [16] M. Sokolova and V. Bobicev. What sentiments can be
     IJCNLP, pages 667–673, 2013.                                   found in medical forums? In RANLP, volume 2013,
 [2] V. Bobicev, M. Sokolova, Y. Jafer, and D. Schramm.             pages 633–639, 2013.
     Learning sentiments from tweets with personal health      [17] F. Sudau, T. Friede, J. Grabowski, J. Koschack,
     information. In Canadian Conference on Artificial              P. Makedonski, and W. Himmel. Sources of
     Intelligence, pages 37–48. Springer, 2012.                     information and behavioral patterns in online health
 [3] A. T. Chen. Exploring online support spaces: using             forums: observational study. Journal of medical
     cluster analysis to examine breast cancer, diabetes and        Internet research, 16(1):e10, 2014.
     fibromyalgia support groups. Patient education and        [18] Q. Zhang, Y. Wang, Y. Gong, and X. Huang.
     counseling, 87(2):250–257, 2012.                               Keyphrase extraction using deep recurrent neural
 [4] K. Denecke and W. Nejdl. How valuable is medical               networks on Twitter. In Proceedings of the 2016
     social media data? Content analysis of the medical             Conference on Empirical Methods in Natural Language
     web. Information Sciences, 179(12):1870–1880, 2009.            Processing, pages 836–845, Austin, Texas, November
 [5] J. L. Fleiss. Measuring nominal scale agreement among          2016. Association for Computational Linguistics.
     many raters. Psychological bulletin, 76(5):378, 1971.
 [6] L. Goeuriot, J.-C. Na, W. Y. Min Kyaing, C. Khoo,
     Y.-K. Chang, Y.-L. Theng, and J.-J. Kim. Sentiment
     lexicons for health-related opinion mining. In
     Proceedings of the 2nd ACM SIGHIT International
     Health Informatics Symposium, pages 219–226. ACM,
     2012.
 [7] J. A. Greene, N. K. Choudhry, E. Kilabuk, and W. H.
     Shrank. Online social networking by patients with
     diabetes: A qualitative evaluation of communication
     with Facebook. Journal of general internal medicine,
     26(3):287–292, 2011.
 [8] S. Karimi, A. Metke-Jimenez, M. Kemp, and C. Wang.
     Cadec: A corpus of adverse drug event annotations.
     Journal of biomedical informatics, 55:73–81, 2015.
 [9] K. Krippendorff. On the reliability of unitizing
     continuous data. Sociological Methodology, pages
     47–76, 1995.
[10] X. Liu and H. Chen. AZDrugMiner: an information
     extraction system for mining patient-reported adverse
     drug events in online patient forums. In International
     Conference on Smart Health, pages 134–150. Springer,
     2013.
[11] X. Liu and H. Chen. Identifying adverse drug events
     from patient social media: A case study for diabetes.
     IEEE Intelligent Systems, 30(3):44–51, 2015.
[12] C. M. Meyer, D. Benikova, M. Mieskes, and
     I. Gurevych. MDSWriter: Annotation tool for creating
     high-quality multi-document summarization corpora.
     In Proceedings of the 54th Annual Meeting of the
     Association for Computational Linguistics (ACL):
     System Demonstrations, pages 97–102, Berlin,
     Germany, August 2016. Association for
     Computational Linguistics.
[13] C. M. Meyer, M. Mieskes, C. Stab, and I. Gurevych.
     DKPro Agreement: An open-source Java library for
     measuring inter-rater agreement. In Proceedings of the
     25th International Conference on Computational
     Linguistics: System Demonstrations (COLING), pages
     105–109, Dublin, Ireland, August 2014. Association for
     Computational Linguistics.