=Paper= {{Paper |id=Vol-2342/paper3 |storemode=property |title=Narrative Detection in Online Patient Communities |pdfUrl=https://ceur-ws.org/Vol-2342/paper3.pdf |volume=Vol-2342 |authors=Anne Dirkson,Suzan Verberne,Wessel Kraaij |dblpUrl=https://dblp.org/rec/conf/ecir/DirksonVK19 }} ==Narrative Detection in Online Patient Communities== https://ceur-ws.org/Vol-2342/paper3.pdf
       Narrative detection in online patient communities

                        Anne Dirkson                Suzan Verberne                 Wessel Kraaij
                         Leiden Institute of Advanced Computer Science, Leiden University
                                 Niels Bohrweg 1, 2333 CA Leiden, the Netherlands
                           {a.r.dirkson, s.verberne, w.kraaij}@liacs.leidenuniv.nl




                                                        Abstract
                       Although narratives on patient forums are a valuable source of medical
                       information, their systematic detection and analysis has so far been
                       limited to a single study. In this study, we examine whether
                       psycho-linguistic features or document embeddings can aid identification
                       of narratives. We also investigate which features distinguish narratives
                       from other social media posts. This study is the first to automatically
                       identify the topics discussed in narratives on a patient forum. Our
                       results show that for classifying narratives, character 3-grams outperform
                       psycho-linguistic features and document embeddings. We found that
                       narratives are characterized by the use of past tense, health-related
                       words and first-person pronouns, whereas non-narrative text is associated
                       with the future tense, emotional support words and second-person
                       pronouns. Topic analysis of the patient narratives uncovered fourteen
                       different medical topics, ranging from tumor surgery to side effects.
                       Future work will use these methods to extract experiential patient
                       knowledge from social media.




1    Introduction
Nowadays, online patient forums are the main medium by which patients exchange their narratives. These
narratives mainly recount their own experiences with their condition. As such, they contain experiential
knowledge [Bor76], defined as the knowledge that patients gain from their own experiences. In recent years, such
experiential knowledge has increasingly been recognized as valuable and complementary to empirical knowledge
[CBC+ 13]. Consequently, more health-related applications are making use of patient forum data, for instance to
track public health trends [SOG+ 16] and to detect adverse drug responses [SGN+ 15]. Experiential knowledge is
also valuable for patients themselves: patients indicate that they strongly rely on experiences and information
provided on patient forums [SHBL16]. This is especially true for patients with a rare disease, for which medical
professionals often lack expertise and the number of studies is limited [AKG08].
   To understand the experiential knowledge on patient forums, forum posts that contain narratives must first be
identified. As of yet, research into systematically distinguishing patient narratives on patient forums is limited
to a single study on Dutch forum data [VBSEng], which uses words as the only features. We expand upon this work
using a different data set by examining whether document embeddings and psycho-linguistic features can improve
the identification of patient narratives. We expect so, because these aggregated features are less dependent on
individual terms, which may overlap significantly between narratives and factual statements about the same
topic. Secondly, we explore how narratives differ from other types of posts by studying which features are
influential in identifying narratives and which posts are classified incorrectly. Thirdly, we analyze how prevalent
narratives are on a cancer patient forum and which topics these narratives discuss.

Copyright © 2019 for the individual papers by the paper’s authors. Copying permitted for private and academic purposes. This
volume is published and copyrighted by its editors.
In: A. Jorge, R. Campos, A. Jatowt, S. Bhatia (eds.): Proceedings of the Text2StoryIR’19 Workshop, Cologne, Germany,
14-April-2019, published at http://ceur-ws.org

2     Related Work
Narratives on patient forums have mainly been studied qualitatively (e.g. [vUKDT+ 09]). The automatic
identification of narratives on a patient forum is limited to the study by Verberne et al. [VBSEng] on a Dutch cancer
forum. They identified narratives with an F1 of 0.911 using only the lower-cased words of the posts as features.
They also found that various linguistic factors (1st person singular, 3rd person and negations) and psychological
processes (social processes and religion) were correlated with the presence of narratives. These psycho-linguistic
features were measured using the Linguistic Inquiry and Word Count (LIWC) method [TP10].
   Additionally, research into self-reported adverse drug responses (ADRs) has led to the development of classifiers
for differentiating between factual statements of ADRs and personal experiences of ADRs on social media [BY12,
NSO+ 15, SG15]. However, these classifiers are highly specific and thus not suitable for identifying patient
narratives in general.
   Another closely related field is the classification of personal health mentions on social media, i.e. posts that
mention a person who is affected as well as their specific condition, such as: ‘my granddad has Alzheimer’s’.
Presently, only two studies have investigated this task. The first, by Lamb et al. [LPD13], focused on separating flu
awareness from actual flu reports on social media. More recently, Karisani and Agichtein [KA18] introduced WESPAD,
a classifier for personal health mentions, which attains state-of-the-art performance for seven different health
domains including stroke, depression and flu infection. Nonetheless, a personal health mention alone is not
sufficient to consider the post a narrative, and thus these classifiers are also inadequate for our purpose.

3     Methods
3.1     Data
Our data was collected from an open, international Facebook forum for patients with Gastrointestinal Stromal Tumor
(GIST).1 The forum is moderated by GIST Support International and contains 36,722 posts with a median length of
20 tokens.

3.2     Preprocessing
The data was lowercased and tokenized with NLTK. Due to the noisy nature of user-generated content, especially
in the spelling of medical terms, we applied a tailored preprocessing pipeline2 to our data. Firstly, an existing
normalization pipeline for social media [Sar17]3 was used to normalize tokens to American English and to
expand generic abbreviations used on social media. Thereafter, domain-specific abbreviations were expanded
with a lexicon of 42 non-ambiguous abbreviations, generated based on 1000 posts and annotated by a domain
expert and the first author. Spelling mistakes were detected using a combination of relative frequency and edit
distance to possible candidates and corrected using weighted Levenshtein distance. Correction candidates were
derived from the corpus itself. Drug names were normalized using the RxNorm database [Nat]. Non-English
posts were removed using langid [LB12]. Punctuation was removed, but stop words were not, as we expect
function words to play a role in the expression of narratives.
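
The spelling-correction step described above can be illustrated with a minimal sketch. It assumes a plain (unweighted) Levenshtein distance rather than the weighted variant used in our pipeline, and the thresholds `max_distance` and `min_ratio`, the function names and the toy counts are hypothetical values chosen for illustration only; the actual scripts are in the repository referenced in the footnote.

```python
from collections import Counter


def levenshtein(a: str, b: str) -> int:
    """Unweighted edit distance: insertion, deletion and substitution cost 1."""
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def correct_token(token, counts, max_distance=1, min_ratio=10):
    """Return a much more frequent, near-identical corpus token if one exists.

    Candidates come from the corpus vocabulary itself. Both thresholds are
    hypothetical and are not the values used in the paper.
    """
    own_count = counts[token]
    best, best_count = token, own_count
    for candidate, count in counts.items():
        frequent_enough = count >= min_ratio * max(own_count, 1)
        if frequent_enough and count > best_count:
            if levenshtein(token, candidate) <= max_distance:
                best, best_count = candidate, count
    return best


# Toy corpus frequencies (invented for this example).
counts = Counter({"imatinib": 40, "imatinb": 1, "scan": 12})
corrected = [correct_token(t, counts) for t in ["imatinb", "scan"]]
```

A rare token is only replaced when a far more frequent token lies within the edit-distance threshold, which is one simple way of combining relative frequency with edit distance.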

3.3     Supervised classification
3.3.1     Manual annotation of example data
We randomly selected 1050 posts for annotation. The annotators were asked to indicate per message whether it
contains a personal experience. They were not provided with its context. Personal experiences did not need to
be about the author but could be about someone else. This definition was based on earlier work by Verberne et
al. [VBSEng] and van Uden-Kraan et al. [vUKDT+ 08]. The first 50 posts were annotated individually by the
first author and another PhD student to improve the annotation guidelines.4 The remaining 1000 posts were
divided equally into six sets of 200 posts, with 40 posts (20%) overlapping between all sets. The overlap was used
to calculate the pairwise Cohen’s kappa. There were seven annotators in total: six PhD students and one GIST
patient. Each sample was assigned to an annotator, apart from one sample which was divided between two PhD
students. To be able to include the overlapping sample in the classification, we opted to use the annotations of
the GIST patient for these 40 posts.5

    1 https://www.facebook.com/groups/gistsupport/
    2 The preprocessing scripts can be found at: https://github.com/AnneDirkson/LexNorm
    3 https://bitbucket.org/asarker/simplenormalizerscripts
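
As an illustration of the agreement measure, the sketch below computes Cohen's kappa for two annotators and averages it over all annotator pairs. Averaging over pairs is an assumption about how a single agreement figure can be derived from pairwise values, and the label lists are invented toy data, not our annotations.

```python
from collections import Counter
from itertools import combinations


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: (observed - expected agreement) / (1 - expected agreement)."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("Both annotators must label the same non-empty set of posts.")
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement: probability that both pick the same label independently.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:  # both annotators used one identical label throughout
        return 1.0
    return (observed - expected) / (1 - expected)


def mean_pairwise_kappa(annotations):
    """Average kappa over all annotator pairs (assumed aggregation)."""
    pairs = list(combinations(annotations.values(), 2))
    return sum(cohens_kappa(a, b) for a, b in pairs) / len(pairs)


# Toy labels for ten overlapping posts: 1 = narrative, 0 = non-narrative.
annotator_1 = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
annotator_2 = [1, 1, 0, 0, 0, 0, 1, 0, 1, 1]
kappa = cohens_kappa(annotator_1, annotator_2)
```

In this toy case the annotators agree on 8 of 10 posts while chance agreement is 0.5, so kappa is about 0.6.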


3.3.2    Feature sets

Four feature sets were derived from the text data: word unigrams, character n-grams (using the CountVectorizer
function in sklearn), psycho-linguistic features, and document embeddings. For both word unigrams and
character n-grams, we investigated whether TF-IDF weighting would improve performance compared to raw counts.
Additionally, we explored whether stemming or lemmatising the data prior to extracting the unigrams could
improve performance. Psycho-linguistic features were based on LIWC 2015 [TP10]. Punctuation categories
were discarded, resulting in 82 LIWC features in total. LIWC is a well-known method for investigating
psychological processes in text and includes both linguistic (e.g. first-person pronouns) and psychological categories
(e.g. positive emotions). The last feature set consisted of document embeddings: a doc2vec model [LM14] was
trained on the labeled training data for each fold in the cross-validation. We combined a distributed memory
model with a distributed bag of words model, as recommended by Le and Mikolov [LM14]. We also attempted
to train document embeddings first on the unsupervised data and then re-train on the supervised data, but this
led to nonsensical classification features.
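
To make the character n-gram features concrete, the following sketch counts character 3-grams in pure Python. It is only an approximation of what a count vectorizer with a character analyzer produces; the function name `char_ngrams` and the whitespace normalization are assumptions made for this illustration.

```python
from collections import Counter


def char_ngrams(text, n=3):
    """Count overlapping character n-grams, spaces included.

    The text is lowercased and runs of whitespace are collapsed first
    (an assumption of this sketch, mirroring our lowercased input).
    """
    normalized = " ".join(text.lower().split())
    return Counter(normalized[i:i + n] for i in range(len(normalized) - n + 1))


example = char_ngrams("I had surgery")
# Contains n-grams that cross word boundaries, such as "i h" and " ha",
# as well as word-internal ones such as "had" and "sur".
```

Because n-grams span word boundaries, a single feature such as "i h" can capture a first-person pronoun followed by a verb starting with "h".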


3.3.3    Supervised classification algorithms

Classifiers were evaluated separately for each feature set. We ignored all posts that had been left empty by
the annotator (the annotator chose neither yes nor no): three posts were ignored for this reason. For word
unigrams, character n-grams and psycho-linguistic features, we compared four sklearn classification algorithms:
Multinomial Naive Bayes (MNB), linear Support Vector Classification (LinearSVC), Stochastic Gradient Descent
(SGD) with log loss, and K Nearest Neighbours (KNN). These were chosen according to the following criteria: (1)
known to perform well on text data, (2) recommended for small data sets and (3) able to calculate probabilistic
outcomes. The latter enabled us to use probabilistic ensembles. The doc2vec representations combined with
Logistic Regression were used as a classifier in their own right: the document representations were tagged with the labels
of the training data. This model was then used to derive vector representations for new documents. To test
if a combination of feature types could improve performance, we evaluated soft voting (argmax of the sums of
the predicted probabilities) of the best individual classifiers for the best performing variants of each feature set.
Significance testing was done with pair-wise t-tests.
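
The soft voting rule (argmax of the sums of the predicted probabilities) can be sketched as follows. The function name, the nested-list input format and the probability values are hypothetical; class 0 stands for non-narrative and class 1 for narrative.

```python
def soft_vote(probabilities_per_classifier):
    """Soft voting: per post, sum class probabilities over classifiers, take argmax.

    Input: one list per classifier, holding one list of class probabilities
    per post. Ties are resolved in favour of the lowest class index.
    """
    n_posts = len(probabilities_per_classifier[0])
    predictions = []
    for post in range(n_posts):
        n_classes = len(probabilities_per_classifier[0][post])
        summed = [sum(clf[post][c] for clf in probabilities_per_classifier)
                  for c in range(n_classes)]
        predictions.append(max(range(n_classes), key=summed.__getitem__))
    return predictions


# Invented probabilities for two posts from three classifiers.
ngram_probs = [[0.30, 0.70], [0.55, 0.45]]
liwc_probs = [[0.60, 0.40], [0.40, 0.60]]
doc2vec_probs = [[0.45, 0.55], [0.70, 0.30]]
ensemble = soft_vote([ngram_probs, liwc_probs, doc2vec_probs])
```

For the first post the summed probabilities are 1.35 against 1.65, so the ensemble predicts a narrative; for the second post the sums are reversed and it predicts a non-narrative.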
   To evaluate the performance, the average F1 score of a 10-fold cross validation was used. For each run,
hyper-parameters were tuned for that specific training set using a 10-fold grid search on the training data. The
tuning grids were based on sklearn documentation: C from 10^-3 to 10^3 (steps of x10) for LinearSVC and Logistic
Regression; number of neighbors from 3 to 11 (steps of 2) for KNN; and max iterations from 2 to 2048 (steps of
x2) and alpha from 10^-8 to 10^-2 (steps of x10) for SGD. The dimensionality of the document vectors was tuned
with a grid of 100 to 400 (steps of 100).
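
The tuning grids listed above can be written out explicitly, as in the sketch below. The dictionary layout and the parameter names (`C`, `n_neighbors`, `max_iter`, `alpha`, `vector_size`) are assumptions that follow common sklearn and gensim naming; only the value ranges are taken from the text.

```python
# Hyper-parameter grids as plain Python lists (layout is illustrative).
PARAM_GRIDS = {
    "LinearSVC / Logistic Regression": {
        "C": [10.0 ** e for e in range(-3, 4)],        # 10^-3 ... 10^3
    },
    "KNN": {
        "n_neighbors": list(range(3, 12, 2)),          # 3, 5, 7, 9, 11
    },
    "SGD": {
        "max_iter": [2 ** e for e in range(1, 12)],    # 2, 4, ..., 2048
        "alpha": [10.0 ** e for e in range(-8, -1)],   # 10^-8 ... 10^-2
    },
    "doc2vec": {
        "vector_size": list(range(100, 401, 100)),     # 100, 200, 300, 400
    },
}


def grid_size(grid):
    """Number of hyper-parameter combinations in one grid."""
    size = 1
    for values in grid.values():
        size *= len(values)
    return size


sizes = {name: grid_size(grid) for name, grid in PARAM_GRIDS.items()}
```

Written this way, the SGD grid is by far the largest, with 11 x 7 = 77 combinations evaluated in each inner grid search.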


3.4     Topic modelling of the whole data set

To label the remaining data, the best performing classifier was used with the hyper-parameter settings that were
optimal in the majority of the training sets. To investigate which topics are discussed in the patient narratives, we
used topic modelling with Non-negative Matrix Factorization (NMF) of the TF-IDF weighted tokens without stopwords.
Topic coherence, measured using TC-W2V [OGCC15], was used to select the number of topics. Topic labels
were assigned manually by exploring the words with the highest weights and the top-ranked (i.e. most relevant)
messages per topic.
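
The selection of the number of topics by coherence can be illustrated with the sketch below. It assumes, as we understand the TC-W2V measure, that the coherence of a topic is the mean pairwise cosine similarity between the word embeddings of its top-ranked terms, and that a model's coherence is the mean over its topics. The two-dimensional vectors are invented toy values, not trained embeddings.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two vectors (0.0 if either has zero norm)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def topic_coherence(top_terms, vectors):
    """Mean pairwise cosine similarity between the top terms of one topic."""
    pairs = list(combinations(top_terms, 2))
    if not pairs:
        raise ValueError("A topic needs at least two top terms.")
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def model_coherence(topics, vectors):
    """Mean topic coherence over all topics of one topic model."""
    return sum(topic_coherence(terms, vectors) for terms in topics) / len(topics)


# Toy embeddings (hypothetical values).
vectors = {"surgery": (1.0, 0.0), "tumor": (2.0, 0.0), "prayer": (0.0, 1.0)}
coherent = topic_coherence(["surgery", "tumor"], vectors)
incoherent = topic_coherence(["surgery", "prayer"], vectors)
overall = model_coherence([["surgery", "tumor"], ["surgery", "prayer"]], vectors)
```

Under this scheme, one would fit a topic model for each candidate number of topics and keep the number for which the model coherence is highest.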

  4 The annotation guidelines can be found at: https://github.com/AnneDirkson/NarrativeFilter
  5 The annotated data is available upon request in order to protect the privacy of the patients
4     Results
4.1    Annotated data
The data was slightly imbalanced, with 37.7% of the posts containing a narrative, resulting in a majority baseline
of roughly 0.62. The inter-annotator agreement was substantial (κ = 0.69).

4.2    Classifier evaluation
A Linear SVC on character 3-grams achieves the highest F1 score (Table 1), although character 4-grams (p =
0.526), stemmed unigrams (p = 0.930) and lemmatised unigrams (p = 0.587) do not perform significantly worse.
Character 5- and 6- grams also do not perform worse overall (p = 0.122 and p = 0.169), but their recall is
significantly lower (p = 0.023 and p = 0.029). The classifiers for the best performing document embeddings
(DBOW+DM) and psycho-linguistic features, however, are significantly worse overall than character 3-grams
(p = 0.0055 and p = 0.026 respectively). Employing TF-IDF weighting does not aid any of the unigram or
character n-gram features. Additionally, neither feature selection (F1 = 0.761) nor word boundaries (F1 = 0.796)
improves the performance of character 3-grams. Using a range of character n-grams, namely 3-to-4 (F1 = 0.814),
3-to-5 (F1 = 0.814), or 3-to-6 (F1 = 0.812), also does not boost performance.
   Ensemble classification did not perform better than character 3-grams alone (see Table 2). Nevertheless, an
ensemble of all four feature types is significantly more precise than all other classifiers (p = 0.0048 compared to the
second best). To further explore why ensemble classification does not manage to improve overall performance,
we investigated the predictions of individual classifiers. As can be seen in Table 3, there is a high degree of
overlap between the predictions based on character 3-grams and the other feature sets (88.3%, 83.8% and 84.4%
respectively). Consequently, the vast majority of the predictions cannot be improved by complementing character
3-grams with these feature sets. Interestingly, 4.7% of the posts are misclassified by all feature sets. Considering
the non-overlapping predictions, the percentage of correct predictions was higher for character 3-grams than
for either document embeddings or psycho-linguistic features in a pair-wise comparison. Thus, it appears that
adding these features would be more detrimental than beneficial to narrative classification.
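
The pair-wise comparison of predictions can be sketched as follows: for two classifiers and a set of gold labels, each post falls into exactly one of four groups (both correct, both incorrect, only the first correct, only the second correct). The function name and the toy labels are hypothetical.

```python
def compare_predictions(gold, preds_a, preds_b):
    """Percentage of posts per agreement category for two classifiers."""
    counts = {"both_correct": 0, "both_incorrect": 0,
              "only_a_correct": 0, "only_b_correct": 0}
    for truth, a, b in zip(gold, preds_a, preds_b):
        if a == truth and b == truth:
            counts["both_correct"] += 1
        elif a != truth and b != truth:
            counts["both_incorrect"] += 1
        elif a == truth:
            counts["only_a_correct"] += 1
        else:
            counts["only_b_correct"] += 1
    return {name: 100 * count / len(gold) for name, count in counts.items()}


# Toy example with ten posts: 1 = narrative, 0 = non-narrative.
gold = [1, 1, 1, 0, 0, 0, 1, 0, 1, 0]
ngram_preds = [1, 1, 1, 0, 0, 1, 1, 0, 0, 0]
liwc_preds = [1, 0, 1, 0, 0, 1, 1, 0, 1, 0]
breakdown = compare_predictions(gold, ngram_preds, liwc_preds)
overlap = breakdown["both_correct"] + breakdown["both_incorrect"]
```

With binary labels, two classifiers that are both wrong necessarily give the same prediction, so the overlap between their predictions is the sum of the first two groups.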
                   Table 1: Mean Test Score (10-fold CV) For Best Classifiers Per Feature Set

    Feature set         Variant        Size   Classifier   F1 (SD)         Recall (SD)     Precision (SD)
    Unigrams            Original      4,078   SGD          0.795 (0.025)   0.788 (0.074)   0.811 (0.055)
    Unigrams            Stemmed       3,205   SGD          0.814 (0.031)   0.793 (0.047)   0.840 (0.049)
    Unigrams            Lemmatised    3,777   SGD          0.808 (0.039)   0.810 (0.059)   0.813 (0.070)
    Character n-grams   3-grams       5,086   SVC          0.815 (0.035)   0.844 (0.047)   0.793 (0.058)
    Character n-grams   4-grams      16,496   SVC          0.811 (0.027)   0.827 (0.068)   0.844 (0.029)
    Character n-grams   5-grams      36,349   SGD/SVC      0.796 (0.023)   0.784 (0.059)   0.817 (0.069)
    Character n-grams   6-grams      60,443   SGD          0.793 (0.040)   0.797 (0.042)   0.795 (0.079)
    LIWC                                 82   SVC          0.773 (0.031)   0.805 (0.044)   0.752 (0.077)
    Doc2vec             DBOW            400   LogReg       0.737 (0.029)   0.751 (0.056)   0.735 (0.066)
    Doc2vec             DM              400   LogReg       0.762 (0.039)   0.749 (0.062)   0.785 (0.070)
    Doc2vec             DM+DBOW         800   LogReg       0.772 (0.037)   0.803 (0.064)   0.749 (0.055)

           Table 2: Mean Test Score (10-fold CV) For Ensemble Classification. * DM+DBOW variant.

 Feature sets                                                          F1 (SD)         Recall (SD)    Precision (SD)
 3-grams + LIWC + Doc2vec* + Stemmed Unigrams                       0.770 (0.029)     0.703 (0.065)   0.859 (0.053)
 3-grams + LIWC + Doc2vec*                                         0.795 (0.037)     0.772 (0.072)     0.829 (0.065)
 3-grams + LIWC                                                     0.706 (0.032)     0.624 (0.059)    0.828 (0.073)
 3-grams + Doc2vec*                                                 0.755 (0.048)     0.735 (0.089)    0.786 (0.040)

       Table 3: Comparison of Predictions of Classifiers for Different Feature Sets. * DM+DBOW variant.

 Reference           Compared to   Both          Both            Difference in favour   Difference in favour
                                   correct (%)   incorrect (%)   of 3-grams (%)         of other method (%)
 Character 3-grams   LIWC          75.0          8.8             8.4                    7.7
 Character 3-grams   Doc2Vec*      74.8          9.6             8.6                    6.9

[Figure 1, four panels: (a) Stemmed Unigrams, (b) Character 3-grams, (c) LIWC, (d) Doc2vec DM model. Each panel lists
features grouped as Narrative or Non-narrative; the horizontal axis shows Weights in (a)-(c) and Similarity in (d).]

            Figure 1: The 20 Most Influential Features In Individual Classifiers. In (b) underscores represent spaces.

4.3    Influential features
 Narratives are typically distinguished by terms relating to the past tense (was, had, years), health (imatinib,
 tumor, surgeri ) and first-person narrative (my, i ) (see Figure 1). This is corroborated by the character 3-grams,
 psycho-linguistic features and document embeddings. Some of the important terms for non-narrative texts are
 also health-related (patients, gist) and first-person narrative (we, us), which showcases the difficulty of the task
 at hand. In general, non-narrative texts seem to focus more on emotional support (prayer, share, may), second-
 person narrative (you, your ) and the future (may, will ). The psycho-linguistic features additionally reveal that
 narratives contain more mentions of causality and negative emotions. In contrast, non-narrative texts seem to
 contain more positive emotions. Lastly, as predicted, function words appear important for classifying narratives
 in social media, and it is thus advisable to not remove stopwords.

4.4    Error analysis for the best performing classifier
Error analysis reveals that a substantial proportion of the errors is due to incorrect annotation: 36.9% of the false
positives and 36.2% of the false negatives were labelled incorrectly (see Table 4). Specifically, annotators have
difficulty correctly labelling discussions about personal medical facts or side effects as narratives (e.g. ‘i have been
on imatinib 5 months and lost 1/3 of my hair’). Conversely, annotators may incorrectly judge posts that give
emotional support, external information or advice to be narratives while they are not (e.g. ‘i may be wrong but
total gastrectomy sounds very extreme for two small gist’).

             Table 4: Error Analysis for best classifier (Character 3-gram Classification of Narratives)

                     False positives                                                 False negatives
 Reasons for misclassification                     Frequency     Reasons for misclassification                     Frequency
 Mislabelling                                          24        Mislabelling                                          17
 Emotional support/thanks                              15        Unknown                                               12
 Information/advice                                    13        Lack of context                                        7
 Lack of context                                        7        Question                                               5
 Question                                               4        Non-medical narratives                                 3
 Unknown                                                1        Hypothetical                                           1
 Empty post                                             1        Empty post                                             2
 TOTAL                                                 65        TOTAL                                                 47

   The incorrect labelling may have impacted the automated classification such that these categories are also more
difficult for the computer to distinguish. The classifier does, however, appear to outperform human judgement
and to some extent ‘correct’ their mistakes. In fact, its performance may be underestimated by the metrics based
on these incorrect labels. Other types of posts that appear challenging for the computer are posts that lack
context or contain questions. The former are often answers to unknown questions posed earlier in the thread.

4.5     Frequency and content of patient narratives
4.5.1     Automated narrative detection in unsupervised data
The percentage of narratives in the unlabelled data is 37.0%, which is comparable to the annotated sample.
This results in a total of 13,436 posts for topic modelling.6

4.5.2     Topic modelling
The TC-W2V metric [OGCC15] identifies the optimal number of topics to be fourteen. The resulting topics
relate to different aspects of the medical process for GIST patients (see Table 5). Note that imatinib is the most
commonly used medication.
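
As an illustration of how a coherence metric of this kind can select the number of topics, the sketch below scores each candidate model by the mean pairwise cosine similarity of its topics' top terms in an embedding space and keeps the best-scoring k. This is a minimal sketch of the general idea behind TC-W2V, not the code used in this study: the function names, the toy two-dimensional vectors and the candidate topic lists are all hypothetical, and in practice the vectors would come from a word2vec model trained on the corpus.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def topic_coherence(top_terms, vectors):
    """Mean pairwise cosine similarity of one topic's top terms."""
    pairs = list(combinations(top_terms, 2))
    return sum(cosine(vectors[a], vectors[b]) for a, b in pairs) / len(pairs)


def model_coherence(topics, vectors):
    """Mean topic coherence over all topics of one topic model."""
    return sum(topic_coherence(t, vectors) for t in topics) / len(topics)


def best_k(models_by_k, vectors):
    """Return the number of topics whose model has the highest coherence."""
    return max(models_by_k, key=lambda k: model_coherence(models_by_k[k], vectors))


# Hypothetical, hand-made 2-d vectors; real ones would come from word2vec.
toy_vectors = {
    "scan": [1.0, 0.0], "ct": [0.9, 0.1], "pet": [0.8, 0.2],
    "surgery": [0.0, 1.0], "surgeon": [0.1, 0.9], "remove": [0.2, 0.8],
}
# Hypothetical top terms per topic for two candidate values of k.
models_by_k = {
    1: [["scan", "ct", "surgery", "surgeon"]],
    2: [["scan", "ct", "pet"], ["surgery", "surgeon", "remove"]],
}
chosen_k = best_k(models_by_k, toy_vectors)
```

In this toy setting the two-topic model wins, because each of its topics groups terms that lie close together in the embedding space, whereas the single topic mixes scan-related and surgery-related terms.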


5     Discussion
The detection of narratives worked best when using character 3-grams. Their strength lies in their ability
to cluster relevant word types based on suffixes and prefixes. This is especially relevant in the medical domain:
for example, all cancer medications for GIST end in ‘nib’. In contrast, psycho-linguistic features appear to suffer
from oversimplification, because they aggregate words that are indicative of different classes into one category,
e.g. we and my into the umbrella category of first-person pronouns (see Figure 1). The use of document embeddings
may have been hampered by the small size of the data. An alternative explanation could be that incorrect labelling
impacts these features more strongly than word-based features.
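
The suffix effect can be made concrete with a small sketch that extracts overlapping character 3-grams from the medication names listed in Table 5 and intersects them. The helper `char_ngrams` is a hypothetical illustration; the feature extraction actually used for the classifier may differ, for instance in how word boundaries are handled.

```python
def char_ngrams(text, n=3):
    """Return the list of overlapping character n-grams in text."""
    return [text[i:i + n] for i in range(len(text) - n + 1)]


# Medication names taken from Table 5.
drugs = ["imatinib", "sunitinib", "regorafenib", "sorafenib", "nilotinib"]

# Character 3-grams that occur in every medication name.
shared = set.intersection(*(set(char_ngrams(d)) for d in drugs))
```

Here `shared` contains only the 3-gram `nib`, so a single character-level feature covers all five medications, while a word-level model would treat each name as an unrelated token.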
   Narratives could be differentiated most strongly by their use of past tense, first-person pronouns and
health-related words. The first two are in line with the linguistic definition of a narrative. The stronger focus
on health, however, may indicate that patients prefer to share their own health experiences rather than health
information from external sources.
   Annotating narratives appears to be a challenging task, even though annotators were given a guideline that was
based on previous work [VBSEng] and validated through initial annotation by two annotators. This is underscored by
our inter-annotator agreement (κ = 0.69), which was comparable to that of Verberne et al. [VBSEng] (κ = 0.71).
Our classifier performed less well than their system (F1 = 0.91), which may be explained by their larger sample
of annotated data (2,051 posts).
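
For readers unfamiliar with the agreement statistic, the sketch below computes κ for two annotators as (p_o − p_e) / (1 − p_e), where p_o is the observed agreement and p_e the agreement expected by chance from each annotator's label frequencies. It assumes that κ here denotes Cohen's kappa, which this section does not state explicitly, and the two label sequences are invented for illustration rather than taken from the annotated data.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    n = len(labels_a)
    # Observed agreement: fraction of items with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Chance agreement from each annotator's marginal label frequencies.
    freq_a = Counter(labels_a)
    freq_b = Counter(labels_b)
    categories = set(labels_a) | set(labels_b)
    expected = sum(freq_a[c] * freq_b[c] for c in categories) / (n * n)
    return (observed - expected) / (1 - expected)


# Hypothetical labels (1 = narrative, 0 = non-narrative) for ten posts.
annotator_a = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
annotator_b = [1, 1, 1, 0, 0, 0, 0, 0, 0, 1]
kappa = cohens_kappa(annotator_a, annotator_b)
```

In this invented example the annotators agree on 8 of 10 posts (p_o = 0.8) and chance agreement is p_e = 0.52, so κ = 0.28 / 0.48 ≈ 0.58, which shows how κ discounts raw agreement for agreement expected by chance.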
   Inevitably, our results depend on the choice of what constitutes a narrative and how annotators interpret this
definition. It appears that especially the line between a medical fact about oneself and a medical experience is
fuzzy for annotators. Future studies could perhaps use this knowledge to develop clearer guidelines.
    6 The code for unsupervised narrative filtering is shared at: https://github.com/AnneDirkson/NarrativeFilter
Table 5: Most Important Topics Discussed In Patient Forum Narratives. Topic labels were assigned manually.
* Cancer medication

 Topic labels            Top 10 words                                  Top-ranked post for the topic
 Tumor location          tumor stomach removed liver small cm          ‘i only had one tumor on my stomach’
                         mitotic metastases rate intestine
 (Emotional) Coping      take get time doctor like also know           ‘i completely understand i started 400 imatinib after
                         imatinib* day would                           surgery in and have lots of bad days [...]’
 Duration of Treatment   years imatinib* almost ago 10 taking          ‘about 1 and 1/2 years’
                         two still 11 12
 Types of Scans          scan ct pet results next today last           ‘oops one is a ct scan and one is a pet scan’
                         showed week cat
 Diagnosis of GIST       gist diagnosed cancer specialist              ‘that was my gist’
                         oncologist husband anyone ago surgeon found
 Other Medication        sunitinib* regorafenib* sorafenib*            ‘i have this on sunitinib’
                         imatinib* working 37 exon nilotinib* trial
                         stopped drug
 Side Effects            side effects imatinib* effect different       ‘and no side-effects’
                         fatigue eyes bad 400mg time
 Tumor Surgery           surgery remove since weeks first post         ‘just had surgery’
                         surgeon second shrink done
 Absence of Tumor        disease evidence still years today post       ‘no evidence of disease no evidence of disease’
 Recurrence              since resection year far
 Recurrence of Work,     back came come hair go went weeks             ‘i started imatinib after i went back to work’
 Medication or Tumor     took coming lost
 Emotional support       good luck news best far hope bad goes         ‘all my best and good luck’
                         well keep pretty
 Dosage of Medication    mg 400 800 imatinib* 600 take day             ‘11 years of imatinib since 2003 at 600 mg and since
                         taking since started                          november 2009 at 800 mg [...]’
 Timing of Scans         months every scans three ct six year          ‘my doctor said 3 years’
                         two first month
 Ingesting imatinib      one year last took imatinib* day              ‘take imatinib’
                         another old got time

6   Conclusion
For the detection of patient narratives on social media, psycho-linguistic features and document embeddings are
outperformed by character 3-grams. These narratives are associated with the past tense, health and first-person
pronouns, whereas non-narrative text is associated with the future tense, emotional support and second-person
pronouns. The patient narratives could be subdivided into discussions of fourteen different medical topics,
ranging from surgery to side effects. Future work will develop automated methods for the extraction of patient
knowledge from the narratives.

7   Acknowledgements
This work was financed by the SIDN fonds. The authors also thank H. Vos, G. Wiggers, W. Verschoof, A.
Brandsen, D. Gawehns, P. Dhar, M. Vinkenoog and G. van Oortmerssen of Leiden University for annotating the
data.

References
[AKG08]         Ségolène Aymé, Anna Kole, and Stephen Groft. Empowerment of patients: lessons from the rare
                diseases community. The Lancet, 371(9629):2048–2051, 2008.
[Bor76]         Thomasina Borkman. Experiential Knowledge: A New Concept for the Analysis of Self-Help
                Groups. Social Service Review, 50(3):445–456, 1976.
[BY12]          Jiang Bian and Fan Yu. Towards Large-scale Twitter Mining for Drug-related Adverse Events. In
                SHB12, pages 25–32, 2012.
[CBC+ 13]    Pam Carter, Roger Beech, Domenica Coxon, Martin J. Thomas, and Clare Jinks. Mobilising the
             experiential knowledge of clinicians, patients and carers for applied health-care research. Contem-
             porary Social Science, 8(3):307–320, 2013.
[KA18]       Payam Karisani and Eugene Agichtein. Did you really just have a heart attack? Proceedings of
             the 2018 World Wide Web Conference on World Wide Web - WWW 18, 2018.
[LB12]       Marco Lui and Timothy Baldwin. langid.py: An Off-the-shelf Language Identification Tool. In
             Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics, pages 25–30,
             2012.
[LM14]       Quoc Le and Tomas Mikolov. Distributed Representations of Sentences and Documents. In
             Proceedings of the 31st International Conference on Machine Learning, 2014.
[LPD13]      Alex Lamb, Michael J Paul, and Mark Dredze. Separating Fact from Fear: Tracking Flu Infections
             on Twitter. In Proceedings of NAACL-HLT, pages 789–795, 2013.
[Nat]        National Library of Medicine (US). Rxnorm.
[NSO+ 15]    Azadeh Nikfarjam, Abeed Sarker, Karen O’Connor, Rachel Ginn, and Graciela Gonzalez. Phar-
             macovigilance from social media: mining adverse drug reaction mentions using sequence labeling
             with word embedding cluster features. Journal of the American Medical Informatics Association:
             JAMIA, 22(3):671–81, 2015.
[OGCC15]     Derek O’Callaghan, Derek Greene, Joe Carthy, and Pádraig Cunningham. An analysis of the
             coherence of descriptors in topic modeling. Expert Systems with Applications, 42(13):5645–5657,
             2015.
[Sar17]      Abeed Sarker. A customizable pipeline for social media text normalization. Social Network Analysis
             and Mining, 7(45), 2017.
[SG15]       Abeed Sarker and Graciela Gonzalez. Portable automatic text classification for adverse drug
             reaction detection via multi-corpus training. Journal of Biomedical Informatics, 53:196–207, 2015.
[SGN+ 15]    Abeed Sarker, Rachel Ginn, Azadeh Nikfarjam, Karen O’Connor, Karen Smith, Swetha Jayaraman,
             Tejaswi Upadhaya, and Graciela Gonzalez. Utilizing social media data for pharmacovigilance:
             A review. Journal of Biomedical Informatics, 54:202–212, 2015.
[SHBL16]     Edin Smailhodzic, Wyanda Hooijsma, Albert Boonstra, and David J. Langley. Social media use
             in healthcare: A systematic review of effects on patients and on their relationship with healthcare
             professionals. BMC Health Services Research, 16(1):442, 2016.
[SOG+ 16]    Abeed Sarker, Karen O’Connor, Rachel Ginn, Matthew Scotch, Karen Smith, Dan Malone, and
             Graciela Gonzalez. Social Media Mining for Toxicovigilance: Automatic Monitoring of Prescription
             Medication Abuse from Twitter. Drug Safety, 39(3):231–240, 2016.
[TP10]       Yla R. Tausczik and James W. Pennebaker. The psychological meaning of words: LIWC and
             computerized text analysis methods. Journal of Language and Social Psychology, 29(1):24–54,
             2010.
[VBSEng]     Suzan Verberne, Anika Batenburg, Remco Sanders, and Mies Van Eenbergen. Social processes of
             online empowerment on a cancer patient discussion forum: using text mining to analyze linguistic
             patterns of empowerment processes. JMIR Cancer, Forthcoming.
[vUKDT+ 08] Cornelia F. van Uden-Kraan, Constance H. Drossaert, Erik Taal, Bret R Shaw, Erwin R. Seydel,
            and Mart A. F. J. van de Laar. Empowering processes and outcomes of participation in online
            support groups for patients with breast cancer, arthritis, or fibromyalgia. Qualitative Health
            Research, 18(3):405–417, 2008.
[vUKDT+ 09] Cornelia F. van Uden-Kraan, Constance H.C. Drossaert, Erik Taal, Erwin R. Seydel, and Mart A.
            F. J. van de Laar. Participation in online patient support groups endorses patients’ empowerment.
            Patient Education and Counseling, 74(1):61–69, 2009.