<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Extracting Supporting Evidence from Medical Negligence Claim Texts</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Robert Bevan</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alessandro Torrisi</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danushka Bollegala</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Frans Coenen</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Katie Atkinson</string-name>
        </contrib>
      </contrib-group>
      <fpage>50</fpage>
      <lpage>54</lpage>
      <abstract>
        <p>The number of medical negligence claims filed in
the UK each year has increased significantly over
the past decade [NHS, 2018]. When filing a
medical negligence claim, electronic health records act
as a legally valid and important source of evidence.
Patients often undergo different and complex
treatments over many months or years, easily
resulting in hundreds of pages of electronically available
medical records. Therefore, it is a non-trivial task
to read all the related electronic health records and
identify the supporting evidence to establish a
legal case. Currently, the process of identifying
evidence is carried out by humans who are experts
in both medical negligence law and medicine. In
this paper, we compare different methods of
automatically extracting relevant statements from
medical negligence claim texts, to move towards
building a method for extracting relevant sections from
electronic health records with the aim of
expediting the litigation process and reducing the
manual effort involved. Specifically, we annotate a
dataset containing medical negligence claim texts
and train conditional random field (CRF) and long
short-term memory (LSTM) network models for
extracting information relevant to cases. Our
evaluation shows that each model class has its merits in
this task: the CRF models were significantly more
effective in identifying full sequences, while the
LSTMs were significantly better at assigning tags
to tokens. We found both approaches were able to
identify information that is key to the litigation
process.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
Medical negligence claims are a significant source of
litigation. For example, in 2018, the National Health Service (NHS)
in the United Kingdom reported that it paid GBP 1,623
million as compensation for 10,637 claims [NHS, 2018]. Acts of
medical negligence can vary in complexity as well as
severity. Finding the reasons behind medical negligence acts is
important in order to prevent such unfortunate events in the
future [Toyabe, 2012]. Moreover, in the event where a patient
(or a legal representative acting on behalf of a patient) would
like to prosecute the health care provider for medical
negligence, a legal case must be filed based on medical evidence.</p>
      <p>[Figure 1 example claim text: “University hospital mistakenly
amputated my left leg despite the fact the cancer was confined
within my right leg. I will now need to undergo another leg
amputation and will be confined to a wheelchair for the rest of
my life.”]</p>
      <p>
An important source of medical evidence for such prevention
efforts or litigation processes is the electronic health records
describing the various treatments undergone by the patient,
the medication prescribed for the patient, and their medical
history. The volume of electronic health records for a
single patient can be significant. It is not uncommon for a
patient to be subjected to medical treatment for many months,
if not years, and typically a much smaller set of relevant
evidence supporting the medical negligence case must be
identified from this vast amount of information. Furthermore,
filtering electronic health records according to the date of the
alleged negligent act is not sufficient when building a body of
evidence due to the non-contiguous distribution of evidence
contained within the records. For example, negative patient
outcomes may occur years after an initial negligent act,
therefore filtering records by date may result in evidence being
discarded.</p>
      <p>The existing process for identifying supporting evidence
from electronic health records is a manual one. Humans
who are knowledgeable in both medical negligence law and
medicine must manually read a collection of medical records
and then carefully select parts that can be used as evidence
in the litigation process. Needless to say, this is both a
time-consuming and a costly process. Moreover, the number of
individuals possessing both legal and medical background
knowledge is small, which means a limited number of
medical records can be read and analysed over a given period of
time. These drawbacks in the existing pipeline for
extracting evidence call for automatic methods that can efficiently
“read” large quantities of medical records and accurately
extract the relevant evidence.</p>
      <p>In this paper, given medical negligence claim texts, we
compare methods of automatically extracting expressions that
are relevant to the medical negligence case: the alleged
negligent acts, and any consequential negative patient outcomes.
This can be useful in helping lawyers quickly establish the
key elements of the case, and we conjecture this will be
useful as part of a system for automatically extracting supporting
evidence from medical records.</p>
      <p>Specifically, first we manually annotate a set of medical
negligence claim texts, identifying any statements of
negligent acts and any consequential negative patient outcomes.
An example is shown in Figure 1, where text relating to
negligent acts and negative outcomes is highlighted in red
and blue respectively. Next, we train a Conditional
Random Field (CRF) [Lafferty et al., 2001] model for
predicting BIO (Begin-Inside-Outside) tags for extracting sequences
of tokens in texts belonging to the previously described
categories. We use different types of features such as Part of
Speech (POS), typography, and medical lexicons. One
issue we encounter in this approach is the data sparseness –
the limited overlap of the tokens between the training and
testing data. To overcome this data sparseness issue, we use
pre-trained word embeddings and automatically append
training instances with related features that did not appear in the
original training instances. Our experimental results show
that this feature augmentation approach successfully
overcomes the data sparseness problem. Finally, we train various
Long Short-Term Memory (LSTM) networks [Hochreiter and
Schmidhuber, 1997] for the same task. We experiment with
both regular and Bidirectional LSTMs (BiLSTMs), and make
use of both word and character level features.
</p>
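<p>To illustrate the BIO scheme described above, the following sketch (ours; the sentence is invented and the label name NA, standing in for negligent act, is illustrative, as the confidential dataset's actual labels may differ) decodes tagged tokens into extracted phrases:</p>

```python
# A minimal sketch of BIO (Begin-Inside-Outside) tagging: each token
# carries one tag, and contiguous B-/I- runs decode into phrases.
tokens = ["The", "hospital", "mistakenly", "amputated", "my", "left", "leg"]
tags = ["O", "O", "B-NA", "I-NA", "I-NA", "I-NA", "I-NA"]

def decode_bio(tokens, tags):
    """Turn a BIO-tagged token sequence into (label, phrase) spans."""
    spans, current, label = [], [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if current:
                spans.append((label, " ".join(current)))
            current, label = [tok], tag[2:]
        elif tag.startswith("I-") and current and tag[2:] == label:
            current.append(tok)
        else:
            if current:
                spans.append((label, " ".join(current)))
            current, label = [], None
    if current:
        spans.append((label, " ".join(current)))
    return spans

print(decode_bio(tokens, tags))  # -> [('NA', 'mistakenly amputated my left leg')]
```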
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>Information extraction has a long and established history as a
task in NLP. In Named Entity Recognition (NER) [Shen et al.,
2018; Kuru et al., 2016; Ritter et al., 2011; Guo et al., 2009;
Rud et al., 2011], the goal is to extract mentions of named
entities such as people, locations, organisations, products
etc. It has been reported that over 70% of web search
queries contain some form of a named entity [Guo et al.,
2009]. Therefore, being able to recognise named entities
enables us to find more relevant results in information
retrieval. Relation Extraction (RE) [Mandya et al., 2017;
Miwa and Bansal, 2016] further extends this process by
identifying the semantic relations that exist between two or more
recognised named entities. For example, a competitor
relation can exist between two companies, which can later
transform into an acquisition relation. In medical contexts,
identifying the adverse reactions associated with drugs (ADRs)
from formal reporting tools, such as the Yellow Card
System, or more informal reporting methods, such as social
media, has received wide attention [Bollegala et al., 2018;
Sloane et al., 2015].</p>
      <p>Our problem, extracting litigation-relevant statements from
medical negligence case texts, can be seen as a specific
instance of the above-described information extraction
problem. However, there are some important properties in our
case, which differentiate it from the more popular
information extraction problems such as NER, RE or ADR
extraction. First, compared to, for example, named entities,
evidence related to medical negligence tends to comprise longer
sequences. For example, the evidence extracted in Figure 1
contains the sequence of words “mistakenly amputated my left
leg”. Second, unlike relations or entities, it is non-obvious
how to classify negligence-related evidence into categories.
This becomes problematic when generalising the extraction
rules from one domain to another. To the best of our
knowledge, the problem of extracting medical negligence related
evidence from free-text data has not been studied before.</p>
      <p>[Table 1 reports the dataset statistics: the count of each
statement type (negligent act, negative outcome). Table 2 lists
the CRF features; its generic features are: word, word suffixes,
is upper case, is title, is digit, POS tag, POS tag suffix, is
first word, and is last word.]</p>
    </sec>
    <sec id="sec-3">
      <title>Evidence Extraction</title>
      <p>CRFs and LSTMs are two classes of models that perform
well, and are often employed, in a range of sequence labelling
tasks [Huang et al., 2015; Shi et al., 2015; McCallum and
Li, 2003]. Both model classes are able to leverage historical
and future sequence information when classifying the current
sequence element. This makes them well suited to natural
language processing tasks. One advantage LSTMs have over
CRFs is their ability to learn feature representations that are
specific to the task at hand. We employ both model classes
in this work and compare their performance in the task of
identifying negligent acts and consequential negative patient
outcomes from medical negligence claim texts.</p>
      <p>The dataset used in this evaluation comprises 2014
medical negligence claim summary texts collected by a law firm
operating in the medical negligence domain. These texts
contain statements describing negligent acts as well as any
consequential negative patient outcomes (Figure 1). The texts
were annotated by a domain expert with BIO tags delineating
negligent act statements and consequential negative patient
outcome statements. Table 1 shows some dataset statistics.
Due to the confidential nature of this dataset, we are unable
to share it publicly.
</p>
    </sec>
    <sec id="sec-4">
      <title>Experiments</title>
      <p>CRF models were trained using various combinations of the
features listed in Table 2. The features listed in the
left-hand column are common to most text tagging tasks. Those
listed in the middle column were introduced to address the
problem of data sparseness. The similar word features
require further explanation; these were generated using
pre-trained GloVe [Pennington et al., 2014] embeddings: given
a word, the N words with the highest cosine similarity were
included as additional features; the value for N was varied
(N = {1, …, 10}). Similar word suffix features were also
experimented with. The features in the right-hand column are
domain specific. For example, it was observed that negligent act
statements are often present in the first sentence of a claim
text. Also, negligent act statements frequently contain
medical terminology. The listed features were computed for each
token in each sequence as well as the preceding and
following tokens. All CRF models were trained using the
sklearn-crfsuite Python package [Korobov, 2017]. The following
hyper-parameters were tuned using a randomised search over
50 iterations: the Elastic net regularisation coefficient, the
minimum feature frequency, and the possible state and
transition features.</p>
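<p>A minimal sketch of the kind of per-token feature function fed to a CRF tagger, combining the generic features of Table 2 with the similar-word augmentation described above. The toy embedding table and all helper names are our own illustrative assumptions, not the authors' code:</p>

```python
import math

# Toy stand-in for pre-trained GloVe vectors (illustrative values only).
EMBED = {
    "amputated": [0.9, 0.1], "removed": [0.85, 0.2],
    "resected": [0.8, 0.25], "leg": [0.1, 0.9],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similar_words(word, n):
    """The n nearest vocabulary words by cosine similarity; these are
    appended as extra features to counteract data sparseness."""
    if word not in EMBED:
        return []
    others = [(w, cosine(EMBED[word], v)) for w, v in EMBED.items() if w != word]
    return [w for w, _ in sorted(others, key=lambda p: -p[1])[:n]]

def token_features(sent, i, n_similar=2):
    """Generic per-token features plus similar-word augmentation."""
    word = sent[i]
    feats = {
        "word": word.lower(),
        "suffix3": word[-3:],          # word suffix
        "is_upper": word.isupper(),
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "is_first": i == 0,
        "is_last": i == len(sent) - 1,
    }
    for j, sim in enumerate(similar_words(word.lower(), n_similar)):
        feats[f"similar_{j}"] = sim
    return feats

feats = token_features(["They", "amputated", "my", "leg"], 1)
```

In practice such feature dictionaries (including analogous features for the preceding and following tokens) would be passed to a sequence tagger such as the CRF implementation in sklearn-crfsuite.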
      <p>We experimented with various LSTM configurations (see
Table 3). The baseline LSTM comprised a 50-dimensional
word embedding input, a single LSTM layer of 16 hidden
units, and a softmax output. This model was trained both with
random and pre-trained GloVe word embedding initialisation.
A bi-directional variant of the baseline LSTM was also
experimented with. In addition, the baseline model was extended
to include character-level features. This was achieved using
a convolutional layer containing 8 hidden units, with a
16-dimensional character embedding input. All LSTM models
were trained using the NCRF++ Python package [Yang and
Zhang, 2018]. Each LSTM was trained for 100 epochs
using stochastic gradient descent with a learning rate of 0.015,
a learning rate decay of 0.05, and a batch size of 32. During
training, models were evaluated at the end of each epoch
using a validation set, and the best performing model (across the
100 epochs) was selected for use in the evaluation. Training
was repeated 5 times for each LSTM configuration in order to
reduce the influence of pathological local minima, but none
were observed, therefore we randomly selected one of the 5
models for the evaluation (for each of the different
configurations).
</p>
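<p>For illustration, the BiLSTM-with-character-CNN setting above might be expressed in an NCRF++-style configuration file. The key names below are assumptions based on NCRF++'s demo configuration, and the paths are placeholders rather than the authors' actual files:</p>

```ini
# Sketch of an NCRF++-style config for the BiLSTM + char-CNN variant
# (key names assumed from NCRF++'s demo config; paths are placeholders).
train_dir=data/train.bio
dev_dir=data/dev.bio
test_dir=data/test.bio
model_dir=models/bilstm_charcnn
word_emb_dim=50
char_emb_dim=16
use_char=True
char_seq_feature=CNN
char_hidden_dim=8
word_seq_feature=LSTM
bilstm=True
hidden_dim=16
use_crf=False
optimizer=SGD
learning_rate=0.015
lr_decay=0.05
batch_size=32
iteration=100
```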
    </sec>
    <sec id="sec-5">
      <title>Results</title>
      <p>The different methods were compared using a 5-Fold Cross
Validation scheme. Performance metrics were computed both
at the sequence level and the token level. Token level metrics
were computed using the negligent act and negative patient
outcome labels only (i.e. “other” tags were ignored).</p>
      <sec id="sec-5-2">
        <title>Evaluation</title>
        <p>[Table 4 reports token-level precision (Prec) for the CRF
feature sets: Base 0.428; Base + stem 0.427; Base + stem + suffix
0.429; Base + sentiment 0.424; Base + in medical lexicon 0.417*;
Base + in first sentence 0.438; Base + 7 similar words 0.445*;
Base + 6 similar words + suffix 0.443*.]</p>
        <p>Neither
evaluation scheme is perfectly suited to identifying the best
performing sequence tagger. For example, evaluating models
at the sequence level only will discount any examples where
the system correctly identifies the vast majority of a sequence,
but misses a single, minimally important term. Similarly,
token level evaluation is imperfect as it can mask pathological
behaviour. For example, a system can correctly identify the
majority of a phrase but fail to identify a single important
component (e.g. “no longer have any mobility in my”) and
still score highly using this scheme. While it is not perfect,
we suggest the phrase level evaluation is likely to be a better
indicator of a model’s usefulness in practice. In order to test
for the statistical significance of the results, we employed the
corrected re-sampled t-test [Nadeau and Bengio, 2001],
coupled with the Bonferroni correction for multiple comparisons
[Dunn, 1961].</p>
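<p>The corrected re-sampled t-test can be sketched as follows: the statistic divides the mean score difference by a variance term inflated by n_test/n_train, accounting for the overlap between training sets across resamples. The per-fold differences used here are made-up numbers, not results from the paper:</p>

```python
import math

# Corrected re-sampled t-test [Nadeau and Bengio, 2001] sketch.
def corrected_resampled_t(diffs, n_train, n_test):
    """t-statistic over k per-fold score differences, with the variance
    term inflated by n_test/n_train to correct for training-set overlap.
    Compare against a t distribution with k-1 degrees of freedom."""
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)

# Illustrative per-fold differences between two models over 5-fold CV
# (so n_test / n_train = 1/4).
diffs = [0.02, 0.03, 0.01, 0.04, 0.02]
t = corrected_resampled_t(diffs, n_train=1600, n_test=400)
```

For the Bonferroni correction, the significance threshold is divided by the number of comparisons before being applied to each individual test.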
        <p>Table 6 compares the best performing CRF and LSTM
models. The CRF model performed significantly better at the
sequence level, while the LSTM offered significantly better
token level performance. Inspecting extractions performed on
a test set can be useful in comparing models. Figure 2 shows
some example extractions performed using these two
models. The outputs of the different models vary considerably:
the two approaches only fully agree on a single instance (12
instances in total). The LSTM repeatedly fails to identify the
beginning of the sequences: it only outputs a single B tag (a B
tag indicates the first term in a sequence) out of a possible 12,
whereas the CRF outputs 9 B tags. The LSTM exhibits
additional undesirable behaviour: it erroneously splits sequences
in two, often dropping a common word.</p>
      <p>[Figure 2 example claim texts: “University hospital mistakenly
amputated my left leg despite the fact the cancer was confined
within my right leg. I will now need to undergo another leg
amputation and will be confined to a wheelchair for the rest of
my life.” / “I believe the University pharmacy to be negligent as
they misprescribed me with ibuprofen when they should have given
me paracetamol. I felt sick for a week as a result.” / “I believe
the midwife at University hospital was at fault because she
dropped my newborn son. This caused his arm to break, and his
head is now misshapen. We are unsure if his head will ever regain
its original shape, or if he will have lasting problems with his
arm.” / “I believe the GP at the Village Health Centre should have
noticed the lump when I first presented with my symptoms. My
cancer diagnosis has now been delayed by 15 months, and the
prognosis is much worse.”]</p>
      <p>It appears that the
LSTM is giving too much consideration to the current word,
and the previous sequence information is discounted. Both
approaches make some subtle mistakes that produce
extractions that appear to be correct at a first glance, but are actually
incorrect. For example, in the third example in Figure 2, the
CRF identifies the sequence “lasting problems with his arm”,
when in reality in the statement the author suggests they are
unsure whether the child will have lasting problems with their
arm. Extractions like this could prove to be problematic, if
such a system is used to quickly extract the key case facts
from a statement.</p>
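<p>The contrast between the two evaluation schemes can be sketched as follows (tags and spans are illustrative): a prediction that misses only the B tag scores well at the token level while failing exact sequence matching, mirroring the LSTM behaviour described above:</p>

```python
# Token-level vs sequence-level scoring on a single BIO-tagged example.
def token_accuracy(gold, pred):
    """Fraction of tokens tagged identically, over non-'O' gold tags."""
    pairs = [(g, p) for g, p in zip(gold, pred) if g != "O"]
    return sum(g == p for g, p in pairs) / len(pairs)

def exact_match(gold, pred):
    """1.0 only if the whole tag sequence is reproduced exactly."""
    return float(gold == pred)

gold = ["O", "B-NA", "I-NA", "I-NA", "I-NA", "O"]
pred = ["O", "O",    "I-NA", "I-NA", "I-NA", "O"]  # misses the B tag

# Most tokens are right, yet the extracted phrase is not an exact match.
```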
        <p>Tables 4 and 5 compare the different CRF feature sets and
LSTM configurations. The different LSTM configurations
performed similarly well, except for in cases where the word
embeddings were initialised using pre-trained GloVe
vectors – in these instances the models performed significantly
worse than the baseline LSTM. We also found that
training a BiLSTM with character level features significantly
improved recall. Moreover, we found that adding
sparseness-counteracting features improved CRF performance; the best
performing CRF model made use of similar word features
(N=7). We also found adding domain specific features to be
helpful: including whether or not a word occurs in the claim
text’s first sentence as a feature significantly improved token
level performance. This feature was strongly associated with
the negligent act class.
</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>Discussion and Conclusion</title>
      <p>In this set of experiments we found both CRF and LSTM
models were able to extract litigation-relevant information
from medical negligence claim texts. We observed that the
CRF was better able to identify entire useful phrases, while
the LSTM was able to assign labels to tokens with higher
precision. The best performing CRF model’s ability to
identify evidence is likely sufficient for it to be useful in practice.
We found that enriching the CRF features with similar words,
computed using pre-trained word embeddings, improved the
CRF’s performance. We also observed including domain
specific features improved the CRF’s performance. While the
evaluation suggests the CRF is better suited to this task than
the LSTM, we recognise it may well be biased in favour of
the CRF. This is because we experimented with few LSTM
architectures, and the architecture is an important
hyperparameter when training neural network models. In future work
we plan to experiment further with the LSTM architecture.
Specifically, we plan to vary the dimensionality of the
various embedding and hidden layers. We also plan to experiment
with a CRF output layer with the view that this will likely
improve the LSTM’s sequence level performance. We also plan
to collect more data, which may benefit both approaches and
further assist with the development of our automated tools for
processing medical negligence documents.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [Bollegala et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Danushka</given-names>
            <surname>Bollegala</surname>
          </string-name>
          , Simon Maskell, Richard Sloane, Joanna Hajne, and
          <string-name>
            <given-names>Munir</given-names>
            <surname>Pirmohamed</surname>
          </string-name>
          .
          <article-title>Causality patterns for detecting adverse drug reactions from social media: Text mining approach</article-title>
          .
          <source>JMIR Public Health and Surveillance</source>
          ,
          <volume>4</volume>
          (
          <issue>2</issue>
          ):e51, May
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [Dunn,
          <year>1961</year>
          ]
          <string-name>
            <given-names>Olive Jean</given-names>
            <surname>Dunn</surname>
          </string-name>
          .
          <article-title>Multiple comparisons among means</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          , pages
          <fpage>52</fpage>
          -
          <lpage>64</lpage>
          ,
          <year>1961</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [Guo et al.,
          <year>2009</year>
          ]
          <string-name>
            <given-names>Jiafeng</given-names>
            <surname>Guo</surname>
          </string-name>
          , Gu Xu, Xueqi Cheng, and
          <string-name>
            <given-names>Hang</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Named entity recognition in query</article-title>
          .
          <source>In SIGIR 2009</source>
          , pages
          <fpage>267</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [Hochreiter and Schmidhuber,
          <year>1997</year>
          ]
          <string-name>
            <given-names>Sepp</given-names>
            <surname>Hochreiter</surname>
          </string-name>
          and
          <string-name>
            <given-names>Jürgen</given-names>
            <surname>Schmidhuber</surname>
          </string-name>
          .
          <article-title>Long short-term memory</article-title>
          .
          <source>Neural Comput.</source>
          ,
          <volume>9</volume>
          (
          <issue>8</issue>
          ):
          <fpage>1735</fpage>
          -
          <lpage>1780</lpage>
          ,
          <year>November 1997</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [Huang et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Zhiheng</given-names>
            <surname>Huang</surname>
          </string-name>
          , Wei Xu, and
          <string-name>
            <given-names>Kai</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Bidirectional LSTM-CRF Models for Sequence Tagging</article-title>
          .
          <source>arXiv e-prints, arXiv:1508.01991</source>
          ,
          <year>Aug 2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [Korobov, 2017]
          <string-name>
            <given-names>Mikhail</given-names>
            <surname>Korobov</surname>
          </string-name>
          . sklearn-crfsuite. https://github.com/TeamHG-Memex/sklearn-crfsuite,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [Kuru et al.,
          <year>2016</year>
          ]
          <string-name>
            <given-names>Onur</given-names>
            <surname>Kuru</surname>
          </string-name>
          , Ozan Arkan Can, and
          <string-name>
            <given-names>Deniz</given-names>
            <surname>Yuret</surname>
          </string-name>
          . CharNER:
          <article-title>Character-level named entity recognition</article-title>
          .
          <source>In Proceedings of COLING</source>
          <year>2016</year>
          ,
          <source>the 26th International Conference on Computational Linguistics: Technical Papers</source>
          , pages
          <fpage>911</fpage>
          -
          <lpage>921</lpage>
          , Osaka, Japan,
          <year>December 2016</year>
          .
          The COLING 2016 Organizing Committee
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [Lafferty et al.,
          <year>2001</year>
          ] John Lafferty,
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Fernando</given-names>
            <surname>Pereira</surname>
          </string-name>
          .
          <article-title>Conditional random fields: Probabilistic models for segmenting and labeling sequence data</article-title>
          .
          <source>In ICML 2001</source>
          , pages
          <fpage>282</fpage>
          -
          <lpage>289</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [Mandya et al.,
          <year>2017</year>
          ]
          <string-name>
            <given-names>Angrosh</given-names>
            <surname>Mandya</surname>
          </string-name>
          , Danushka Bollegala, Frans Coenen, and
          <string-name>
            <given-names>Katie</given-names>
            <surname>Atkinson</surname>
          </string-name>
          .
          <article-title>Frame-based semantic patterns for relation extraction</article-title>
          .
          <source>In Proc. of the 15th International Conference of the Pacific Association for Computational Linguistics (PACLING)</source>
          , pages
          <fpage>51</fpage>
          -
          <lpage>62</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [McCallum and Li, 2003]
          <string-name>
            <given-names>Andrew</given-names>
            <surname>McCallum</surname>
          </string-name>
          and
          <string-name>
            <given-names>Wei</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <article-title>Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons</article-title>
          .
          <source>In Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003 - Volume 4, CONLL '03</source>
          , pages
          <fpage>188</fpage>
          -
          <lpage>191</lpage>
          , Stroudsburg, PA, USA,
          <year>2003</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [Miwa and Bansal, 2016]
          <string-name>
            <given-names>Makoto</given-names>
            <surname>Miwa</surname>
          </string-name>
          and
          <string-name>
            <given-names>Mohit</given-names>
            <surname>Bansal</surname>
          </string-name>
          .
          <article-title>End-to-end relation extraction using lstms on sequences and tree structures</article-title>
          .
          <source>In Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pages
          <fpage>1105</fpage>
          -
          <lpage>1116</lpage>
          , Berlin, Germany,
          <year>August 2016</year>
          .
          Association for Computational Linguistics
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [Nadeau and Bengio, 2001]
          <string-name>
            <given-names>Claude</given-names>
            <surname>Nadeau</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yoshua</given-names>
            <surname>Bengio</surname>
          </string-name>
          .
          <article-title>Inference for the generalization error</article-title>
          .
          <source>Machine Learning</source>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [NHS,
          <year>2018</year>
          ] NHS.
          <source>Annual report and accounts 2017/18. Technical report, National Health Service (NHS) Resolution</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [Pennington et al.,
          <year>2014</year>
          ] Jeffrey Pennington, Richard Socher, and
          <string-name>
            <given-names>Christopher D.</given-names>
            <surname>Manning</surname>
          </string-name>
          . GloVe:
          <article-title>Global vectors for word representation</article-title>
          .
          <source>In Proc. of EMNLP</source>
          , pages
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [Ritter et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Alan</given-names>
            <surname>Ritter</surname>
          </string-name>
          , Sam Clark, Mausam, and
          <string-name>
            <given-names>Oren</given-names>
            <surname>Etzioni</surname>
          </string-name>
          .
          <article-title>Named entity recognition in tweets: An experimental study</article-title>
          .
          <source>In EMNLP'11</source>
          , pages
          <fpage>1524</fpage>
          -
          <lpage>1534</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [Rud et al.,
          <year>2011</year>
          ]
          <string-name>
            <given-names>Stefan</given-names>
            <surname>Rud</surname>
          </string-name>
          , Massimiliano Ciaramita, Jens Müller, and
          <string-name>
            <given-names>Hinrich</given-names>
            <surname>Schütze</surname>
          </string-name>
          .
          <article-title>Piggyback: Using search engines for robust cross-domain named entity recognition</article-title>
          .
          <source>In ACL'11</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [Shen et al.,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Yanyao</given-names>
            <surname>Shen</surname>
          </string-name>
          , Hyokun Yun, Zachary C. Lipton, Yakov Kronrod, and
          <string-name>
            <given-names>Animashree</given-names>
            <surname>Anandkumar</surname>
          </string-name>
          .
          <article-title>Deep active learning for named entity recognition</article-title>
          .
          <source>In International Conference on Learning Representations</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [Shi et al.,
          <year>2015</year>
          ]
          <string-name>
            <given-names>Xingjian</given-names>
            <surname>Shi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Zhourong</given-names>
            <surname>Chen</surname>
          </string-name>
          , Hao Wang,
          <string-name>
            <given-names>Dit-Yan</given-names>
            <surname>Yeung</surname>
          </string-name>
          , Wai-kin Wong, and Wang-chun Woo.
          <article-title>Convolutional LSTM network: A machine learning approach for precipitation nowcasting</article-title>
          . In C. Cortes,
          <string-name>
            <given-names>N. D.</given-names>
            <surname>Lawrence</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. D.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Sugiyama</surname>
          </string-name>
          , and R. Garnett, editors,
          <source>Advances in Neural Information Processing Systems</source>
          <volume>28</volume>
          , pages
          <fpage>802</fpage>
          -
          <lpage>810</lpage>
          . Curran Associates, Inc.,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [Sloane et al.,
          <year>2015</year>
          ] Richard Sloane, Orod Osanlou, David Lewis,
          <string-name>
            <given-names>Danushka</given-names>
            <surname>Bollegala</surname>
          </string-name>
          , Simon Maskell, and
          <string-name>
            <given-names>Munir</given-names>
            <surname>Pirmohamed</surname>
          </string-name>
          .
          <article-title>Social media and pharmacovigilance: A review of the opportunities and challenges</article-title>
          .
          <source>British Journal of Clinical Pharmacology</source>
          , pages
          <fpage>910</fpage>
          -
          <lpage>920</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [Toyabe,
          <year>2012</year>
          ]
          <string-name>
            <given-names>Shin-ichi</given-names>
            <surname>Toyabe</surname>
          </string-name>
          .
          <article-title>Detecting inpatient falls by using natural language processing of electronic medical records</article-title>
          .
          <source>BMC Health Services Research</source>
          ,
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <year>Dec 2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [Yang and Zhang,
          <year>2018</year>
          ]
          <string-name>
            <given-names>Jie</given-names>
            <surname>Yang</surname>
          </string-name>
          and
          <string-name>
            <given-names>Yue</given-names>
            <surname>Zhang</surname>
          </string-name>
          .
          <article-title>NCRF++: An open-source neural sequence labeling toolkit</article-title>
          .
          <source>In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</source>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>