<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Extracting Supporting Evidence from Medical Negligence Claim Texts</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Robert</forename><surname>Bevan</surname></persName>
							<email>robert.bevan@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Alessandro</forename><surname>Torrisi</surname></persName>
							<email>alessandro.torrisi@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Danushka</forename><surname>Bollegala</surname></persName>
							<email>danushka@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Frans</forename><surname>Coenen</surname></persName>
							<email>coenen@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Katie</forename><surname>Atkinson</surname></persName>
							<email>k.m.atkinson@liverpool.ac.uk</email>
							<affiliation key="aff0">
								<orgName type="institution">University of Liverpool</orgName>
							</affiliation>
						</author>
						<title level="a" type="main">Extracting Supporting Evidence from Medical Negligence Claim Texts</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">DDD064F01773B5B95AFDB66F9F8AC397</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-03-24T15:59+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>The number of medical negligence claims filed in the UK each year has increased significantly over the past decade [NHS, 2018]. When filing a medical negligence claim, electronic health records act as a legally valid important source of evidence. Patients often undergo different and complex treatments over many months or years, easily resulting in hundreds of pages of electronically available medical records. Therefore, it is a non-trivial task to read all the related electronic health records and identify the supporting evidence to establish a legal case. Currently, the process of identifying evidence is carried out by humans who are experts in both medical negligence law and medicine. In this paper, we compare different methods of automatically extracting relevant statements from medical negligence claim texts, to move towards building a method for extracting relevant sections from electronic health records with the aim of expediting the litigation process and reducing the manual efforts involved. Specifically, we annotate a dataset containing medical negligence claim texts and train conditional random field (CRF) and long short-term memory (LSTM) network models for extracting information relevant to cases. Our evaluation shows that each model class has its merits in this task: the CRF models were significantly more effective in identifying full sequences, while the LSTMs were significantly better at assigning tags to tokens. We found both approaches were able to identify information that is key to the litigation process.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1">Introduction</head><p>Medical negligence claims are a significant source of litigation. For example, in 2018, the national health service (NHS) in the United Kingdom reported that it paid GBP 1,623 million as compensation for 10,637 claims <ref type="bibr">[NHS, 2018]</ref>. Acts of medical negligence can vary in complexity as well as severity. Finding the reasons behind medical negligence acts is important in order to prevent such unfortunate events in the future <ref type="bibr" target="#b6">[Toyabe, 2012]</ref>. Moreover, in the event where a patient University hospital mistakenly amputated my left leg despite the fact the cancer was confined within my right leg. I will now need to undergo another leg amputation and will be confined to a wheelchair for the rest of my life. (or a legal representative acting on behalf of a patient), would like to prosecute the health care provider for medical negligence, a legal case must be filed based on medical evidence. An important source of medical evidence for such prevention efforts or litigation processes is the electronic health records describing the various treatments undergone by the patient, the medication prescribed for the patient, and their medical history. The volume of electronic health records for a single patient can be significant. It is not uncommon for a patient to be subjected to medical treatment for many months, if not years, and typically a much smaller set of relevant evidence supporting the medical negligence case must be identified from this vast amount of information. Furthermore, filtering electronic health records according to the date of the alleged negligent act is not sufficient when building a body of evidence due to the non-contiguous distribution of evidence contained within the records. 
For example, negative patient outcomes may occur years after an initial negligent act; filtering records by date may therefore result in evidence being discarded.</p><p>The existing process for identifying supporting evidence from electronic health records is a manual one. Humans who are knowledgeable in both medical negligence law and medicine must manually read a collection of medical records and then carefully select parts that can be used as evidence in the litigation process. Needless to say, this is both a time-consuming and a costly process. Moreover, the number of individuals possessing both legal and medical background knowledge is small, which means only a limited number of medical records can be read and analysed over a given period of time. These drawbacks in the existing pipeline for extracting evidence call for automatic methods that can efficiently "read" large quantities of medical records and accurately extract the relevant evidence.</p><p>In this paper, given medical negligence claim texts, we compare methods of automatically extracting expressions that are relevant to the medical negligence case: the alleged negligent acts, and any consequential negative patient outcomes. This can help lawyers quickly establish the key elements of a case, and we conjecture it will also be useful as part of a system for automatically extracting supporting evidence from medical records.</p><p>Specifically, we first manually annotate a set of medical negligence claim texts, identifying any statements of negligent acts and any consequential negative patient outcomes. An example is shown in Figure <ref type="figure" target="#fig_0">1</ref>, where text relating to negligent acts and negative outcomes is highlighted in red and blue respectively. 
Next, we train a Conditional Random Field (CRF) <ref type="bibr" target="#b2">[Lafferty et al., 2001]</ref> model for predicting BIO (Begin-Inside-Outside) tags, extracting sequences of tokens in texts belonging to the previously described categories. We use different types of features, such as Part of Speech (POS) tags, typography, and medical lexicons. One issue we encounter in this approach is data sparseness: the limited overlap of tokens between the training and testing data. To overcome this issue, we use pre-trained word embeddings to automatically append to training instances related features that did not appear in the original training instances. Our experimental results show that this feature augmentation approach successfully overcomes the data sparseness problem. Finally, we train various Long Short-Term Memory (LSTM) networks <ref type="bibr" target="#b1">[Hochreiter and Schmidhuber, 1997]</ref> for the same task. We experiment with both regular and bidirectional LSTMs (BiLSTMs), and make use of both word- and character-level features.</p></div>
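The similar-word feature augmentation described above can be sketched as follows. This is a minimal illustration using toy embedding vectors rather than real GloVe values; the helper names most_similar and augment_features are hypothetical, not the authors' code.

```python
import numpy as np

def most_similar(word, embeddings, n=3):
    """Return the n vocabulary words with the highest cosine similarity to `word`."""
    if word not in embeddings:
        return []
    target = embeddings[word]
    scores = {}
    for other, vec in embeddings.items():
        if other == word:
            continue
        denom = np.linalg.norm(target) * np.linalg.norm(vec)
        scores[other] = float(np.dot(target, vec) / denom)
    # Sort by descending cosine similarity and keep the top n words.
    return [w for w, _ in sorted(scores.items(), key=lambda kv: -kv[1])[:n]]

def augment_features(features, word, embeddings, n=3):
    """Append 'similar_word=...' features for the n nearest embedding neighbours."""
    return features + ["similar_word=" + w for w in most_similar(word, embeddings, n)]
```

In the paper's setting, such appended features let a token unseen at test time share weight mass with its embedding neighbours seen during training.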
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2">Related Work</head><p>Information extraction has a long and established history as a task in NLP. In Named Entity Recognition (NER) <ref type="bibr" target="#b5">[Shen et al., 2018;</ref><ref type="bibr" target="#b2">Kuru et al., 2016;</ref><ref type="bibr" target="#b4">Ritter et al., 2011;</ref><ref type="bibr" target="#b1">Guo et al., 2009;</ref><ref type="bibr" target="#b4">Rud et al., 2011]</ref>, the goal is to extract mentions of named entities such as people, locations, organisations, products etc. It has been reported that over 70% of web search queries contain some form of a named entity <ref type="bibr" target="#b1">[Guo et al., 2009]</ref>. Therefore, being able to recognise named entities enables us to find more relevant results in information retrieval. Relation Extraction (RE) <ref type="bibr" target="#b2">[Mandya et al., 2017;</ref><ref type="bibr" target="#b2">Miwa and Bansal, 2016]</ref> further extends this process by identifying the semantic relations that exist between two or more recognised named entities. For example, a competitor relation can exist between two companies, which can later transform into an acquisition relation. In medical contexts, identifying the adverse reactions associated with drugs (ADRs) from formal reporting tools, such as the Yellow Card System, or more informal reporting methods, such as social media, has received wide attention <ref type="bibr" target="#b0">[Bollegala et al., 2018;</ref><ref type="bibr" target="#b5">Sloane et al., 2015]</ref>.</p><p>Our problem: extracting litigation relevant statements from medical negligence case texts, can be seen as a specific instance of the above-described information extraction problem. However, there are some important properties in our case, which differentiate it from the more popular information extraction problems such as NER, RE or ADR extraction. 
First, compared to, for example, named entities, evidence related to medical negligence tends to comprise longer sequences. For example, the evidence extracted in Figure <ref type="figure" target="#fig_0">1</ref> contains the sequence of words "mistakenly amputated my left leg". Second, unlike relations or entities, it is non-obvious how to classify negligence-related evidence into categories. This becomes problematic when generalising the extraction rules from one domain to another. To the best of our knowledge, the problem of extracting medical negligence related evidence from free text data has not been studied before.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3">Evidence Extraction</head><p>CRFs and LSTMs are two classes of models that perform well, and are often employed, in a range of sequence labelling tasks <ref type="bibr" target="#b2">[Huang et al., 2015;</ref><ref type="bibr" target="#b5">SHI et al., 2015;</ref><ref type="bibr" target="#b2">McCallum and Li, 2003]</ref>. Both model classes are able to leverage historical and future sequence information when classifying the current sequence element. This makes them well suited to natural language processing tasks. One advantage LSTMs have over CRFs is their ability to learn feature representations that are specific to the task at hand. We employ both model classes in this work and compare their performance in the task of identifying negligent acts and consequential negative patient outcomes from medical negligence claim texts.</p><p>The dataset used in this evaluation comprises 2014 medical negligence claim summary texts collected by a law firm operating in the medical negligence domain. These texts contain statements describing negligent acts as well as any consequential negative patient outcomes (Figure <ref type="figure" target="#fig_0">1</ref>). The texts were annotated by a domain expert with BIO tags delineating negligent act statements and consequential negative patient outcome statements. Table <ref type="table" target="#tab_0">1</ref> shows some dataset statistics. Due to the confidential nature of this dataset, we are unable to share it publicly.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4">Experiments</head><p>CRF models were trained using various combinations of the features listed in  listed in the middle column were introduced to address the problem of data sparseness. The similar word features require further explanation; these were generated using pretrained GloVe <ref type="bibr" target="#b4">[Pennington et al., 2014]</ref> embeddings: given a word, the N words with the highest cosine similarity were included as additional features; the value for N was varied (N={1..10}). Similar word suffix features were also experimented with. The features in the right-hand column are domain specific. For example, it was observed that negligent act statements are often present in the first sentence of a claim text. Also, negligent act statements frequently contain medical terminology. The listed features were computed for each token in each sequence as well as the preceding and following tokens. All CRF models were trained using the sklearncrfsuite Python package <ref type="bibr" target="#b2">[Korobov, 2017]</ref>. The following hyper-parameters were tuned using a randomised search over 50 iterations: the Elastic net regularisation coefficient, the minimum feature frequency, and the possible state and transition features.</p><p>We experimented with various LSTM configurations (see Table <ref type="table" target="#tab_3">3</ref>). The baseline LSTM comprised a 50-dimensional word embedding input, a single LSTM layer of 16 hidden units, and a softmax output. This model was trained both with random and pre-trained GloVe word embedding initialisation. A bi-directional variant of the baseline LSTM was also experimented with. In addition, the baseline model was extended to include character-level features. This was achieved using a convolutional layer containing 8 hidden units, with a 16dimensional character embedding input. 
All LSTM models were trained using the NCRF++ Python package <ref type="bibr" target="#b6">[Yang and Zhang, 2018]</ref>. Each LSTM was trained for 100 epochs using stochastic gradient descent with a learning rate of 0.015, a learning rate decay of 0.05, and a batch size of 32. During training, models were evaluated at the end of each epoch using a validation set, and the best performing model (across the 100 epochs) was selected for use in the evaluation. Training was repeated 5 times for each LSTM configuration in order to reduce the influence of pathological local minima; none were observed, so for each configuration we randomly selected one of the 5 models for the evaluation.</p></div>
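As a concrete illustration of the feature sets in Table 2, per-token feature extraction can be sketched as lists of feature dictionaries, the input format consumed by sklearn-crfsuite. The exact feature names and helper functions below are illustrative assumptions, not the authors' code.

```python
def token_features(tokens, i, in_first_sentence=False):
    """Build a feature dict for token i, including neighbouring-token context."""
    word = tokens[i]
    feats = {
        "word": word.lower(),
        "suffix3": word[-3:].lower(),      # crude stand-in for the suffix features
        "is_title": word.istitle(),
        "is_digit": word.isdigit(),
        "is_first_word": i == 0,
        "is_last_word": i == len(tokens) - 1,
        "in_first_sentence": in_first_sentence,  # domain-specific feature
    }
    if i > 0:
        feats["prev_word"] = tokens[i - 1].lower()
    if i != len(tokens) - 1:
        feats["next_word"] = tokens[i + 1].lower()
    return feats

def sequence_features(tokens, in_first_sentence=False):
    """One feature dict per token; a list of such lists forms the CRF input X."""
    return [token_features(tokens, i, in_first_sentence) for i in range(len(tokens))]
```

Lists of these per-sequence feature-dict lists, paired with the corresponding BIO tag lists, would then be passed to sklearn_crfsuite.CRF(...).fit(X, y).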
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5">Results</head><p>The different methods were compared using a 5-Fold Cross Validation scheme. Performance metrics were computed both at the sequence level and the token level. Token level metrics were computed using the negligent act and negative patient outcome labels only (i.e. "other" tags were ignored). Neither evaluation scheme is perfectly suited to identifying the best performing sequence tagger. For example, evaluating models at the sequence level only will discount any examples where the system correctly identifies the vast majority of a sequence, but misses a single, minimally important term. Similarly, token level evaluation is imperfect as it can mask pathological behaviour. For example, a system can correctly identify the majority of a phrase but fail to identify a single important component (e.g. "no longer have any mobility in my") and still score highly using this scheme. While it is not perfect, we suggest the phrase level evaluation is likely to be a better indicator of a model's usefulness in practice. In order to test for the statistical significance of the results, we employed the corrected re-sampled t-test <ref type="bibr" target="#b3">[Nadeau and Bengio, 2001]</ref>, coupled with the Bonferroni correction for multiple comparisons <ref type="bibr" target="#b1">[Dunn, 1961]</ref>.</p><p>Table <ref type="table" target="#tab_6">6</ref> compares the best performing CRF and LSTM models. The CRF model performed significantly better at the sequence level, while the LSTM offered significantly better token level performance. Inspecting extractions performed on a test set can be useful in comparing models. Figure <ref type="figure" target="#fig_1">2</ref> shows some example extractions performed using these two models. The outputs of the different models vary considerably: the two approaches only fully agree on a single instance (12 instances in total). 
The LSTM repeatedly fails to identify the beginning of sequences: it outputs only a single B tag (a B tag indicates the first term in a sequence) out of a possible 12, whereas the CRF outputs 9. The LSTM exhibits further undesirable behaviour: it erroneously splits sequences in two, often dropping a common word. It appears that the LSTM gives too much weight to the current word and discounts the preceding sequence information. Both approaches make subtle mistakes that produce extractions which appear correct at first glance but are actually wrong. For example, in the third example in Figure <ref type="figure" target="#fig_1">2</ref>, the CRF identifies the sequence "lasting problems with his arm", when in reality the author states they are unsure whether the child will have lasting problems with their arm. 
Extractions like this could prove problematic if such a system is used to quickly extract the key facts of a case from a statement.</p><p>Tables <ref type="table" target="#tab_5">4 and 5</ref> compare the different CRF feature sets and LSTM configurations. The different LSTM configurations performed similarly well, except in cases where the word embeddings were initialised using pre-trained GloVe vectors; in these instances the models performed significantly worse than the baseline LSTM. We also found that training a BiLSTM with character-level features significantly improved recall. Moreover, we found that adding sparseness-counteracting features improved CRF performance: the best performing CRF model made use of similar word features (N=7). We also found adding domain-specific features to be helpful: including, as a feature, whether or not a word occurs in the claim text's first sentence significantly improved token level performance. This feature was strongly associated with the negligent act class.</p></div>
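The corrected resampled t-test used in the evaluation above can be sketched as follows. The function name is hypothetical; the sketch assumes paired per-fold score differences from k-fold cross validation, with the variance inflated by the test-to-training set size ratio, per Nadeau and Bengio's correction.

```python
import math

def corrected_resampled_t(diffs, n_test, n_train):
    """Corrected resampled t statistic for k paired per-fold score differences.

    The usual t-test variance term 1/k is inflated by n_test/n_train to
    account for the overlap between training sets across folds.
    """
    k = len(diffs)
    mean = sum(diffs) / k
    var = sum((d - mean) ** 2 for d in diffs) / (k - 1)  # sample variance
    return mean / math.sqrt((1.0 / k + n_test / n_train) * var)

# For 5-fold cross validation, each test fold holds 1/5 of the data and each
# training set 4/5, so n_test / n_train = 0.25.
```

The resulting statistic is compared against a t distribution with k-1 degrees of freedom; a Bonferroni correction then divides the significance threshold by the number of comparisons.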
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="6">Discussion and Conclusion</head><p>In this set of experiments we found both CRF and LSTM models were able to extract litigation-relevant information from medical negligence claim texts. We observed that the CRF was better able to identify entire useful phrases, while the LSTM was able to assign labels to tokens with higher precision. The best performing CRF model's ability to identify evidence is likely sufficient for it to be useful in practice. We found that enriching the CRF features with similar words, computed using pre-trained word embeddings, improved the CRF's performance. We also observed including domain specific features improved the CRF's performance. While the evaluation suggests the CRF is better suited to this task than the LSTM, we recognise it may well be biased in favour of the CRF. This is because we experimented with few LSTM architectures, and the architecture is an important hyperparameter when training neural network models. In future work we plan to experiment further with the LSTM architecture. Specifically, we plan to vary the dimensionality of the various embedding and hidden layers. We also plan to experiment with a CRF output layer with the view that this will likely improve the LSTM's sequence level performance. We also plan to collect more data, which may benefit both approaches and further assist with the development of our automated tools for processing medical negligence documents.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: An example extraction (performed by a human).</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Example extractions performed by the CRF and LSTM.Tokens underlined with blue were identified by the CRF only; tokens underlined with red were identified by the LSTM only, and tokens underlined with violet were identified by both models.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1 :</head><label>1</label><figDesc>Dataset summary.</figDesc><table><row><cell>Generic</cell><cell>Sparseness</cell><cell>Domain specific</cell></row><row><cell>word</cell><cell>stem</cell><cell>sentiment</cell></row><row><cell>word suffixes</cell><cell>stem suffixes</cell><cell>in medical lexicon</cell></row><row><cell>is upper case</cell><cell>similar words</cell><cell>in first sentence</cell></row><row><cell>is title</cell><cell>similar word suffixes</cell><cell></cell></row><row><cell>is digit</cell><cell></cell><cell></cell></row><row><cell>POS tag</cell><cell></cell><cell></cell></row><row><cell>POS tag suffix</cell><cell></cell><cell></cell></row><row><cell>is first word</cell><cell></cell><cell></cell></row><row><cell>is last word</cell><cell></cell><cell></cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_1"><head>Table 2 :</head><label>2</label><figDesc>Features used in CRF experiments.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_2"><head>Table 2 .</head><label>2</label><figDesc>The features listed in the lefthand column are common to most text tagging tasks. Those</figDesc><table><row><cell>LSTM settings</cell></row><row><cell>LSTM</cell></row><row><cell>LSTM + GloVe</cell></row><row><cell>LSTM + Char</cell></row><row><cell>BiLSTM</cell></row><row><cell>BiLSTM + Char</cell></row><row><cell>BiLSTM + GloVe + Char</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_3"><head>Table 3 :</head><label>3</label><figDesc>LSTM configurations used in these experiments.</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_4"><head>Table 4 :</head><label>4</label><figDesc>Selected CRF model performance evaluated at the sequence level (micro-averaged). Note: Base refers to the the baseline CRF, which made use of generic features only; best results in bold font; * indicates a significant result (P=0.05, Bonferroni corrected).</figDesc><table><row><cell>CRF feature set</cell><cell>Prec</cell><cell>Rec</cell><cell>F1</cell></row><row><cell>Base</cell><cell cols="3">0.486 0.385 0.428</cell></row><row><cell>Base + stem</cell><cell cols="3">0.486 0.384 0.427</cell></row><row><cell>Base + stem + suffix</cell><cell cols="3">0.492 0.382 0.429</cell></row><row><cell>Base + sentiment</cell><cell cols="3">0.487 0.378 0.424</cell></row><row><cell>Base + in medical lexicon</cell><cell cols="3">0.468 0.379 0.417*</cell></row><row><cell>Base + in first sentence</cell><cell cols="3">0.495 0.396 0.438</cell></row><row><cell>Base + 7 similar words</cell><cell cols="3">0.497 0.406 0.445*</cell></row><row><cell cols="4">Base + 6 similar words + suffix 0.489 0.406 0.443*</cell></row><row><cell>Configuration</cell><cell>Prec</cell><cell>Rec</cell><cell>F1</cell></row><row><cell>LSTM</cell><cell>0.245</cell><cell>0.252</cell><cell>0.248</cell></row><row><cell>LSTM + GloVe</cell><cell cols="3">0.195* 0.215* 0.205*</cell></row><row><cell>LSTM + Char</cell><cell>0.260</cell><cell>0.286</cell><cell>0.272</cell></row><row><cell>BiLSTM</cell><cell>0.230</cell><cell>0.242</cell><cell>0.236*</cell></row><row><cell>BiLSTM + Char</cell><cell>0.256</cell><cell cols="2">0.273* 0.264</cell></row><row><cell cols="4">BiLSTM + Char + GloVe 0.197* 0.219* 0.207*</cell></row></table></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_5"><head>Table 5 :</head><label>5</label><figDesc>LSTM performance evaluated at the sequence level (microaveraged). Note: best results in bold font; * indicates a significant result (P=0.05, Bonferroni corrected).</figDesc><table /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_6"><head>Table 6 :</head><label>6</label><figDesc>Comparison of the best performing CRF and LSTM models evaluated at the phrase and token levels. Note: "NA" refers to negligent act; "O" refers to consequential negative outcome; AVG refers to the micro average; best results in bold font; * indicates a significant result (P=0.05, Bonferroni corrected).</figDesc><table><row><cell></cell><cell></cell><cell cols="3">Sequence level evaluation</cell><cell></cell></row><row><cell></cell><cell></cell><cell>Prec</cell><cell></cell><cell>Rec</cell><cell></cell><cell>F1</cell></row><row><cell></cell><cell>CRF</cell><cell cols="2">LSTM CRF</cell><cell cols="2">LSTM CRF</cell><cell>LSTM</cell></row><row><cell>NA</cell><cell cols="2">0.50* 0.32</cell><cell>0.43</cell><cell>0.41</cell><cell cols="2">0.46* 0.36</cell></row><row><cell>O</cell><cell cols="2">0.49* 0.22</cell><cell cols="2">0.39* 0.23</cell><cell cols="2">0.44* 0.23</cell></row><row><cell cols="3">AVG 0.49* 0.26</cell><cell cols="2">0.40* 0.29</cell><cell cols="2">0.44* 0.27</cell></row><row><cell></cell><cell></cell><cell cols="3">Token level evaluation</cell><cell></cell></row><row><cell></cell><cell></cell><cell>Prec</cell><cell></cell><cell>Rec</cell><cell></cell><cell>F1</cell></row><row><cell></cell><cell>CRF</cell><cell cols="2">LSTM CRF</cell><cell cols="2">LSTM CRF</cell><cell>LSTM</cell></row><row><cell cols="2">B-NA 0.68</cell><cell>0.87*</cell><cell>0.59</cell><cell>0.60</cell><cell>0.63</cell><cell>0.71*</cell></row><row><cell cols="2">I-NA 0.63</cell><cell>0.81*</cell><cell>0.67</cell><cell>0.77*</cell><cell>0.65</cell><cell>0.79*</cell></row><row><cell>B-O</cell><cell>0.60</cell><cell>0.75*</cell><cell cols="2">0.48* 0.24</cell><cell cols="2">0.54* 0.37</cell></row><row><cell>I-O</cell><cell>0.62</cell><cell>0.72*</cell><cell cols="2">0.55* 0.50</cell><cell>0.59</cell><cell>0.59</cell></row><row><cell cols="2">AVG 
0.63</cell><cell>0.78*</cell><cell>0.60</cell><cell>0.61</cell><cell>0.61</cell><cell>0.67*</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<title/>
		<author>
			<persName><surname>Bollegala</surname></persName>
		</author>
		<imprint>
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Causality patterns for detecting adverse drug reactions from social media: Text mining approach</title>
		<author>
			<persName><forename type="first">Richard</forename><surname>Maskell</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Joanna</forename><surname>Sloane</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Munir</forename><surname>Hajne</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Pirmohamed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jean</forename><surname>Olive</surname></persName>
		</author>
		<author>
			<persName><forename type="first">;</forename><surname>Dunn</surname></persName>
		</author>
		<author>
			<persName><surname>Guo</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Multiple comparisons among means. American Statistical Association</title>
				<imprint>
			<date type="published" when="1961">May 2018. 1961. 1961. 2009. 2009. 1997. November 1997</date>
			<biblScope unit="volume">4</biblScope>
			<biblScope unit="page" from="1735" to="1780" />
		</imprint>
	</monogr>
	<note>Neural Comput.</note>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Early results for named entity recognition with conditional random fields, feature induction and web-enhanced lexicons</title>
		<author>
			<persName><surname>Huang</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1508.01991</idno>
		<ptr target="https://github.com/TeamHG-Memex/sklearn-crfsuite" />
	</analytic>
	<monogr>
		<title level="m">Proc. of the 15th International Conference of the Pacific Association for Computational Linguistics (PACLING)</title>
				<meeting>of the 15th International Conference of the Pacific Association for Computational Linguistics (PACLING)<address><addrLine>Osaka, Japan; Stroudsburg, PA, USA; Berlin, Germany</addrLine></address></meeting>
		<imprint>
			<publisher>Association for Computational Linguistics</publisher>
			<date type="published" when="2001">2015. Aug 2015. 2017. 2016. December 2016. 2001. 2001. 2017. 2017. 2003. 2003. August 2016</date>
			<biblScope unit="volume">1</biblScope>
			<biblScope unit="page" from="1105" to="1116" />
		</imprint>
	</monogr>
	<note type="report_type">arXiv e-prints</note>
	<note>: Long Papers)</note>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">Inference for the generalization error</title>
		<author>
			<persName><forename type="first">Claude</forename><surname>Nadeau</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yoshua</forename><surname>Bengio</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Machine Learning</title>
				<imprint>
			<date type="published" when="2001">2001. 2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Piggyback: Using search engines for robust cross-domain named entity recognition</title>
		<author>
			<persName><surname>Pennington</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">NHS. Annual report and accounts 2017/18</title>
				<imprint>
			<date type="published" when="2011">2018. 2014. 2014. 2011. 2011. 2011. 2011</date>
			<biblScope unit="page" from="1524" to="1534" />
		</imprint>
		<respStmt>
			<orgName>NHS ; National Health Service (NHS) Resolution</orgName>
		</respStmt>
	</monogr>
	<note type="report_type">Technical report</note>
	<note>ACL&apos;11</note>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Convolutional lstm network: A machine learning approach for precipitation nowcasting</title>
		<author>
			<persName><surname>Shen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">International Conference on Learning Representations</title>
				<editor>
			<persName><forename type="first">C</forename><surname>Cortes</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">N</forename><forename type="middle">D</forename><surname>Lawrence</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">D</forename><forename type="middle">D</forename><surname>Lee</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">M</forename><surname>Sugiyama</surname></persName>
		</editor>
		<editor>
			<persName><forename type="first">R</forename><surname>Garnett</surname></persName>
		</editor>
		<imprint>
			<publisher>Curran Associates, Inc</publisher>
			<date type="published" when="2015">2018. 2018. 2015. 2015. 2015. 2015</date>
			<biblScope unit="volume">28</biblScope>
			<biblScope unit="page" from="910" to="920" />
		</imprint>
	</monogr>
	<note>Social media and pharmacovigilance: A review of the opportunities and challenges</note>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Detecting inpatient falls by using natural language processing of electronic medical records</title>
		<author>
			<persName><forename type="first">Shin-Ichi</forename><surname>Toyabe</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Jie</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Yue</forename><surname>Zhang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics</title>
				<meeting>the 56th Annual Meeting of the Association for Computational Linguistics</meeting>
		<imprint>
			<date type="published" when="2012-12">2012. Dec 2012. 2018</date>
			<biblScope unit="volume">12</biblScope>
		</imprint>
	</monogr>
	<note>Ncrf++: An open-source neural sequence labeling toolkit</note>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
