<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>HEALTH BANK - A Workbench for Data Science Applications in Healthcare</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Hercules Dalianis</string-name>
          <email>hercules@dsv.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Aron Henriksson</string-name>
          <email>aronhen@dsv.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria Kvist</string-name>
          <email>maria.kvist@karolinska.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Sumithra Velupillai</string-name>
          <email>sumithra@dsv.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rebecka Weegar</string-name>
          <email>rebeckaw@dsv.su.se</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Computer and Systems Sciences (DSV), Stockholm University</institution>
          ,
          <country country="SE">Sweden</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Learning, Informatics, Management and Ethics (LIME), Karolinska Institutet</institution>
          ,
          <addr-line>Stockholm</addr-line>
          ,
          <country country="SE">Sweden</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <abstract>
        <p>The enormous amounts of data that are generated in the healthcare process and stored in electronic health record (EHR) systems are an underutilized resource that, with the use of data science applications, can be exploited to improve healthcare. To foster the development and use of data science applications in healthcare, there is a fundamental need for access to EHR data, which is typically not readily available to researchers and developers. A relatively rare exception is the large EHR database, the Stockholm EPR Corpus, comprising data from more than two million patients, that has been made available to a limited group of researchers at Stockholm University. Here, we describe a number of data science applications that have been developed using this database, demonstrating the potential reuse of EHR data to support healthcare and public health activities, as well as facilitate medical research. However, in order to realize the full potential of this resource, it needs to be made available to a larger community of researchers, as well as to industry actors. To that end, we envision the provision of an infrastructure around this database called HEALTH BANK - the Swedish Health Record Research Bank. It will function both as a workbench for the development of data science applications and as a data exploration tool, allowing epidemiologists, pharmacologists and other medical researchers to generate and evaluate hypotheses. Aggregated data will be fed into a pipeline for open e-access, while non-aggregated data will be provided to researchers within an ethical permission framework. We believe that HEALTH BANK has the potential to promote a growing industry around the development of data science applications that will ultimately increase the efficiency and effectiveness of healthcare.</p>
      </abstract>
      <kwd-group>
        <kwd>electronic health record</kwd>
        <kwd>data science</kwd>
        <kwd>health intelligence</kwd>
        <kwd>infrastructure</kwd>
        <kwd>data mining</kwd>
        <kwd>text mining</kwd>
        <kwd>predictive modeling</kwd>
        <kwd>clinical text</kwd>
        <kwd>health bank</kwd>
        <kwd>health record research</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>Data produced in the healthcare setting is very valuable for further analysis and
development of improved healthcare processes, such as real-time monitoring,
decision support, and predictive analytics. Electronic health record (EHR) systems
are used in almost all healthcare institutions in the Nordic countries, providing
an invaluable opportunity for secondary data use and development of systems to
aid clinicians in their daily work, hospital management in its work on process
and healthcare delivery improvements, and researchers in their work.</p>
      <p>
        Resources for health and medical research are currently available through
biobanks and national registers such as cancer registers and cause of death
registers for researchers with appropriate ethical permission. However, in the Nordic
countries, there are no easily available health record resources that describe
health processes, diagnoses and treatments of a real clinical population [
        <xref ref-type="bibr" rid="ref33 ref40">33, 40</xref>
        ].
      </p>
      <p>Over the last twenty years, the digitization of society has driven intense
development of tools and techniques for automatically processing a variety of data
sources, enabling further analysis and tool development. For instance, the Internet
contains information in various formats, and a number of systems, such as search
engines and information extraction tools, have been developed to make this
information readily accessible. The move to digitized solutions has also taken place
in healthcare, but the corresponding tools have not been developed at the same pace.
One important reason is that health data has not been openly available to the research
community and industry for constructing such tools, primarily because health
record data contains sensitive information about individuals, an aspect that is
extremely important and that requires particular consideration.</p>
      <p>To address these issues, we propose to develop an infrastructure that enables
access to de-identified EHR data for further analysis and system development.
This infrastructure will include a workbench with various preprocessing tools,
and will consist of two pipelines: one providing access to structured, aggregated
and completely de-identified data, and one requiring ethical permission before
access to original data is provided.</p>
      <p>
        This infrastructure will be based on a large clinical database, the Stockholm
EPR (Electronic Patient Record) Corpus, which has been collected and refined
over eight years [
        <xref ref-type="bibr" rid="ref7 ref8">8, 7</xref>
        ]. The Stockholm EPR Corpus contains records of over two million
patients from all medical and surgical departments of Karolinska University
Hospital (excluding only psychiatry and venereology), covering both inpatient and
outpatient care and written by several different professional groups. The
records encompass the period 2006-2014. The corpus is de-identified with
regard to names of patients and personal identity numbers. Each personal identity
number has been replaced by a serial number so that a patient can still be
followed through the care process. The database contains both structured data
– such as age, gender, ICD-10 diagnosis codes, ATC drug codes, blood and
laboratory values, admission and discharge dates, and timestamps – and unstructured
data (free text), e.g. daily notes by clinicians and discharge summaries.
      </p>
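      <p>The replacement of personal identity numbers by serial numbers described above can be sketched as follows. This is a minimal illustration only, not the procedure actually used for the Stockholm EPR Corpus: the function names and the example identifiers are hypothetical, and a real system would also have to protect or destroy the mapping table.</p>
      <preformat preformat-type="code"><![CDATA[
```python
from itertools import count


def make_pseudonymizer():
    """Return a function that maps each identifier to a stable serial number."""
    serials = {}
    counter = count(1)

    def pseudonymize(personal_id):
        # Reuse the serial number if this identifier has been seen before,
        # so that all records of one patient stay linked.
        if personal_id not in serials:
            serials[personal_id] = next(counter)
        return serials[personal_id]

    return pseudonymize


# Hypothetical records: (personal identity number, note type). All values are made up.
records = [
    ("19500101-0001", "admission note"),
    ("19620202-0002", "daily note"),
    ("19500101-0001", "discharge summary"),
]

pseudonymize = make_pseudonymizer()
deidentified = [(pseudonymize(pid), note) for pid, note in records]
print(deidentified)
```
]]></preformat>
      <p>Because the same identifier always yields the same serial number, the first and third records in the example remain attributable to one patient without exposing the original number.</p>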
      <p>The infrastructure and workbench, called the Swedish Health Record
Research Bank (HEALTH BANK), will be unique in that it provides access to
authentic EHR data from the largest populated area in Sweden, from several
clinical departments and clinical professions. It will also provide language
technology tools for preprocessing and structuring the clinical narratives. Moreover,
it provides complementary data to available biobanks and registries, enabling
large-scale population studies for a variety of use-cases.
</p>
    </sec>
    <sec id="sec-3">
      <title>Electronic Health Record Resources for Research and System Development</title>
      <p>Internationally, some research groups have been able to obtain access to health
record data from one or two clinics, but almost never from a whole hospital or city
council. Moreover, access is usually restricted to the research group itself, which
limits the reproducibility and generalizability of research findings. Access to this
type of data is limited mostly for legal reasons, but also because such large
repositories are often complex and not easy to extract data from. In particular,
the parts of the EHRs that are written in free text, such as discharge summaries
and daily notes, are often the most difficult to obtain access to given their sensitive
nature, yet they constitute a large part of the healthcare documentation.</p>
      <p>Some large patient record databases or corpora (text collections) are available
for research, including:</p>
      <list list-type="bullet">
        <list-item>
          <p>the i2b2<sup>1</sup> corpus, which consists of several clinical subcorpora in English that have been
used in several shared challenges;</p>
        </list-item>
        <list-item>
          <p>the CMC<sup>2</sup> corpus, containing 2,216 patient records in English;</p>
        </list-item>
        <list-item>
          <p>the MIMIC II database<sup>3</sup>, which consists of 30,000 intensive care patient records
written in English [
            <xref ref-type="bibr" rid="ref44">44</xref>
            ];</p>
        </list-item>
        <list-item>
          <p>a Finnish clinical corpus<sup>4</sup>, containing 2,800 sentences from nursing notes; and</p>
        </list-item>
        <list-item>
          <p>the THIN database, containing 11 million English patient records from general
practices [
            <xref ref-type="bibr" rid="ref34">34</xref>
            ].</p>
        </list-item>
      </list>
      <p>
        Both academia and industry have developed methods within computer
science, statistics, computational linguistics and machine learning to process abundant
data and produce meaningful information. This evolving research area is also called
e-science or (big) data science [
        <xref ref-type="bibr" rid="ref29 ref37 ref6">37, 29, 6</xref>
        ].
      </p>
    </sec>
    <sec id="sec-4">
      <title>Data Science Applications for Healthcare</title>
      <p>
        It has been estimated that at least ten percent of all patients treated at hospitals
in Europe suffer from an adverse event (AE), including adverse drug events
(ADE), healthcare-associated infections (HAI), fall injuries and bedsores – in
total three million patients yearly [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ]. Such AEs prolong the treatment of the
patient, cause suffering for the patient, and are costly for society. In Sweden, with
its ten million inhabitants, it is estimated that AEs are responsible for 750,000
extra healthcare days at the hospital, costing an additional 700 million euros
yearly, without taking into account the suffering of the patients [
        <xref ref-type="bibr" rid="ref45">45</xref>
        ]. Therefore,
detecting AEs is a critical issue in healthcare.
        <fn id="fn1"><label>1</label><p>https://www.i2b2.org/NLP/HeartDisease/PreviousChallenges.php</p></fn>
        <fn id="fn2"><label>2</label><p>http://computationalmedicine.org/catalog</p></fn>
        <fn id="fn3"><label>3</label><p>http://www.physionet.org/physiotools/deid</p></fn>
        <fn id="fn4"><label>4</label><p>http://bionlp.utu.fi/clinicalcorpus.html</p></fn>
      </p>
      <sec id="sec-4-1">
        <p>The Stockholm EPR Corpus at Stockholm University has been used for
several research projects that are of practical importance for healthcare. These
projects have included work on HAI detection, detection of ADEs in a
post-marketing setting, text simplification of EHRs for laypeople, automatic
ICD-10 diagnosis code assignment, mining of cancer records and pathology reports
for future improvement of cancer screening, and comorbidity studies.</p>
        <p>
          For the successful development of such applications, basic text processing
tools are needed. Clinical notes in EHRs are difficult to process for several
reasons: they contain a large number of misspellings, non-standard words and
abbreviations, incomplete sentences, and medical jargon. Therefore, we have
developed a set of basic tools to process clinical text written in Swedish. These include
factuality level classification [
          <xref ref-type="bibr" rid="ref58 ref61">58, 61</xref>
          ], negation detection [
          <xref ref-type="bibr" rid="ref46">46</xref>
          ], spelling error
detection [
          <xref ref-type="bibr" rid="ref10">10</xref>
          ], abbreviation normalization [
          <xref ref-type="bibr" rid="ref28 ref32 ref57">28, 32, 57</xref>
          ], named entity recognition
[
          <xref ref-type="bibr" rid="ref17 ref48">48, 17</xref>
          ], as well as tools for expanding medical vocabularies [
          <xref ref-type="bibr" rid="ref16 ref23 ref24 ref47">16, 24, 47, 23</xref>
          ].
        </p>
        <p>
          We have also initiated studies on characterizing the domain-specific language
in this type of text [
          <xref ref-type="bibr" rid="ref49">49</xref>
          ], and performed studies on how well general language
tools and techniques work on clinical notes, such as syntactic parsers [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and
distributional semantic models [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] – studies that are important for the future
development of tools adapted for this domain.
        </p>
        <p>The development of these tools has also involved the creation of seven
reference standards, manually annotated for de-identification (of protected health
information), factuality levels of diagnostic expressions, clinical named entities,
indications and ADE relations, cervical cancer symptoms, classifications of HAI
(healthcare-associated infections) and clinical abbreviations. Many of the
above-mentioned tools are trained on the annotated corpora. We would like to share
these valuable resources with other researchers.</p>
        <sec id="sec-4-1-1">
          <title>Automatic surveillance of healthcare-associated infections</title>
          <p>A healthcare-associated infection (HAI) is an infection acquired by a patient
during healthcare treatment. The number of HAIs in each hospital must be reported
annually, which is currently done in two ways: through compulsory reporting of HAI
cases and through so-called Point Prevalence Measurements (PPMs), which are carried
out twice a year at all hospitals in Sweden. PPMs are conducted manually by assessing
all the patients admitted on one particular day and deciding whether those patients
have suffered from a HAI or not. The estimates obtained through PPMs are not very
reliable due to the limited sample size: only 1-2% of all patients admitted during a
year are analyzed. More frequent measurements would give healthcare institutions
a better instrument for surveillance, as well as facilitate the evaluation of actions
taken to reduce the number of HAIs.</p>
          <p>
            We have developed several prototype tools for detecting HAIs in EHRs. One
machine learning-based tool, Detect-HAI, analyzes the clinical notes in a
patient’s health records and automatically determines whether the patient has
potentially suffered a HAI. The selected patients can thereafter be assessed by a
clinician. The tool is trained on health records that have been manually annotated,
or classified, by a physician. The system has access to the clinical text, body
temperature, drug lists and microbiology reports; it obtains 87% recall and 83%
precision using the random forest algorithm [
            <xref ref-type="bibr" rid="ref11">11</xref>
            ]. In another approach, rule- or
knowledge-based systems are developed for specific HAI diagnoses, initially
focusing on urinary tract infections [
            <xref ref-type="bibr" rid="ref56">56</xref>
            ] and bacteremia [
            <xref ref-type="bibr" rid="ref31">31</xref>
            ].
          </p>
          <p>Figure 1 depicts a tentative system for HAI surveillance. The
system follows the patient between caregivers, utilizing the fact that the Swedish
healthcare system is connected throughout the country, which means that the
measurements can be carried out centrally by pulling information out of several
EHR systems and pushing back risk assessments.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>Detection and exploration of adverse drug events</title>
          <p>
            Adverse drug events constitute the most common form of iatrogenic injury,
causing approximately 3.7% of hospital admissions worldwide [
            <xref ref-type="bibr" rid="ref26">26</xref>
            ], and are one of the
most common causes of death: in Sweden, they have been identified as the
seventh most common cause of death [
            <xref ref-type="bibr" rid="ref64">64</xref>
            ]. The safety of drugs is thus a major public
health issue, necessitating their continuous monitoring, including
post-marketing, due to the unavoidable limitations of clinical trials in terms of duration and
sample size (number of patients). This activity, known as drug safety surveillance
or pharmacovigilance, primarily relies on collecting information voluntarily
reported by clinicians or users of the target drugs. Such individual case reports,
however, come with severe limitations, such as underreporting and low reliability
[
            <xref ref-type="bibr" rid="ref12">12</xref>
            ]. In recent years, alternative sources for pharmacovigilance have emerged,
including EHRs, which have the distinct advantage of containing longitudinal
observations of the treatment of patients, including their drug use. To address
the underreporting of ADEs and thereby support pharmacovigilance, predictive
modeling can be leveraged to create systems that can detect ADEs on the basis
of patient-specific EHR data [
            <xref ref-type="bibr" rid="ref30 ref66 ref67">30, 67, 66</xref>
            ].
          </p>
          <p>
            EHR data can also be used for data exploration and testing hypotheses with
respect to, for instance, ADEs [
            <xref ref-type="bibr" rid="ref22">22</xref>
            ]. aDEX is an example of an exploratory data
analysis tool for investigating ADEs, currently using health records over a
two-year period (2009-2010). With the tool, one can create case and control groups
to compare, e.g., patients who have experienced a specific ADE with patients who
have not. Using disproportionality analysis methods, which calculate how much
an event deviates from what is expected, one can identify drugs that seem to
have the largest risk of causing the ADE. Figure 2 displays a screenshot of aDEX.
          </p>
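          <p>As an illustration of a disproportionality measure, the sketch below computes a proportional reporting ratio (PRR) from a two-by-two table of patient counts. The choice of PRR is an assumption made for this example; the text does not state which measures aDEX implements, and the counts are invented.</p>
          <preformat preformat-type="code"><![CDATA[
```python
def proportional_reporting_ratio(a, b, c, d):
    """Compute the PRR from a 2x2 contingency table.

    a: patients given the drug who experienced the event
    b: patients given the drug who did not experience the event
    c: patients not given the drug who experienced the event
    d: patients not given the drug who did not experience the event
    """
    exposed_rate = a / (a + b)
    unexposed_rate = c / (c + d)
    return exposed_rate / unexposed_rate


# Made-up counts, for illustration only.
prr = proportional_reporting_ratio(a=20, b=80, c=50, d=950)
print(round(prr, 2))
```
]]></preformat>
          <p>A ratio well above one indicates that the event is observed more often among patients given the drug than would be expected from the remaining patients, which is the kind of deviation from expectation described above.</p>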
        </sec>
        <sec id="sec-4-1-3">
          <title>Diagnosis code assignment</title>
          <p>
            Assigning diagnosis codes that correspond to a given disease or health condition
is necessary in order to estimate the prevalence and incidence of diseases and
health conditions, as well as monitor differences therein over space and time.
For such statistics to be, to some degree, comparable, a standard known as the
International Statistical Classification of Diseases and Related Health Problems
(ICD) [
            <xref ref-type="bibr" rid="ref65">65</xref>
            ], created by the World Health Organization, is in use. The process
of assigning diagnosis codes is generally carried out by either expert coders
or physicians. In both cases, diagnosis code assignment is expensive and
time-consuming, yet essential. According to one estimate, the cost of diagnosis coding
and associated errors is approximately $25 billion per annum in the US [
            <xref ref-type="bibr" rid="ref41">41</xref>
            ]. The
Swedish National Board of Health and Welfare also estimates that 20 percent of
the assigned ICD-10 diagnosis codes are erroneous [
            <xref ref-type="bibr" rid="ref50">50</xref>
            ].
          </p>
          <p>
            It is not surprising, then, that efforts have long been made to provide
computer-aided diagnostic coding [
            <xref ref-type="bibr" rid="ref41 ref52">52, 41</xref>
            ]. Using the Stockholm EPR Corpus, we have
explored the repurposing of distributional semantics – i.e., models of word meaning
that exploit word co-occurrence patterns in large corpora to obtain estimates of
semantic similarity between words [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ] – for the task of recommending diagnosis
codes to assign to a care episode [
            <xref ref-type="bibr" rid="ref18 ref19 ref20 ref21">21, 18–20</xref>
            ]. This approach leverages historical
encoding of diagnoses and the words used in the clinical notes of the
corresponding care episodes to create a predictive model that recommends possible diagnosis
codes to assign to a new care episode on the basis of the data – primarily in the
form of free text – that is available for that care episode.
          </p>
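          <p>The idea of recommending codes from historical word and code co-occurrences can be illustrated with a deliberately simplified sketch. It uses plain co-occurrence counts rather than the distributional semantic models of the cited work, and the function names, English toy notes and code assignments are all invented for the example.</p>
          <preformat preformat-type="code"><![CDATA[
```python
from collections import Counter, defaultdict


def train(episodes):
    """Count how often each word co-occurs with each assigned diagnosis code."""
    word_code_counts = defaultdict(Counter)
    for note, codes in episodes:
        for word in set(note.lower().split()):
            for code in codes:
                word_code_counts[word][code] += 1
    return word_code_counts


def recommend(word_code_counts, note, top_n=3):
    """Rank codes by their summed co-occurrence with the words of a new note."""
    scores = Counter()
    for word in set(note.lower().split()):
        scores.update(word_code_counts.get(word, {}))
    return [code for code, _ in scores.most_common(top_n)]


# Toy, made-up care episodes: (note text, assigned ICD-10 codes).
history = [
    ("fever cough chest pain", ["J18"]),
    ("cough fever shortness of breath", ["J18"]),
    ("burning urination fever", ["N39"]),
]

model = train(history)
print(recommend(model, "cough and fever", top_n=2))
```
]]></preformat>
          <p>In this toy setting the recommendation is a ranked list of candidate codes for a human coder to confirm, which matches the decision-support role described above.</p>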
        </sec>
        <sec id="sec-4-1-4">
          <title>Text mining in the cancer domain</title>
          <p>
            Cervical cancer is a disease that is treatable with a high success rate in its
early stages, but that has few early symptoms. In later stages, it is a serious illness,
causing around 180 deaths in Sweden yearly [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
          </p>
          <p>
            An infection with a human papillomavirus (HPV) is necessary for the
development of cervical cancer [
            <xref ref-type="bibr" rid="ref62">62</xref>
            ], and as vaccines against HPV types 6, 11, 16 and
18 provide a high degree of protection against infection, vaccination programs
are believed to reduce the number of cases of cervical cancer [
            <xref ref-type="bibr" rid="ref38">38</xref>
            ]. Since screening with pap
smears, in which women are examined for pre-cancerous changes, was
implemented, the number of cervical cancer cases has nearly halved in Sweden [
            <xref ref-type="bibr" rid="ref3">3</xref>
            ].
However, not all women take part in screening and other methods of finding
early symptoms would therefore be valuable.
          </p>
          <p>
            Health records contain a patient’s medical history, and the free-text part of the
records can reveal what previous diseases and symptoms a patient has
experienced. By applying text mining methods to the records of cervical cancer patients,
early, possibly unknown symptoms can be found. These symptoms could be
of great value for detection of the disease. We have investigated symptoms
described in the health records of patients with a cervical cancer diagnosis from the
Stockholm EPR Corpus, by performing named entity recognition and negation
detection [
            <xref ref-type="bibr" rid="ref63">63</xref>
            ].
          </p>
          <p>
            Another area in the cancer domain where text mining can be of value is
the transferral of free text information in pathology reports into structured
databases. Pathology reports describe tissue samples and can contain both
macroscopic and microscopic observations and a possible diagnosis for a patient with
known or suspected cancer [
            <xref ref-type="bibr" rid="ref39">39</xref>
            ]. Several studies have been performed on text
mining of pathology reports, where the aim has been to transfer the free text
data into structured format [
            <xref ref-type="bibr" rid="ref51">51</xref>
            ]. Manual transferral of pathology reports can be
expensive and time-consuming; for example, at Kreftregistret in Oslo (Cancer
Registry of Norway), 20-25 human coders work on manually
transferring the pathology reports produced in Norway to a database. Text mining
techniques can be used to automate the transferral, completely or partly, to
the database.
          </p>
        </sec>
        <sec id="sec-4-1-5">
          <title>Temporal modeling of clinical events</title>
          <p>Temporal information is crucial for developing accurate models of, e.g.,
disease progression and treatment effects. For instance, knowing whether a particular
symptom occurred before or after a patient was treated with a specific medication
alters the conclusions that can be drawn about how well the medication worked for a
particular problem. Time information can be extracted from EHR data through
document timestamps and other structured information, but is often also
documented in free text. To extract time information from narratives,
three steps are usually required:</p>
          <list list-type="order">
            <list-item>
              <p>extracting temporal expressions denoting specific points in time (today, two
years ago, a while back, at 6 AM);</p>
            </list-item>
            <list-item>
              <p>extracting the clinically relevant events (infection, antibiotics, surgery);</p>
            </list-item>
            <list-item>
              <p>ordering these in time (infection before surgery).</p>
            </list-item>
          </list>
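          <p>The three steps can be illustrated with a toy sketch. It is not the system under development for Swedish clinical text: the regular expression, the event lexicon and the example note are assumptions made for illustration, the note is in English, and the time offsets used in the last step are supplied by hand rather than resolved automatically.</p>
          <preformat preformat-type="code"><![CDATA[
```python
import re

# Toy pattern and lexicon, assumed for illustration (English, not Swedish).
TEMPORAL_PATTERN = re.compile(
    r"\b(today|yesterday|\d+ (?:days?|weeks?|years?) ago|at \d{1,2} (?:AM|PM))\b"
)
EVENT_LEXICON = {"infection", "antibiotics", "surgery"}


def extract_temporal_expressions(text):
    """Step 1: find temporal expressions with a simple regular expression."""
    return [match.group(0) for match in TEMPORAL_PATTERN.finditer(text)]


def extract_events(text):
    """Step 2: find clinically relevant events by lexicon lookup."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return [token for token in tokens if token in EVENT_LEXICON]


def order_events(dated_events):
    """Step 3: order (event, days_before_today) pairs from earliest to latest."""
    return [event for event, _ in sorted(dated_events, key=lambda pair: -pair[1])]


note = "Surgery 14 days ago. Infection noted yesterday, antibiotics started today."
print(extract_temporal_expressions(note))
print(extract_events(note))
# The day offsets below are written by hand; linking each event to its
# temporal expression automatically is the difficult part of the task.
print(order_events([("antibiotics", 0), ("surgery", 14), ("infection", 1)]))
```
]]></preformat>
          <p>Real clinical text also contains vague expressions such as "a while back", which a fixed pattern like this one does not cover.</p>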
          <p>
            This is a challenging natural language processing task that has been the subject
of several research studies on English clinical text [
            <xref ref-type="bibr" rid="ref35 ref4 ref42 ref53 ref54">53, 4, 35, 54, 42</xref>
            ]. Work on
creating systems for temporal information extraction for Swedish clinical text is
ongoing [
            <xref ref-type="bibr" rid="ref59">59</xref>
            ]. After successful temporal modeling of information in clinical notes,
patient trajectories and visualized timelines can be created, to be further used in
applications such as summarization tools [
            <xref ref-type="bibr" rid="ref25">25</xref>
            ] or for enriched predictive analysis.
          </p>
        </sec>
        <sec id="sec-4-1-6">
          <title>Text simplification of clinical narratives</title>
          <p>
            Another area of increasing importance is patient engagement and involvement. In
the future, patients themselves will most likely take a more active role in their
own healthcare process. This is already the case in some areas, through, for
instance, systems for self-monitoring of measurement values and self-treatment
guided by remote healthcare contact. There are also political incentives and
legislation in Sweden that describe how healthcare is to be transparent and
understandable for patients. Healthcare documentation is, however,
very specialized and complicated, which is necessary for
communication among healthcare professionals in order to ensure precision and
detail. This means that the documentation is difficult to understand
for a layperson, e.g., a patient wanting to read her own medical records. One
way of bridging this gap would be to provide patients with a simplified version
of the medical records, where technical jargon and domain-specific vocabulary
is translated, or converted, to language that does not require medical expert
knowledge. We have performed several studies in this area, in particular in the
radiology domain. In collaboration with the Center for Easy-to-Read (Centrum
för Lättläst, in Sweden), we have analyzed which aspects of clinical
documentation are central to target for the creation of simplified "translations". We have
also studied and identified linguistic features that are characteristic of this type
of documentation [
            <xref ref-type="bibr" rid="ref49">49</xref>
            ]. Moreover, we have developed a pilot tool for handling
medical abbreviations [
            <xref ref-type="bibr" rid="ref28 ref32 ref36">28, 32, 36</xref>
            ], initiated work on lexical simplification [
            <xref ref-type="bibr" rid="ref13">13</xref>
            ],
and conducted interview studies with patients to identify which aspects of
clinical documentation are difficult to understand from their perspective [
            <xref ref-type="bibr" rid="ref1">1</xref>
            ].
          </p>
        </sec>
        <sec id="sec-4-1-7">
          <title>Comorbidity analysis</title>
          <p>Comorbidity is the presence of one or more additional disorders (or diseases)
co-occurring with a primary disease or disorder. In the current prototype,
Comorbidity View<sup>5</sup>, researchers can inspect what comorbidities a group of
patients has, based on assigned ICD-10 diagnosis codes from the Stockholm EPR Corpus.
The case group can be selected based on, for instance, gender and age.</p>
          <p>
            Figure 3 shows a screenshot of the current Comorbidity View demonstrator,
displaying a group of patients from a subset of the database (2006-2008) who
have at least two ICD-10 diagnosis codes. An early version of the demonstrator
is described in Tanushi et al. [
            <xref ref-type="bibr" rid="ref55">55</xref>
            ].
          </p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>HEALTH BANK – An Envisioned Infrastructure for EHR Data Access</title>
      <p>We have hitherto been successful in organizing and utilizing our EHR database
for research, as described above; however, the database is currently far from
being utilized to its full potential. To fulfill our vision of facilitating the
development of useful data science applications in the healthcare domain, our goal is to
provide access to this data, in a refined form, to both researchers and suppliers of
healthcare-related IT tools. To provide the data on a large scale in a sustainable
manner, there is a need for an infrastructure, the details of which are described
below. The intention is that this infrastructure – the Swedish Health Record
Research Bank (HEALTH BANK) – will provide a workbench for data science
application development in the healthcare domain. We believe that HEALTH
BANK will attract researchers and IT entrepreneurs from around the world to
promote the growth of the industry around data-intensive IT solutions in
healthcare. Making this valuable resource readily available will moreover give Sweden
a competitive advantage, while hopefully leading to more countries following
suit in taking similar initiatives in the endeavor of improving healthcare.
        <fn id="fn5"><label>5</label><p>http://www2.dsv.su.se/comorbidityview-demo/</p></fn>
      </p>
      <sec id="sec-6-1">
        <sec id="sec-6-1-1">
          <title>Technical solutions</title>
          <p>The HEALTH BANK infrastructure requires a technical solution that
conveniently provides access to the EHR data to the various intended users, while
doing so in a secure fashion, which is critical given the inherently sensitive
nature of the data. The infrastructure will be designed as a pipeline, allowing
users to select the data they want and to obtain it via e-access in a form
that fits their needs (see Figure 4). An important prerequisite is thus that
the entire database is appropriately preprocessed and indexed to ensure that the
required information can be readily extracted. There will essentially be two ways
of accessing EHR data from HEALTH BANK:
1. Through standard web-based access, where users without ethical permission
can analyze the data from different views and/or download aggregated data
at levels encompassing at least one hundred patients. This will allow us to
provide users with secure access to de-identified data. For these purposes,
we plan to make the previously described aDEX and Comorbidity View<sup>6</sup>
available.
2. Through an encrypted e-connection, where users with ethical permission can
download non-aggregated data, including sensitive text<sup>7</sup>.</p>
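          <p>The aggregation rule in the first access mode can be illustrated with a minimal
sketch. It is not the planned implementation: the function name, the record fields
(patient_id, diagnosis) and the toy data are assumptions made for illustration, and
only the one-hundred-patient threshold comes from the text. Groups that cover fewer
distinct patients than the threshold are withheld from the output.</p>
          <preformat>
```python
MIN_GROUP_SIZE = 100  # threshold from the text: at least one hundred patients


def aggregate_counts(records, key, min_group_size=MIN_GROUP_SIZE):
    """Count distinct patients per value of `key`, withholding small groups.

    `records` is an iterable of dicts with a hypothetical "patient_id" field.
    """
    patients_per_group = {}
    for record in records:
        # A set ensures each patient is counted once per group.
        patients_per_group.setdefault(record[key], set()).add(record["patient_id"])
    return {
        group: len(patients)
        for group, patients in patients_per_group.items()
        if len(patients) >= min_group_size
    }


# Hypothetical toy data: 120 patients with one diagnosis code, 3 with another.
records = [{"patient_id": i, "diagnosis": "I10"} for i in range(120)]
records += [{"patient_id": 1000 + i, "diagnosis": "C53"} for i in range(3)]

print(aggregate_counts(records, "diagnosis"))  # only the large group is released
```
          </preformat>
          <p>Counting distinct patients rather than rows keeps a patient with many visits
from inflating a group past the threshold.</p>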
          <p>As there is a great demand to link different health registers, such as biobanks
and cancer registers, with healthcare data<sup>8</sup>, we envision linking these data sources
to create added value. We also plan to add primary care data to the already
acquired hospital care data. This will allow researchers to follow patients
throughout the, sometimes elaborate, healthcare process.</p>
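          <p>As a rough illustration of what such linking could look like, the sketch below
joins health records with register entries on a shared pseudonymous key. The field
names (pseudonym, ward, register), the function name and the sample rows are
hypothetical; the text does not specify how the linkage will be carried out.</p>
          <preformat>
```python
def link_records(health_records, register_entries, key="pseudonym"):
    """Attach to each health record the register entries sharing its key.

    Both inputs are lists of dicts; `key` names a hypothetical shared
    pseudonymous identifier.
    """
    entries_by_key = {}
    for entry in register_entries:
        entries_by_key.setdefault(entry[key], []).append(entry)

    linked = []
    for record in health_records:
        # Records without a match keep an empty list instead of being dropped.
        matches = entries_by_key.get(record[key], [])
        linked.append({**record, "register_entries": matches})
    return linked


# Hypothetical sample rows.
health_records = [
    {"pseudonym": "p1", "ward": "oncology"},
    {"pseudonym": "p2", "ward": "cardiology"},
]
cancer_register = [{"pseudonym": "p1", "register": "cancer"}]

for row in link_records(health_records, cancer_register):
    print(row["pseudonym"], len(row["register_entries"]))
```
          </preformat>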
        </sec>
        <sec id="sec-6-1-2">
          <title>Ethical considerations</title>
          <p>
            We are aware of the profound ethical challenges involved in having access to a
large repository that contains information that, if it ends up in the wrong hands,
can cause suffering for the individual patients. For this reason, it is vital that we
continue, as we have been, to communicate with the ethical review board
(Regionala etikprövningsnämnden i Stockholm) regarding our research initiatives.
Approval from the ethical review board needs to be obtained before carrying out
new research or giving access to the Swedish Health Record Research Bank.
          </p>
          <p>
            6 We plan to extend the Comorbidity view demonstrator to encompass the entire
database and to include more functionality, e.g., by adding diagnosis expressions
mined from clinical notes (similar to Roques et al. [
            <xref ref-type="bibr" rid="ref43">43</xref>
            ]).
          </p>
          <p>
            7 Although the data has already been de-identified in the sense that social security
numbers and names in structured fields have been removed/replaced, the clinical text
may contain names of, e.g., relatives to the patient or phone numbers. Regarding
the sensitive nature of clinical text in the Stockholm EPR Corpus, several studies
have been carried out [
            <xref ref-type="bibr" rid="ref60 ref9">60, 9</xref>
            ].
          </p>
          <p>8 http://www.nordforsk.org/en/news/report-on-nordic-registers-and-biobankslaunched</p>
          <p>
            We have hitherto obtained seven ethical permissions from the regional
ethical board for five different research projects that have been carried out both
internally, together with other Swedish universities, and externally, in a research
network (HEXAnord) and a center of excellence (NIASC), both in a Nordic
context. One of the ethical permissions has an amendment, allowing us to share one
hundred de-identified and pseudonymised health records, in the framework of a
shared task, with other researchers affiliated with an academic institution. These
one hundred records are described in Alfalahi et al. [
            <xref ref-type="bibr" rid="ref2">2</xref>
            ]. We are moreover in
continuous contact with the chief medical officer of Karolinska University Hospital on
these matters.
          </p>
          <p>When providing access to sensitive data to a larger group of people, as
HEALTH BANK is intended to do, it is important to have guidelines that
describe how to conduct research with EHR data. These guidelines should describe
various technical details, known problems and solutions, and, perhaps most
importantly, how to write applications to the ethical review board: what the
contents of such applications should be and a description of the required steps for
applying for ethical permission.</p>
          <p>HEALTH BANK will moreover comply with applicable legal requirements
and generally accepted standards. For information security and protection of
patient data, these include:</p>
          <p>Patientdatalag (the Swedish Patient Data Act, 2008:355), in applicable parts</p>
          <p>Personuppgiftslag (the Swedish Personal Data Act, 1998:204)</p>
          <p>For information security standards, these include:</p>
          <p>ISO/IEC 27001, requirements for information security management systems</p>
          <p>ISO/IEC 27002, information security standard</p>
          <p>The security of the infrastructure’s technical components will be designed in
accordance with internal and external security requirements with respect to the
risks involved. The security of the infrastructure will moreover be audited on a
regular basis. In addition to addressing information security concerns, there will
be a reference group that will discuss any issues that may arise in relation to
how data is made available through HEALTH BANK. This reference group will
consist of medical experts, researchers, system developers, suppliers of health
management systems and patient organizations.</p>
        </sec>
        <sec id="sec-6-1-3">
          <title>Potential users</title>
          <p>We believe that interest in HEALTH BANK would be substantial and the
number of potential users large. In the eight years (2007-2015) that we have had
access to EHR data, albeit in a significantly more limited setting than is intended
for HEALTH BANK, we have collaborated with numerous academic
institutions, hospitals and health organizations, pharmaceutical companies, healthcare
management system developers and patient organizations:</p>
          <p>Academic institutions: Stockholm University, Karolinska Institutet,
Karolinska University Hospital, Uppsala University, Gothenburg University,
University of Borås, University of Turku, University of Copenhagen,
DTU-Danmarks Tekniske Universitet, NTNU-Trondheim, Vytautas Magnus
University, Lithuania, UC San Diego and University of Utah, USA.</p>
          <p>We have also collaborated with several hospitals and organizations: National
Board of Health (Socialstyrelsen), The Swedish Association of Local
Authorities and Regions (Sveriges Kommuner och Landsting), Stockholm County
Council (Stockholms Läns Landsting), Östergötland County Council
(Landstinget i Östergötland), Uppsala Monitoring Center (UMC).</p>
          <p>We have moreover collaborated with several companies: Astra Zeneca
(pharmaceutical company), Capish Knowledge (database and software company),
Pygargus (clinical trials company), TakeCare Compugroup Medical (electronic
patient records system company).</p>
          <p>Finally, we have collaborated with patient organizations such as the Swedish
Heart and Lung Association (Hjärt- och Lungsjukas Riksförbund) and the Swedish
Rheumatism Association (Svenska Reumatikerförbundet), as well as with Swedish
Patient Insurance (Landstingens Ömsesidiga Försäkringsbolag).</p>
          <p>All of the above organizations are possible users of HEALTH BANK. In
addition to these, the following potential users have been identified:
SciLifeLab (Science for Life Laboratory, a national center), partners of SciLifeLab,
BBMRI.se (The Biobanking and Molecular Resource Infrastructure of Sweden),
the Swedish node of the bioinformatics infrastructure ELIXIR, and
partners of NIASC (The Nordic Center of Excellence in Health-Related e-Sciences),
which aims to connect health records to biobanks and registries. Moreover, the
Vinnova-funded project IntergrIT needs to develop tools to perform research
on EHR data. We also believe that HEALTH BANK has the potential to
encourage entrepreneurs to start companies that focus on developing data science
applications in the healthcare domain.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Conclusions</title>
      <p>We have here provided an overview of research conducted using a database of
electronic health records – the Stockholm EPR Corpus – demonstrating the
potential of exploiting and reusing such data to create data science applications
that are intended to support and, ultimately, improve healthcare. The ability
to develop such applications, which are often data-intensive, hinges to a great
extent on having access to data, which is currently challenging to obtain. To
realize the full potential of data science applications in the healthcare domain,
health record data needs to be made available to both researchers and industry
actors, such as system developers. To that end, we have outlined a vision to
create an infrastructure, HEALTH BANK, around the Stockholm EPR Corpus,
effectively providing access to EHR data in aggregated as well as non-aggregated
form. However, making sensitive data available to the large number of potential
users requires paying careful attention to various ethical issues and complying
with information security standards and regulations. HEALTH BANK will make
data available in a ready and secure fashion, supporting users with practical,
legal and ethical guidelines for performing high-quality research. We believe that
HEALTH BANK, by providing a workbench for system development, will
promote a growing industry around the creation of data science applications in
healthcare.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgements</title>
      <p>We thank Karolinska University Hospital for their confidence in giving us
access to the Stockholm EPR Corpus to carry out important research. We would
also like to thank Vinnova - Swedish Agency for Innovation Systems for initial
funding, SSF - Swedish Foundation for Strategic Research, through the project
High-Performance Data Mining for Drug Effect Detection under grant
IIS110053, the Swedish Research Council (project 350-2012-6658), Vårdalstiftelsens
Idéprovning, as well as NIASC-Nordic Center of Excellence in Health-Related
e-Sciences for partial funding of the research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Aanta</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wide</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salanterä</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Patients interpreting the medical language of discharge summaries</article-title>
          . Manuscript in preparation (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Alfalahi</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brissman</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Pseudonymisation of personal names and other PHIs in an annotated clinical Swedish corpus</article-title>
          .
          <source>In: Third Workshop on Building and Evaluating Resources for Biomedical Text Mining (BioTxtM 2012) held in conjunction with LREC 2012</source>
          , May 26, Istanbul. pp.
          <fpage>49</fpage>
          -
          <lpage>54</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Axelsson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Borgfeldt</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Cervixcancer</article-title>
          (
          <year>2013</year>
          ), http://www.internetmedicin.se/page.aspx?id=2735
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derczynski</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verhagen</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>SemEval-2015 Task 6: Clinical TempEval</article-title>
          .
          <source>In: Proceedings of the 9th International Workshop on Semantic Evaluation (SemEval 2015)</source>
          . Association for Computational Linguistics (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Widdows</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          :
          <article-title>Empirical distributional semantics: methods and biomedical applications</article-title>
          .
          <source>Journal of biomedical informatics 42(2)</source>
          ,
          <fpage>390</fpage>
          -
          <lpage>405</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Clinical text retrieval-an overview of basic building blocks and applications</article-title>
          . In: Professional Search in the Modern World, pp.
          <fpage>147</fpage>
          -
          <lpage>165</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Stockholm EPR Corpus: A clinical database used to improve health care</article-title>
          .
          <source>In: Swedish Language Technology Conference</source>
          . pp.
          <fpage>17</fpage>
          -
          <lpage>18</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>The Stockholm EPR Corpus-Characteristics and Some Initial Findings</article-title>
          .
          <source>In: Proceedings of ISHIMR 2009, Evaluation and implementation of e-health and health information initiatives: international perspectives. 14th International Symposium for Health Information Management Research</source>
          . pp.
          <fpage>243</fpage>
          -
          <lpage>249</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>De-identifying Swedish clinical text-refinement of a gold standard and experiments with Conditional random fields</article-title>
          .
          <source>J. Biomedical Semantics</source>
          <volume>1</volume>
          ,
          <issue>6</issue>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Dziadek</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Improving SNOMED mapping of clinical texts using context-sensitive spelling correction</article-title>
          .
          <source>Master's thesis</source>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Ehrentraut</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparrelid</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Detecting healthcare-associated infections in electronic health records: Evaluation of machine learning and preprocessing techniques</article-title>
          .
          <source>In: Sixth International Symposium on Semantic Mining in Biomedicine (SMBM 2014)</source>
          . University of Aveiro (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Goldman</surname>
            ,
            <given-names>S.A.</given-names>
          </string-name>
          :
          <article-title>Limitations and strengths of spontaneous reports data</article-title>
          .
          <source>Clinical Therapeutics</source>
          <volume>20</volume>
          ,
          <fpage>C40</fpage>
          -
          <lpage>C44</lpage>
          (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Grigonyte</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wirén</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Improving Readability of Swedish Electronic Health Records through Lexical Simplification: First Results</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations - PITR</source>
          . pp.
          <fpage>74</fpage>
          -
          <lpage>83</lpage>
          . Association for Computational Linguistics, Gothenburg, Sweden (April
          <year>2014</year>
          ), http://www.aclweb.org/anthology/W14-1209
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Something Old, Something New - Applying a Pre-trained Parsing Model to Clinical Swedish</article-title>
          .
          <source>In: Proc. 18th Nordic Conf. on Comp. Ling. - NODALIDA '11</source>
          (May 11-13,
          <year>2011</year>
          ), http://dspace.utlib.ee/dspace/handle/10062/17355
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Semantic Spaces of Clinical Text: Leveraging Distributional Semantics for Natural Language Processing of Electronic Health Records</article-title>
          .
          <source>Licentiate Thesis</source>
          , Stockholm University (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conway</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duneld</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          :
          <article-title>Identifying synonymy between SNOMED clinical terms of varying length using distributional analysis of electronic health records</article-title>
          .
          <source>In: AMIA Annual Symposium Proceedings</source>
          . pp.
          <fpage>600</fpage>
          -
          <lpage>609</lpage>
          . American Medical Informatics Association (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Generating features for named entity recognition by learning prototypes in semantic space: The case of de-identifying health records</article-title>
          .
          <source>In: International Conference on Bioinformatics and Biomedicine (BIBM)</source>
          . pp.
          <fpage>450</fpage>
          -
          <lpage>457</lpage>
          . IEEE (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Election of diagnosis codes: Words as responsible citizens</article-title>
          .
          <source>In: Proceedings of Louhi Workshop on Health Document Text Mining and Information Analysis</source>
          . pp.
          <fpage>67</fpage>
          -
          <lpage>74</lpage>
          . CEUR-WS (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Exploiting Structured Data, Negation Detection and SNOMED CT Terms in a Random Indexing Approach to Clinical Coding</article-title>
          .
          <source>In: Proceedings of RANLP Workshop on Biomedical Natural Language Processing</source>
          . pp.
          <fpage>3</fpage>
          -
          <lpage>10</lpage>
          . Association for Computational Linguistics (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Optimizing the dimensionality of clinical term spaces for improved diagnosis coding support</article-title>
          .
          <source>In: Proceedings of Louhi Workshop on Health Document Text Mining and Information Analysis</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Diagnosis code assignment support using random indexing of patient records - a qualitative feasibility study</article-title>
          .
          <source>In: Proceedings of Artificial Intelligence in Medicine</source>
          , pp.
          <fpage>348</fpage>
          -
          <lpage>352</lpage>
          . Springer (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Exploration of adverse drug reactions in semantic vector space models of clinical text</article-title>
          .
          <source>In: Proceedings of ICML Workshop on Machine Learning for Clinical Data Analysis</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Daudaravicius</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duneld</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Synonym extraction and abbreviation expansion with ensembles of semantic spaces</article-title>
          .
          <source>J. Biomedical Semantics</source>
          <volume>5</volume>
          (
          <issue>6</issue>
          ) (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duneld</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Conway</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Corpus-driven terminology development: populating Swedish SNOMED CT with synonyms extracted from electronic health records</article-title>
          .
          <source>In: Proceedings of BioNLP</source>
          . pp.
          <fpage>36</fpage>
          -
          <lpage>44</lpage>
          . Association for Computational Linguistics (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          25.
          <string-name>
            <surname>Hirsch</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanenbaum</surname>
            ,
            <given-names>J.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gorman</surname>
            ,
            <given-names>S.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmitz</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hashorva</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ervits</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vawdrey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sturm</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Elhadad</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>HARVEST, a longitudinal patient record summarizer</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>22</volume>
          (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          26.
          <string-name>
            <surname>Howard</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Avery</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Slavenburg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Royal</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pipe</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lucassen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pirmohamed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Which drugs cause preventable admissions to hospital? A systematic review</article-title>
          .
          <source>British Journal of Clinical Pharmacology</source>
          <volume>63</volume>
          (
          <issue>2</issue>
          ),
          <fpage>136</fpage>
          -
          <lpage>147</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          27.
          <string-name>
            <surname>Humphreys</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Smyth</surname>
            ,
            <given-names>E.T.M.</given-names>
          </string-name>
          :
          <article-title>Prevalence surveys of healthcare-associated infections: what do they tell us, if anything?</article-title>
          <source>Clinical Microbiology and Infection</source>
          <volume>12</volume>
          (
          <issue>1</issue>
          ),
          <fpage>2</fpage>
          -
          <lpage>4</lpage>
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          28.
          <string-name>
            <surname>Isenius</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Initial Results in the Development of SCAN: a Swedish Clinical Abbreviation Normalizer</article-title>
          .
          <source>In: Proceedings of the CLEF 2012 Workshop on Cross-Language Evaluation of Methods, Applications, and Resources for eHealth Document Analysis - CLEFeHealth2012</source>
          . CLEF, Rome, Italy (
          <year>September 2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          29.
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>P.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>L.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brunak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Mining electronic health records: towards better research applications and clinical care</article-title>
          .
          <source>Nature Reviews Genetics</source>
          <volume>13</volume>
          (
          <issue>6</issue>
          ),
          <fpage>395</fpage>
          -
          <lpage>405</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          30.
          <string-name>
            <surname>Karlsson</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boström</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Predicting adverse drug events by analyzing electronic patient records</article-title>
          .
          <source>In: Artificial Intelligence in Medicine. Lecture Notes in Computer Science</source>
          , pp.
          <fpage>125</fpage>
          -
          <lpage>129</lpage>
          . Springer (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          31.
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tanushi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparrelid</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Automated detection of Healthcare-Associated Infections in Swedish Electronic Health Records</article-title>
          . Manuscript in preparation (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          32.
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>SCAN: A Swedish Clinical Abbreviation Normalizer</article-title>
          .
          <source>In: Information Access Evaluation. Multilinguality, Multimodality, and Interaction</source>
          , pp.
          <fpage>62</fpage>
          -
          <lpage>73</lpage>
          . Springer (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          33.
          <string-name>
            <surname>Langseth</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luostarinen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bray</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Ensuring quality in studies linking cancer registries and biobanks</article-title>
          .
          <source>Acta Oncologica</source>
          <volume>49</volume>
          (
          <issue>3</issue>
          ),
          <fpage>368</fpage>
          -
          <lpage>377</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          34.
          <string-name>
            <surname>Lewis</surname>
            ,
            <given-names>J.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schinnar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bilker</surname>
            ,
            <given-names>W.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Strom</surname>
            ,
            <given-names>B.L.</given-names>
          </string-name>
          :
          <article-title>Validation studies of The Health Improvement Network (THIN) database for pharmacoepidemiology research</article-title>
          .
          <source>Pharmacoepidemiology and Drug Safety</source>
          <volume>16</volume>
          (
          <issue>4</issue>
          ),
          <fpage>393</fpage>
          -
          <lpage>401</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          35.
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>Y.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          :
          <article-title>MedTime: A temporal information extraction system for clinical narratives</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>46</volume>
          ,
          <fpage>20</fpage>
          -
          <lpage>28</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          36.
          <string-name>
            <surname>Lövestam</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Abbreviations in Swedish Clinical Text - use by three professions</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <volume>205</volume>
          ,
          <fpage>720</fpage>
          -
          <lpage>724</lpage>
          (
          <year>August 2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          37.
          <string-name>
            <surname>Meystre</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savova</surname>
            ,
            <given-names>G.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kipper-Schuler</surname>
            ,
            <given-names>K.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hurdle</surname>
            ,
            <given-names>J.F.</given-names>
          </string-name>
          :
          <article-title>Extracting information from textual documents in the electronic health record: a review of recent research</article-title>
          .
          <source>Yearb Med Inform</source>
          <volume>35</volume>
          ,
          <fpage>128</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          38.
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kjaer</surname>
            ,
            <given-names>S.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sigurdsson</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Iversen</surname>
            ,
            <given-names>O.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hernandez-Avila</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wheeler</surname>
            ,
            <given-names>C.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Perez</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Koutsky</surname>
            ,
            <given-names>L.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tay</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garcia</surname>
            ,
            <given-names>P.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ault</surname>
            ,
            <given-names>K.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Garland</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Leodolter</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Olsson</surname>
            ,
            <given-names>S.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tang</surname>
            ,
            <given-names>G.W.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ferris</surname>
            ,
            <given-names>D.G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Paavonen</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Steben</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>F.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dillner</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Huh</surname>
            ,
            <given-names>W.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Joura</surname>
            ,
            <given-names>E.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kurman</surname>
            ,
            <given-names>R.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Majewski</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Myers</surname>
            ,
            <given-names>E.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villa</surname>
            ,
            <given-names>L.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Taddeo</surname>
            ,
            <given-names>F.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Tadesse</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bryan</surname>
            ,
            <given-names>J.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lupinacci</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Giacoletti</surname>
            ,
            <given-names>K.E.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sings</surname>
            ,
            <given-names>H.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>James</surname>
            ,
            <given-names>M.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hesley</surname>
            ,
            <given-names>T.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Barr</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Haupt</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          :
          <article-title>Impact of Human Papillomavirus (HPV)-6/11/16/18 Vaccine on All HPV-Associated Genital Diseases in Young Women</article-title>
          .
          <source>Journal of the National Cancer Institute</source>
          <volume>102</volume>
          ,
          <fpage>325</fpage>
          -
          <lpage>339</lpage>
          (
          <year>2010</year>
          ), http://dx.doi.org/10.1093/jnci/djp534
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          39. National Cancer Institute:
          <article-title>Pathology reports</article-title>
          (
          <year>2010</year>
          ), http://www.cancer.gov/cancertopics/diagnosis-staging/diagnosis/pathologyreports-fact-sheet
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          40. Nordforsk:
          <article-title>Joint Nordic Registers and Biobanks - A goldmine for health and welfare research</article-title>
          .
          <source>Nordforsk policy paper 5</source>
          (
          <year>2014</year>
          ), http://www.nordforsk.org/en/news/report-on-nordic-registers-and-biobankslaunched
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          41.
          <string-name>
            <surname>Pestian</surname>
            ,
            <given-names>J.P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brew</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matykiewicz</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hovermale</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Johnson</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Cohen</surname>
            ,
            <given-names>K.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duch</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          :
          <article-title>A shared task involving multi-label classification of clinical free text</article-title>
          .
          <source>In: Proceedings of the Workshop on BioNLP 2007: Biological, Translational, and Clinical Language Processing</source>
          . pp.
          <fpage>97</fpage>
          -
          <lpage>104</lpage>
          . Association for Computational Linguistics (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          42.
          <string-name>
            <surname>Reeves</surname>
            ,
            <given-names>R.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ong</surname>
            ,
            <given-names>F.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Matheny</surname>
            ,
            <given-names>M.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Denny</surname>
            ,
            <given-names>J.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Aronsky</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gobbel</surname>
            ,
            <given-names>G.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Montella</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Speroff</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brown</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          :
          <article-title>Detecting temporal expressions in medical narratives</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          <volume>82</volume>
          ,
          <fpage>118</fpage>
          -
          <lpage>127</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref43">
        <mixed-citation>
          43.
          <string-name>
            <surname>Roque</surname>
            ,
            <given-names>F.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jensen</surname>
            ,
            <given-names>P.B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schmock</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalgaard</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Andreatta</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hansen</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Søeby</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bredkjaer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Juul</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Werge</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          , et al.:
          <article-title>Using electronic patient records to discover disease correlations and stratify patient cohorts</article-title>
          .
          <source>PLoS Computational Biology</source>
          <volume>7</volume>
          (
          <issue>8</issue>
          ),
          <elocation-id>e1002141</elocation-id>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref44">
        <mixed-citation>
          44.
          <string-name>
            <surname>Saeed</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Villarroel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Reisner</surname>
            ,
            <given-names>A.T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Clifford</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lehman</surname>
            ,
            <given-names>L.W.</given-names>
          </string-name>
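          ,
          <string-name>
            <surname>Moody</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Heldt</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kyaw</surname>
            ,
            <given-names>T.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moody</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mark</surname>
            ,
            <given-names>R.G.</given-names>
          </string-name>
          :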
          <article-title>Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II): A public-access intensive care unit database</article-title>
          .
          <source>Critical Care Medicine</source>
          <volume>39</volume>
          (
          <issue>5</issue>
          ),
          <fpage>952</fpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref45">
        <mixed-citation>
          45. SALAR (Swedish Association of Local Authorities and Regions):
          <article-title>Vårdrelaterade infektioner framgångsfaktorer som förebygger</article-title>
          . Stockholm, Sweden. ISBN:
          <isbn>978-91-7585-109-9</isbn>
          , http://webbutik.skl.se/bilder/artiklar/pdf/978-91-7585-109-9.pdf, accessed April 10
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref46">
        <mixed-citation>
          46.
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Negation detection in Swedish clinical text: An adaption of NegEx to Swedish</article-title>
          .
          <source>Journal of Biomedical Semantics</source>
          <volume>2</volume>
          (
          <issue>Suppl 3</issue>
          ),
          <elocation-id>S3</elocation-id>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref47">
        <mixed-citation>
          47.
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ahltorp</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Vocabulary expansion by semantic extraction of medical terms</article-title>
          .
          <source>In: The 5th International Symposium on Languages in Biology and Medicine (LBM)</source>
          . pp.
          <fpage>63</fpage>
          -
          <lpage>68</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref48">
        <mixed-citation>
          48.
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilsson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Automatic recognition of disorders, findings, pharmaceuticals and body structures from clinical text: An annotation and machine learning study</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>49</volume>
          ,
          <fpage>148</fpage>
          -
          <lpage>158</lpage>
        </mixed-citation>
      </ref>
      <ref id="ref49">
        <mixed-citation>
          49.
          <string-name>
            <surname>Smith</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Megyesi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Professional language in Swedish clinical text: Linguistic characterization and comparative studies</article-title>
          .
          <source>Nordic Journal of Linguistics</source>
          <volume>37</volume>
          (
          <issue>2</issue>
          )
          ,
          <fpage>297</fpage>
          -
          <lpage>327</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref50">
        <mixed-citation>
          50.
          Socialstyrelsen (The National Board of Health and Welfare):
          <article-title>Diagnosgranskningar utförda i Sverige 1997-2005 samt råd inför granskning</article-title>
          (in Swedish). http://www.socialstyrelsen.se/Lists/Artikelkatalog/Attachments/9740/2006- 131-30_200613131.pdf
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref51">
        <mixed-citation>
          51.
          <string-name>
            <surname>Spasić</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Livsey</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Keane</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nenadić</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Text mining of cancer-related information: Review of current status and future directions</article-title>
          .
          <source>International Journal of Medical Informatics</source>
          <volume>83</volume>
          (
          <issue>9</issue>
          ),
          <fpage>605</fpage>
          -
          <lpage>623</lpage>
          (
          <year>2014</year>
          ), http://dx.doi.org/10.1016/j.ijmedinf.2014.06.009
        </mixed-citation>
      </ref>
      <ref id="ref52">
        <mixed-citation>
          52.
          <string-name>
            <surname>Stanfill</surname>
            ,
            <given-names>M.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Williams</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenton</surname>
            ,
            <given-names>S.H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jenders</surname>
            ,
            <given-names>R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hersh</surname>
            ,
            <given-names>W.R.</given-names>
          </string-name>
          :
          <article-title>A systematic literature review of automated clinical coding and classification systems</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>17</volume>
          (
          <issue>6</issue>
          ),
          <fpage>646</fpage>
          -
          <lpage>651</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref53">
        <mixed-citation>
          53.
          <string-name>
            <surname>Styler</surname>
            ,
            <given-names>W.I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bethard</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Finan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Palmer</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pradhan</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>de Groen</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Erickson</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Miller</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lin</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Savova</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pustejovsky</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Temporal Annotation in the Clinical Domain</article-title>
          .
          <source>Transactions of the Association for Computational Linguistics</source>
          <volume>2</volume>
          ,
          <fpage>143</fpage>
          -
          <lpage>154</lpage>
          (
          <year>2014</year>
          ), https://tacl2013.cs.columbia.edu/ojs/index.php/tacl/article/view/305
        </mixed-citation>
      </ref>
      <ref id="ref54">
        <mixed-citation>
          54.
          <string-name>
            <surname>Sun</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rumshisky</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Uzuner</surname>
            ,
            <given-names>Ö.</given-names>
          </string-name>
          :
          <article-title>Evaluating temporal relations in clinical text: 2012 i2b2 Challenge</article-title>
          .
          <source>Journal of the American Medical Informatics Association</source>
          <volume>20</volume>
          (
          <issue>5</issue>
          ),
          <fpage>806</fpage>
          -
          <lpage>813</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref55">
        <mixed-citation>
          55.
          <string-name>
            <surname>Tanushi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilsson</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Calculating prevalence of comorbidity and comorbidity combinations with diabetes in hospital care in Sweden using a health care record database</article-title>
          , volume
          <volume>744</volume>
          , ISSN: 1613-0073
          ,
          <fpage>59</fpage>
          -
          <lpage>66</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref56">
        <mixed-citation>
          56.
          <string-name>
            <surname>Tanushi</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sparrelid</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          :
          <article-title>Detection of healthcare-associated urinary tract infection in Swedish electronic health records</article-title>
          .
          <source>Studies in Health Technology and Informatics</source>
          <volume>207</volume>
          ,
          <fpage>330</fpage>
          -
          <lpage>339</lpage>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref57">
        <mixed-citation>
          57.
          <string-name>
            <surname>Tengstrand</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Megyesi</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Duneld</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>EACL - Expansion of Abbreviations in Clinical text</article-title>
          .
          <source>In: Proceedings of the 3rd Workshop on Predicting and Improving Text Readability for Target Reader Populations (PITR)</source>
          . pp.
          <fpage>94</fpage>
          -
          <lpage>103</lpage>
          . Association for Computational Linguistics (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref58">
        <mixed-citation>
          58.
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Shades of Certainty: Annotation and Classification of Swedish Medical Records</article-title>
          .
          <source>Ph.D. thesis</source>
          , Stockholm University (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref59">
        <mixed-citation>
          59.
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Temporal Expressions in Swedish Medical Text - A Pilot Study</article-title>
          .
          <source>In: Proceedings of BioNLP 2014</source>
          . pp.
          <fpage>88</fpage>
          -
          <lpage>92</lpage>
          . Association for Computational Linguistics, Baltimore, Maryland (June
          <year>2014</year>
          ), http://www.aclweb.org/anthology/W14-3413
        </mixed-citation>
      </ref>
      <ref id="ref60">
        <mixed-citation>
          60.
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hassel</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nilsson</surname>
            ,
            <given-names>G.H.</given-names>
          </string-name>
          :
          <article-title>Developing a standard for de-identifying electronic patient records written in Swedish: precision, recall and F-measure in a manual and computerized annotation trial</article-title>
          .
          <source>International journal of medical informatics</source>
          <volume>78</volume>
          (
          <issue>12</issue>
          ),
          <fpage>e19</fpage>
          -
          <lpage>e26</lpage>
          (
          <year>2009</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref61">
        <mixed-citation>
          61.
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Skeppstedt</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mowery</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>B.E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chapman</surname>
            ,
            <given-names>W.W.</given-names>
          </string-name>
          :
          <article-title>Cue-based assertion classification for Swedish clinical text - developing a lexicon for pyConTextSwe</article-title>
          .
          <source>Artificial Intelligence in Medicine</source>
          <volume>61</volume>
          (
          <issue>3</issue>
          )
          ,
          <fpage>137</fpage>
          -
          <lpage>144</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref62">
        <mixed-citation>
          62.
          <string-name>
            <surname>Walboomers</surname>
            ,
            <given-names>J.M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jacobs</surname>
            ,
            <given-names>M.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manos</surname>
            ,
            <given-names>M.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosch</surname>
            ,
            <given-names>F.X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kummer</surname>
            ,
            <given-names>J.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>K.V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Snijders</surname>
            ,
            <given-names>P.J.F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Peto</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Meijer</surname>
            ,
            <given-names>C.J.L.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Muñoz</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          :
          <article-title>Human papillomavirus is a necessary cause of invasive cervical cancer worldwide</article-title>
          .
          <source>The Journal of Pathology</source>
          <volume>189</volume>
          (
          <issue>1</issue>
          ),
          <fpage>12</fpage>
          -
          <lpage>19</lpage>
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref63">
        <mixed-citation>
          63.
          <string-name>
            <surname>Weegar</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kvist</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sundström</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Brunak</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dalianis</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Finding Cervical Cancer Symptoms in Swedish Clinical Text using a Machine Learning Approach and NegEx</article-title>
          (2015, submitted)
        </mixed-citation>
      </ref>
      <ref id="ref64">
        <mixed-citation>
          64.
          <string-name>
            <surname>Wester</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jönsson</surname>
            ,
            <given-names>A.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Spigset</surname>
            ,
            <given-names>O.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Druid</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hägg</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Incidence of fatal adverse drug reactions: a population based study</article-title>
          .
          <source>British Journal of Clinical Pharmacology</source>
          <volume>65</volume>
          (
          <issue>4</issue>
          ),
          <fpage>573</fpage>
          -
          <lpage>579</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref65">
        <mixed-citation>
          65. WHO:
          <article-title>International Classification of Diseases (ICD)</article-title>
          , http://www.who.int/classifications/icd/en/, accessed 2014-04-09
        </mixed-citation>
      </ref>
      <ref id="ref66">
        <mixed-citation>
          66.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Asker</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boström</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Detecting adverse drug events with multiple representations of clinical measurements</article-title>
          .
          <source>In: IEEE International Conference on Bioinformatics and Biomedicine</source>
          . pp.
          <fpage>536</fpage>
          -
          <lpage>543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref67">
        <mixed-citation>
          67.
          <string-name>
            <surname>Zhao</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Henriksson</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Boström</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Detecting adverse drug events using concept hierarchies of clinical codes</article-title>
          .
          <source>In: IEEE International Conference on Healthcare Informatics (ICHI)</source>
          . pp.
          <fpage>285</fpage>
          -
          <lpage>293</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>