<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>TF-IDF vs Word Embeddings for Morbidity Identification in Clinical Notes: An Initial Study</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>nilo D</string-name>
          <email>danilo@unica.it</email>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Rim H</string-name>
          <email>rim.helaoui@philips.com</email>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Philips Research</institution>
          ,
          <addr-line>Eindhoven</addr-line>
          ,
          <country country="NL">Netherlands</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of Cagliari</institution>
          ,
          <addr-line>Cagliari</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2040</year>
      </pub-date>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Today, we are seeing an ever-increasing number of clinical notes that contain clinical results, images, and textual descriptions of patients' health state. All these data can be analyzed and employed to provide novel services that can help people and domain experts with their common healthcare tasks. However, technologies such as Deep Learning and tools like Word Embeddings have started to be investigated only recently, and many challenges remain open when it comes to healthcare domain applications. To address these challenges, we propose the use of Deep Learning and Word Embeddings for identifying sixteen morbidity types within textual descriptions of clinical records. For this purpose, we have used a Deep Learning model based on Bidirectional Long Short-Term Memory (LSTM) layers, which can exploit state-of-the-art vector representations of data such as Word Embeddings. We have employed pre-trained Word Embeddings, namely GloVe and Word2Vec, and our own Word Embeddings trained on the target domain. Furthermore, we have compared the performance of the deep learning approaches against the traditional TF-IDF representation using a Support Vector Machine and a Multilayer Perceptron (our baselines). The obtained results suggest that the baselines outperform the Deep Learning approaches with any of the word embeddings. Our preliminary results indicate that there are specific features that make the dataset biased in favour of traditional machine learning approaches.</p>
      </abstract>
      <kwd-group>
        <kwd>Deep Learning</kwd>
        <kwd>Natural Language Processing</kwd>
        <kwd>Morbidity Detection</kwd>
        <kwd>Word Embeddings</kwd>
        <kwd>Classification</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>
        In recent years, life expectancy has increased, and with it the risk of
long-term diseases such as cancer, diabetes, mental health conditions, and
other chronic health threats [
        <xref ref-type="bibr" rid="ref10 ref21 ref22 ref3">22, 10, 3, 21</xref>
        ]. A further drawback of longer life expectancy is that people can be
affected by more than one disease at a time, which increases the risk of
substandard health quality. For instance, a person suffering from long-term
diabetes has a higher chance of developing hypertension, high cholesterol
levels, or blockage of the arteries or veins. The World Health Organization
report [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] states that, in developed countries, over 40% of the population, across
all ages, is exposed to at least one long-term health condition, and 25% of
the population suffers from multi-morbidity. Furthermore, the report
emphasizes that the rate of multi-morbidity is especially high in middle- and
low-income countries, as they lack the funds that should be invested in
enhancing primary care for the population [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. All this medical information is being monitored and, with the advent
of Information Technology (IT) services, a large amount of clinical data is
continuously stored in clinical reports. These reports might be employed to
provide novel healthcare services worldwide, overcoming issues related to
people's social or economic condition. Clinical reports may contain
information in the form of numbers (e.g., laboratory results), images (e.g.,
X-rays), or medical descriptions (e.g., surgical descriptions) that may be used
to create content-based services. However, the sheer amount of data is
difficult for humans to analyze and employ to provide healthcare services, and
the development of computer-based systems to deal with it has recently gained
the attention of the scientific community. For example, the textual data of
clinical reports have been explored in tasks such as classification [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], clustering [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ], and recommendation [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Although state-of-the-art research in this direction has already
provided important outcomes, many challenges remain open [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ]. Methods based on Deep Learning models and advanced Word Embedding
representations have recently proven to be state-of-the-art for many tasks, but
their use in many healthcare problems has not been investigated yet. Therefore,
in this paper we investigate the use of a Deep Learning model with various
types of Word Embeddings to address the problem of detecting morbidities
related to obesity in textual clinical notes. We performed our study on the
n2c2<sup>3</sup> dataset released for the i2b2<sup>4</sup> obesity and
co-morbidity detection challenge in 2008, and compared our results against two
baseline approaches where Term Frequency-Inverse Document Frequency (TF-IDF)
was used to represent clinical notes. More precisely, the contributions of our
paper are:
– We provide a Deep Learning model using Word Embeddings for performing
multi-morbidity detection within clinical notes.
– We analyze three types of Word Embeddings (for the Deep Learning
approaches) and the TF-IDF representation (for the baselines) for modelling
the knowledge of clinical notes.
– We compare the proposed deep learning approaches against two machine
learning methods by using k-fold cross-validation.
– We find that the machine learning baselines achieve a very high F-measure
that beats that of the deep learning approaches, which indicates the
occurrence of representative tokens highly connected with the presence of each
morbidity class.
– We make all the source code used for our experiments available in a GitHub
repository<sup>5</sup>.
3 https://n2c2.dbmi.hms.harvard.edu/
4 https://www.i2b2.org/NLP/Obesity/
      </p>
      <p>The remainder of this manuscript is organized as follows. Section 2 discusses
the role of Deep Learning and Natural Language Processing (NLP) techniques
within the healthcare domain. Section 3 describes the dataset we have used, its
contents, and how it was created. Section 4 introduces the methodology we have
applied and the feature engineering approaches we have designed for the Deep
Learning task. Section 5 presents and discusses the results. Finally, Section 6
draws some conclusions from our preliminary investigation and describes the
open problems and future research challenges we intend to address.</p>
    </sec>
    <sec id="sec-2">
      <title>Related Work</title>
      <p>
        These days, Artificial Intelligence (AI) and its subfields, such as Deep
Learning, Text Mining and, more generally, Machine Learning, are playing a
significant role in clinical decision making and understanding, automatic
disease diagnosis, and therapy assistance [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. Deep Learning applications in healthcare are contributing relevant
improvements in many fields, such as analyzing blood samples, detecting heart
problems, detecting tumours, and so on [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]. Moreover, the high-quality performance of Deep Learning models on
healthcare issues has raised positive discussion and interest within the AI
community. However, the use of Deep Learning technologies for detecting
multi-morbidity in clinical notes has not been deeply analyzed yet. For
example, a relevant recent work [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] uses Non-negative Matrix Factorisation (NMF) for simultaneously
mining disease clusters in order to detect the relations that exist between
morbidity patterns. The authors demonstrated how the temporal characteristics
of the disease clusters can help in mining multi-morbidity networks and in
generating new hypotheses for the emergence of various morbidity patterns over
time. However, no Deep Learning methods were investigated to try to uncover
morbidity patterns. Other works rely only on NLP techniques [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ]. For example, authors in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]
present a method called FREGEX, which is based on regular expressions, to
extract features from written clinical notes. The use of Deep Learning models
for discovering multi-morbidity linked to obesity has recently been
investigated by [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. The Deep Learning model used in that work has two layers,
a Convolutional Neural Network (CNN) layer and a Max Pooling layer, that were
fed with both word and entity embeddings, and allowed the authors to improve
on the results obtained during the i2b2 obesity challenge in 2008, especially
for the intuitive classification task. However, they investigated only a CNN
as the main layer of their model and, therefore, the use of Long Short-Term
Memory (LSTM) still needs to be explored. In addition, there is strong
evidence in the literature that Bidirectional layers can infer relevant
characteristics from clinical notes [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Hence, in this work we investigate the use of Bidirectional LSTM layers
for the task of multi-morbidity detection. Moreover, we try to understand
whether state-of-the-art Word Embeddings available in the literature can
better represent the knowledge of clinical notes for the same task.
5 https://github.com/vsrana-ai/SmartPhil
      </p>
    </sec>
    <sec id="sec-3">
      <title>Dataset Description</title>
      <p>The dataset used for this work is the n2c2 obesity dataset, which contains
the training and test documents of patients' clinical records. The dataset was
completely anonymized by replacing personal and sensitive patient information
with surrogates. The morbidity classes in the dataset are Asthma, CAD, CHF,
Depression, Diabetes, Gallstones, GERD, Gout, Hypercholesterolemia,
Hypertension, Hypertriglyceridemia, OA, Obesity, OSA, PVD, and Venous Insufficiency.
Each clinical note within the dataset is associated with two types of labels:
intuitive and textual. Textual labels indicate whether there is clear evidence
in the text that a clinical note presents a specific morbidity. Intuitive
labels, on the other hand, mean that a domain expert (e.g., a physician), by
reading the clinical note, was able to infer that the clinical state of the
patient suggests the presence of the target morbidity. A label can assume a
value in {Y, N, U, Q}, where “Y” means yes, the patient has the morbidity; “N”
means no, the patient does not have the morbidity; “U” means the morbidity is
not mentioned in the record; and “Q” means it is questionable whether the
patient has the morbidity. When domain experts disagreed on the morbidity for
a specific clinical note, the label can occasionally assume the value “Q” or
“U”. However, we only consider clinical notes that clearly indicated the
presence or absence of a morbidity, i.e., those labelled “Y” or “N”.
More specifically, given the set M of all clinical notes and an input
morbidity class c, we first selected all clinical notes that had “Y” or “N” as
textual label for the morbidity class c, thus building the set N. Then, we
added to N all clinical notes in M − N that had “Y” or “N” as intuitive label,
yielding the set N′.</p>
      <p>The set N′ was used to create a unique dataset for each morbidity class c.
In doing so, it is straightforward to apply binary classifiers to detect each
morbidity in clinical notes. As a final step of the dataset preparation, we
merged all the notes coming from the training and test sets of the original
data into a single dataset. This was necessary in order to perform our
experiments using k-fold cross-validation. The size of each dataset, and the
number of positive and negative labels within each, is reported in Table 1.</p>
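      <p>The selection procedure described above can be sketched as follows. This is a minimal illustration rather than the code used for the experiments: the note structure (a dictionary with the keys "id", "textual" and "intuitive") and the function name select_notes are assumptions made for the example.</p>
<preformat>
```python
def select_notes(notes, morbidity):
    """Build the labelled set for one morbidity class.

    `notes` is a list of dicts with hypothetical keys "id", "textual" and
    "intuitive"; the last two map a morbidity name to a label in
    {"Y", "N", "U", "Q"}.
    """
    usable = {"Y", "N"}
    selected = {}
    # Step 1: notes with a clear textual label form the set N.
    for note in notes:
        label = note["textual"].get(morbidity)
        if label in usable:
            selected[note["id"]] = 1 if label == "Y" else 0
    # Step 2: remaining notes are added when their intuitive label is clear.
    for note in notes:
        if note["id"] in selected:
            continue
        label = note["intuitive"].get(morbidity)
        if label in usable:
            selected[note["id"]] = 1 if label == "Y" else 0
    return selected


# Toy records, invented for illustration only.
notes = [
    {"id": "a", "textual": {"Asthma": "Y"}, "intuitive": {"Asthma": "N"}},
    {"id": "b", "textual": {"Asthma": "U"}, "intuitive": {"Asthma": "N"}},
    {"id": "c", "textual": {"Asthma": "Q"}, "intuitive": {"Asthma": "U"}},
]
asthma = select_notes(notes, "Asthma")
```
</preformat>
      <p>In this toy input, note "a" keeps its textual label, note "b" falls back to its intuitive label, and note "c" is discarded because neither label is "Y" or "N".</p>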
    </sec>
    <sec id="sec-4">
      <title>Methodology</title>
      <p>In this section we briefly describe the pre-processing steps performed
on the input texts. Then, we describe the Deep Learning model and the data
representations we employed to uncover the knowledge in our data. Finally, we
report details about the experimental setup we designed.</p>
      <p>[Table 1: size of each dataset and number of positive and negative labels per morbidity class (Asthma, CAD, CHF, Depression, Diabetes, Gallstones, GERD, Gout, Hypercholesterolemia, Hypertension, Hypertriglyceridemia, OA, Obesity, OSA, PVD, Venous Insufficiency)]</p>
      <p>The problem we dealt with was a multi-class, multi-label classification
problem. We addressed it by using one binary classifier for each target class
of our dataset. In doing so, we were able to investigate which classes were
more challenging when automatically inferring knowledge about morbidity from
textual resources. More precisely, given a target morbidity class c and a
clinical note text t, our purpose was to infer a function γ(t, c) → l, where
l is a binary label that can only assume values in {0, 1}. The label l = 0
means that the target morbidity c is not associated with the clinical note,
whereas l = 1 means that the morbidity c emerges from t.</p>
      <sec id="sec-4-1">
        <title>Pre-processing</title>
        <p>
          In order to prepare our textual data, we performed the following steps:
1. We transformed our texts to lower case, so that the same word written in
different cases (e.g., Obesity and obesity) could be represented by the same
string (i.e., obesity).
2. We tokenized our texts and built a function f that associates each word w
with an integer index i.
3. Using a bag-of-words vocabulary, we encoded our input texts into integer
numerical representations. For example, consider the sentence s “the patient
has the diabetes” and a function f that maps “the” to “5”, “patient” to
“34”, “has” to “10”, and “diabetes” to “87”. Then, the integer-encoded
sentence s_encoded is [5, 34, 10, 5, 87].
4. Because a clinical note might be too long, and therefore difficult to
represent and likely to cause problems in the training step as there are not
many notes, we limited the number of tokens for each text. The limit was
computed as the sum of the average and the standard deviation of the number of
tokens in the input texts. For example, imagine having four texts with 25, 39,
44, and 80 tokens respectively. Then, the average length is
avg = (25 + 39 + 44 + 80)/4 = 47.00 and the standard deviation is
std = 20.29; hence, the length that our method considers is 47 + 20 = 67.
An alternative that we are already working on is to break a long clinical
note into several notes with the same annotations.
        </p>
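        <p>The encoding and length-limit steps can be sketched as follows. This is a minimal sketch under stated assumptions: whitespace tokenization, indexes assigned in order of first appearance, and the population standard deviation (which reproduces the value 20.29 of the worked example) are choices made here for illustration, and the function names are hypothetical.</p>
<preformat>
```python
import statistics


def build_index(texts):
    """Map every lower-cased token to an integer index, starting at 1."""
    index = {}
    for text in texts:
        for token in text.lower().split():
            if token not in index:
                index[token] = len(index) + 1
    return index


def encode(text, index):
    """Replace each known token with its integer index, preserving order."""
    return [index[token] for token in text.lower().split() if token in index]


def max_length(token_counts):
    """Token limit: mean plus population standard deviation, rounded down."""
    return int(statistics.mean(token_counts) + statistics.pstdev(token_counts))


# The mapping f from the example in the text (index values are illustrative).
f = {"the": 5, "patient": 34, "has": 10, "diabetes": 87}
encoded = encode("The patient has the diabetes", f)

# The four text lengths from the worked example.
limit = max_length([25, 39, 44, 80])
truncated = encoded[:limit]
```
</preformat>
        <p>With the mapping and lengths taken from the example above, encoded is [5, 34, 10, 5, 87] and limit is 67.</p>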
        <p>
          In addition, for the TF-IDF representation, we performed the following
steps [
          <xref ref-type="bibr" rid="ref16 ref7">16, 7</xref>
          ], already adopted in the literature for this domain:
1. We removed all stopwords by using the NLTK<sup>6</sup> library.
2. We removed punctuation and numerical values.
3. We used the maximum number of features, max_features (padding wherever
necessary), to build our TF-IDF matrix, so that no information is lost while
creating the matrix. For our dataset, if the number of clinical reports is m
and the overall number of distinct tokens is n, then the TF-IDF matrix has
dimension m × n, where n = max_features.
        </p>
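        <p>The cleaning steps for the TF-IDF representation can be sketched as follows. The stop-word set below is a tiny illustrative stand-in, not the NLTK list used in the paper, and the function name clean is hypothetical.</p>
<preformat>
```python
import re

# A tiny illustrative stop-word list; the paper uses the NLTK list instead.
STOPWORDS = {"the", "has", "a", "of", "and", "with"}


def clean(text):
    """Lower-case, drop punctuation and numbers, then remove stop words."""
    text = text.lower()
    # Keep only letters and whitespace, which removes punctuation and digits.
    text = re.sub(r"[^a-z\s]", " ", text)
    return [token for token in text.split() if token not in STOPWORDS]


tokens = clean("The patient, aged 63, has diabetes and hypertension.")
```
</preformat>
        <p>The resulting token list keeps only the content words of the sentence, which is the form that is then passed to the TF-IDF vectorizer.</p>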
      </sec>
      <sec id="sec-4-2">
        <title>Data Modelling</title>
        <p>
          Within our work, we used Word Embeddings and TF-IDF as representation
models of the input data. Word Embeddings are distributed representations that
model the properties of words as vectors of real numbers and capture syntactic
features and semantic word relationships. They have been shown to be suitable
for modelling the knowledge in many domains. On the other hand, TF-IDF has
long been among the most useful representation methods for textual data and,
therefore, it remains the baseline approach for any innovative text
classification method in various domains. Hence, the data representations we
used within our study are:
– Pre-trained Word2Vec [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ]. The Word2Vec algorithm aims to detect the
meaning of words and the semantic relations between them by studying their
co-occurrences in a given corpus. Within our work we employed the Word
Embeddings<sup>7</sup> of size 300 generated on Google News.
– Domain-trained Word2Vec. These domain-trained Word Embeddings
are trained with the same algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] on our morbidity dataset.
Training this kind of embeddings has shown advantages because they can embed
the semantics of the jargon and specific terms of the target domain. Moreover,
using Word Embeddings trained on the target domain avoids the problem of
out-of-vocabulary words, since less frequent words also have a vector
representation. We generated these Word Embeddings of size 300 by using the
gensim<sup>8</sup> library with 10 epochs and a window size of 5.
– GloVe [
          <xref ref-type="bibr" rid="ref20">20</xref>
          ]. The GloVe algorithm was proposed by the Stanford
community in 2014. This algorithm uses a statistics-based matrix to represent
how frequently words appear in a given context, and computes vector scores
based on the co-occurrences of words within contexts. Within our work, we used
the GloVe 6B Word Embeddings of size 300.
– TF-IDF. It is a data modelling technique that computes, for each word, a
weight that indicates the importance of the word for a given document within a
corpus. TF measures the occurrence of a word w in a document d. IDF measures
the rarity of a word w in the whole corpus. Equation 1 shows the TF-IDF
formula, where c<sub>iw</sub> is the number of occurrences of the word w in
the i-th document d<sub>i</sub>, |d<sub>i</sub>| is the size of the document
expressed as number of words, N is the number of documents in the collection,
and n<sub>w</sub> is the number of documents where the word w occurs at least
once. TF-IDF values are usually normalized in the range [0, 1].
6 https://www.nltk.org/
7 https://code.google.com/archive/p/word2vec/
8 https://radimrehurek.com/gensim/
        </p>
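        <p>
          <disp-formula id="eq-1">
            <label>(1)</label>
            <tex-math>\mathrm{TF\text{-}IDF}(w, d_i) = \frac{c_{iw}}{|d_i|} \cdot \log \frac{N}{n_w}</tex-math>
          </disp-formula>
        </p>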
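        <p>As a worked illustration of Equation 1, the sketch below computes the term frequency (occurrences of w in the document divided by the document length) multiplied by the logarithm of N over the number of documents containing w. The natural logarithm and the toy documents are assumptions made for the example; the paper does not state the base of the logarithm.</p>
<preformat>
```python
import math


def tf_idf(word, doc_index, docs):
    """TF-IDF of `word` in docs[doc_index], following Equation 1."""
    doc = docs[doc_index]
    tf = doc.count(word) / len(doc)            # c_iw / |d_i|
    n_w = sum(1 for d in docs if word in d)    # documents containing the word
    if n_w == 0:
        return 0.0
    return tf * math.log(len(docs) / n_w)      # tf * log(N / n_w)


# Toy tokenized documents, invented for illustration only.
docs = [
    ["patient", "diabetes", "diabetes", "insulin"],
    ["patient", "asthma"],
    ["patient", "gout", "pain"],
]
score_diabetes = tf_idf("diabetes", 0, docs)
score_patient = tf_idf("patient", 0, docs)
```
</preformat>
        <p>A word that occurs in every document, such as "patient" here, receives a weight of zero, while a word concentrated in one document receives a high weight. Note that the TfidfVectorizer of scikit-learn applies a smoothed variant of this formula by default, so its values differ from the ones computed by this sketch.</p>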
      </sec>
      <sec id="sec-4-3">
        <title>The Deep Learning and TF-IDF Models</title>
        <p>
          Our Deep Learning model is depicted in Figure 1. The model performs
binary classification and takes inspiration from [
          <xref ref-type="bibr" rid="ref1 ref6">6, 1</xref>
          ], where Bidirectional Long Short-Term Memory (BiLSTM) layers fed by
Word Embedding representations were able to represent the knowledge of the
target domains well and obtained state-of-the-art results. Unlike these
previous models, our model can be adapted to parse the different data
representations that we decided to employ. More specifically, in Figure 1 the
model has an Embeddings layer as input layer. The Embeddings layer uses Word
Embeddings of size 300 and accepts integer-encoded vectors. It organizes the
input as a matrix of size (N × M), where N is the number of encoded texts and M
is the maximum number of tokens considered for each text. Then, it loads an
embeddings matrix that links the indexes of the encoded texts to their vector
representations. The output of the Embeddings layer is given to two hidden
layers that implement BiLSTM neural networks. An LSTM is a particular kind of
recurrent neural network that is able to store the history of the input data
and has already proven able to find patterns in data where the order of the
information matters. By using the bidirectional version, the model can learn
both backward and forward patterns that might not be detected by parsing the
data in just one direction. The results of the backward LSTM and the forward
LSTM are combined into a single result. Finally, the last layer of the model is
a fully-connected Dense layer whose purpose is to predict the binary label.
        </p>
        <p>
          To compare our approach against non deep learning models, we used a
Support Vector Machine (SVM) and a Multilayer Perceptron in combination with
TF-IDF; the steps of the adopted process are shown in Figure 2. We applied
steps 1 and 2 of the pre-processing explained in Section 4.2, followed by the
removal of stop words and special symbols from the input text data. To
vectorize the data we used the TfidfVectorizer from the scikit-learn library,
which converts the input text data to a matrix of TF-IDF features
(max_features). This explains the second block of Figure 2. The matrix
generated by the TfidfVectorizer is finally fed to the SVM or the Multilayer
Perceptron to classify the clinical notes.
        </p>
        <p>
          The Deep Learning model described in Section 4.4 was fed with the three
Word Embedding representations introduced in Section 4.3. The model was
trained using the rmsprop<sup>9</sup> optimizer. We trained for 20 epochs,
because the resulting datasets do not contain many records and the model
converges very quickly. Finally, to prevent biased performance estimates due
to the unbalanced distribution of the records, we re-sampled our data using
10-fold cross-validation. Performance was measured with the F1 score provided
by the scikit-learn<sup>10</sup> library.
        </p>
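        <p>The evaluation protocol (k-fold cross-validation scored with F1) can be sketched in plain Python as follows. This is only an illustration of the procedure: the paper relies on scikit-learn for the F1 score, whereas the helper names, the simple non-shuffled fold assignment, and the toy keyword classifier below are assumptions made for the example.</p>
<preformat>
```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1); 0.0 when it is undefined."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists; fold i takes every k-th sample."""
    for fold in range(k):
        test = [i for i in range(n_samples) if i % k == fold]
        train = [i for i in range(n_samples) if i % k != fold]
        yield train, test


def cross_validate(texts, labels, fit_predict, k=10):
    """Average F1 over k folds for any `fit_predict` callable."""
    scores = []
    for train, test in k_fold_indices(len(texts), k):
        predicted = fit_predict(
            [texts[i] for i in train],
            [labels[i] for i in train],
            [texts[i] for i in test],
        )
        scores.append(f1_score([labels[i] for i in test], predicted))
    return sum(scores) / len(scores)


# Toy stand-in classifier: predicts 1 when the note mentions "asthma".
def keyword_rule(train_texts, train_labels, test_texts):
    return [1 if "asthma" in text else 0 for text in test_texts]


# Toy data arranged so that every fold contains both classes.
labels = [(i // 10) % 2 for i in range(40)]
texts = ["asthma attack" if label else "no complaints" for label in labels]
mean_f1 = cross_validate(texts, labels, keyword_rule, k=10)
```
</preformat>
        <p>Any classifier, whether the BiLSTM model or one of the TF-IDF baselines, can be plugged in through the fit_predict argument, so that all approaches are scored on the same folds.</p>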
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Results And Discussion</title>
      <p>In this section we report the results of our experiments and discuss the
various representation methods we used to capture domain peculiarities. Table 2
reports the results obtained by the Deep Learning model fed by the Word
Embedding representations and by the SVM and Multilayer Perceptron TF-IDF
models. The TF-IDF approach was a state-of-the-art method for many years before
the advent of Word Embeddings for textual knowledge representation in Machine
Learning algorithms. Thus, to compare our approach with two baselines that do
not employ Deep Learning technologies, we decided to use TF-IDF with an SVM
model and TF-IDF with a Multilayer Perceptron model.</p>
      <p>As far as the comparison between pre-trained Word2Vec and GloVe is
concerned, the latter performs better for all morbidity classes among the deep
learning approaches. This indicates that GloVe Word Embeddings enable the Deep
Learning model to recognize the diseases described in the input clinical notes
better than the other deep learning models. This is in contrast with previous
works where the Word2Vec algorithm performed better on classification
tasks.</p>
      <p>
        Moreover, although in the literature domain-specific Word Embeddings have
proved to outperform pre-trained embeddings (e.g., in the e-learning domain [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), in this paper
we have observed the opposite. This happens because the dataset that we used
to train the domain-specific embeddings is not big enough to allow the
algorithm to learn domain peculiarities and, hence, state-of-the-art
embeddings trained on large amounts of text are still the best option among
the deep learning methods.
      </p>
      <p>From the results presented in Table 2, it turns out that both baselines
beat all the deep learning approaches. The reason lies in specific features
that appear exclusively for individual categories. When this happens, the
classical machine learning algorithms are able to perform the classification
with very high precision thanks to the way the feature vector has been built
using the TF-IDF technique.
9 https://keras.io/optimizers/
10 https://scikit-learn.org/</p>
    </sec>
    <sec id="sec-6">
      <title>Conclusions and Future Work</title>
      <p>
        In this paper we have investigated a Deep Learning model and employed three
Word Embedding representations in order to recognize morbidities within
clinical notes. The preliminary results indicate that certain features make
the classification strongly biased, so that classical machine learning
approaches are able to perform the classification effectively. They also
outperform the Deep Learning strategies that we adopted in combination with
several word embeddings. These results suggest the next direction we need to
explore: identifying, for each category, the set of features with the highest
importance, which make the classification easy for machine learning methods
using the TF-IDF feature engineering strategy. Once we discover these
features, we will add a further action to the pre-processing step in order to
make the dataset less biased. We will then repeat the comparison with the deep
learning approaches and analyse the results of the morbidity classification
again. As further future work, we would like to study other Deep Learning
models that might be morbidity-oriented. A final direction worth exploring is
the collection and pre-processing of further data. We recall that the dataset
used consists of few samples, each of which includes a long text made of many
sentences and tokens. A possible strategy would be to consider the different
sentences as further samples, each with the same annotation, or to structure
the text as a graph and employ graph embeddings. Finally, we would also like
to experiment with other Word Embedding representations such as Bidirectional
Encoder Representations from Transformers (BERT) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
    </sec>
    <sec id="sec-7">
      <title>Acknowledgement</title>
      <p>The research leading to these results has received funding from the EU’s Marie
Curie training network PhilHumans - Personal Health Interfaces Leveraging
Human-Machine Natural interactions under grant agreement 812882. Moreover,
we gratefully acknowledge the support of NVIDIA Corporation with the
donation of the Titan X GPU used for this research.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Atzeni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>Multi-domain sentiment analysis with mimicked and polarized word embeddings for human-robot interaction</article-title>
          .
          <source>Future Generation Computer Systems</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Barnett</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>S.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Norbury</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Watt</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wyke</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guthrie</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Epidemiology of multimorbidity and implications for health care, research, and medical education: a cross-sectional study</article-title>
          .
          <source>The Lancet</source>
          <volume>380</volume>
          (
          <issue>9836</issue>
          ),
          <fpage>37</fpage>
          -
          <lpage>43</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blum</surname>
            ,
            <given-names>R.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bastos</surname>
            ,
            <given-names>F.I.P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Kabiru</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Le</surname>
            ,
            <given-names>L.C.</given-names>
          </string-name>
          , et al.:
          <article-title>Adolescent health in the 21st century</article-title>
          .
          <source>The Lancet</source>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Bromuri</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zufferey</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hennebert</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schumacher</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Multi-label classification of chronically ill patients with bag of words and supervised dimensionality reduction algorithms</article-title>
          .
          <source>Journal of Biomedical Informatics</source>
          <volume>51</volume>
          ,
          <fpage>165</fpage>
          -
          <lpage>175</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Petkovic</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (eds.):
          <source>Data Science for Healthcare - Methodologies and Applications</source>
          . Springer (
          <year>2019</year>
          ). https://doi.org/10.1007/978-3-030-05249-2
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Dessì</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dragoni</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Marras</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          :
          <article-title>Evaluating neural word embeddings created from online course reviews for sentiment analysis</article-title>
          .
          <source>In: Proceedings of the 34th ACM/SIGAPP Symposium on Applied Computing</source>
          . pp.
          <fpage>2124</fpage>
          -
          <lpage>2127</lpage>
          . ACM (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Dessì</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Exploiting cognitive computing and frame semantic features for biomedical document clustering</article-title>
          .
          <source>In: SeWeBMeDA@ ESWC</source>
          . pp.
          <fpage>20</fpage>
          -
          <lpage>34</lpage>
          (
          <year>2017</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Dessì</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Recupero</surname>
            ,
            <given-names>D.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fenu</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Consoli</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>A recommender system of medical reports leveraging cognitive computing and frame semantics</article-title>
          .
          <source>In: Machine Learning Paradigms</source>
          , pp.
          <fpage>7</fpage>
          -
          <lpage>30</lpage>
          . Springer (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Devlin</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chang</surname>
            ,
            <given-names>M.W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lee</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Toutanova</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          :
          <article-title>BERT: Pre-training of deep bidirectional transformers for language understanding</article-title>
          .
          <source>arXiv preprint arXiv:1810.04805</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <collab>Department of Economic and Social Affairs of the United Nations</collab>
          :
          <source>World Mortality Report</source>
          . United Nations Publications
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Flores</surname>
            ,
            <given-names>C.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Figueroa</surname>
            ,
            <given-names>R.L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pezoa</surname>
            ,
            <given-names>J.E.</given-names>
          </string-name>
          :
          <article-title>Fregex: A feature extraction method for biomedical text classification using regular expressions</article-title>
          .
          <source>In: 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC)</source>
          . pp.
          <fpage>6085</fpage>
          -
          <lpage>6088</lpage>
          . IEEE (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Gu</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Feng</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zeng</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mamitsuka</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Efficient semisupervised medline document clustering with mesh-semantic and global-content constraints</article-title>
          .
          <source>IEEE transactions on cybernetics 43(4)</source>
          ,
          <fpage>1265</fpage>
          -
          <lpage>1276</lpage>
          (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Hassaine</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Canoy</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Solares</surname>
            ,
            <given-names>J.R.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rao</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Rahimi</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Salimi-Khorshidi</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          :
          <article-title>Learning multimorbidity patterns from electronic health records using non-negative matrix factorisation</article-title>
          .
          <source>arXiv preprint arXiv:1907.08577</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Jagannatha</surname>
            ,
            <given-names>A.N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          :
          <article-title>Bidirectional RNN for medical event detection in electronic health records</article-title>
          .
          <source>In: Proceedings of the conference. Association for Computational Linguistics. North American Chapter. Meeting</source>
          . vol.
          <volume>2016</volume>
          , p.
          <fpage>473</fpage>
          . NIH Public Access
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mishra</surname>
            ,
            <given-names>B.K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mazzara</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Prediction of malignant &amp; benign breast cancer: A data mining approach in healthcare applications</article-title>
          .
          <source>arXiv preprint arXiv:1902.03825</source>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Kumar</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Verma</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mittal</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Gromov</surname>
            ,
            <given-names>S.V.</given-names>
          </string-name>
          :
          <article-title>Anatomy of preprocessing of big data for monolingual corpora paraphrase extraction: Source language sentence</article-title>
          .
          <source>Emerging Technologies in Data Mining and Information Security</source>
          <volume>3</volume>
          ,
          <fpage>495</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Mercer</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Furler</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moffat</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fischbacher-Smith</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sanci</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          :
          <article-title>Multimorbidity: technical series on safer primary care</article-title>
          . World Health Organization (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Efficient estimation of word representations in vector space</article-title>
          .
          <source>arXiv preprint arXiv:1301.3781</source>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          19.
          <string-name>
            <surname>Panch</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Szolovits</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Atun</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          :
          <article-title>Artificial intelligence, machine learning and health systems</article-title>
          .
          <source>Journal of global health 8(2)</source>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          20.
          <string-name>
            <surname>Pennington</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Socher</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Manning</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>GloVe: Global vectors for word representation</article-title>
          .
          <source>In: Proceedings of the 2014 conference on empirical methods in natural language processing (EMNLP)</source>
          . pp.
          <fpage>1532</fpage>
          -
          <lpage>1543</lpage>
          (
          <year>2014</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          21.
          <string-name>
            <surname>Sawyer</surname>
            ,
            <given-names>S.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Drew</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yeo</surname>
            ,
            <given-names>M.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Britto</surname>
            ,
            <given-names>M.T.</given-names>
          </string-name>
          :
          <article-title>Adolescents with a chronic condition: challenges living, challenges treating</article-title>
          .
          <source>The Lancet</source>
          <volume>369</volume>
          (
          <issue>9571</issue>
          ),
          <fpage>1481</fpage>
          -
          <lpage>1489</lpage>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          22.
          <string-name>
            <surname>Uijen</surname>
            ,
            <given-names>A.A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>van de Lisdonk</surname>
            ,
            <given-names>E.H.</given-names>
          </string-name>
          :
          <article-title>Multimorbidity in primary care: prevalence and trend over the last 20 years</article-title>
          .
          <source>The European journal of general practice 14(sup1)</source>
          ,
          <fpage>28</fpage>
          -
          <lpage>32</lpage>
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          23.
          <string-name>
            <surname>Velupillai</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Suominen</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Liakata</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Roberts</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shah</surname>
            ,
            <given-names>A.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Morley</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Osborn</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Hayes</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stewart</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Downs</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          , et al.:
          <article-title>Using clinical natural language processing for health outcomes research: Overview and actionable suggestions for future advances</article-title>
          .
          <source>Journal of biomedical informatics 88</source>
          ,
          <fpage>11</fpage>
          -
          <lpage>19</lpage>
          (
          <year>2018</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          24.
          <string-name>
            <surname>Yao</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mao</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Clinical text classification with rule-based features and knowledge-guided convolutional neural networks</article-title>
          .
          <source>BMC medical informatics and decision making 19(3)</source>
          ,
          <fpage>71</fpage>
          (
          <year>2019</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>