<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <issn pub-type="ppub">1613-0073</issn>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Dual Loss Function for follow-up estimation</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Dossena</string-name>
          <email>marco.dossena@uniupo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Christopher Irwin</string-name>
          <email>christopher.irwin@uniupo.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Computer Science Institute (DiSIT), University of Piemonte Orientale</institution>
          ,
          <addr-line>Alessandria</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>Autoencoders have emerged as powerful tools for unsupervised representation learning, with applications in domains such as computer vision, natural language processing, and anomaly detection. More generally, the latent space reconstruction mechanism can be extended to the reconstruction of any type of data. This extended abstract presents a representation learning approach applicable to data with an initial instant described by a baseline (input data) and a future instant corresponding to a follow-up (output data). The novel approach combines a reconstruction-focused loss with a classification-driven loss. The proposed hybrid autoencoder architecture aims to simultaneously enhance data reconstruction while learning discriminative features for classification tasks. Initial experimental results demonstrate the efficacy of the proposed hybrid autoencoder on a long-COVID dataset.</p>
      </abstract>
      <kwd-group>
        <kwd>manifold learning</kwd>
        <kwd>autoencoder</kwd>
        <kwd>long-COVID syndrome</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-2">
      <title>1. Introduction</title>
      <p>
        Traditional data representation techniques often rely on manual feature engineering, which
is labor-intensive, domain-specific, and may miss intricate patterns present within
the data. Autoencoders [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ], a class of neural networks, offer an appealing solution to this
challenge by enabling the automatic learning of data representations in an unsupervised manner.
The core idea of an autoencoder revolves around dimensionality reduction: the network
learns a compressed representation of the input data that captures its salient features. This
compressed representation, often referred to as the "latent space", can then be used for various
downstream tasks such as classification, reconstruction, and generation. In our research context,
the input space corresponds to the patient's description at the time of hospitalization. In
contrast, the output space is expanded to include follow-up data, specifically one year after
hospitalization. This augmentation of the output space serves as a valuable means to inform
and guide the representation of the latent space within our model. Furthermore, to classify
patients who may be suffering from long-COVID, we introduce an additional classification head
built upon the hidden patient representation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>2. Methodology</title>
      <p>The autoencoder architecture includes two basic modules: encoder and decoder. In the
following, we will present these two components.</p>
      <p>
        Encoder: this module compresses the initial feature space into the latent
representation. In our setting, the encoder is realized by a fully connected layer that maps the
input space into a lower-dimensional latent representation. The encoder takes as input a sample
𝑥 ∈ ℝ<sup>𝑑</sup> and outputs a representation ℎ ∈ ℝ<sup>𝑑<sub>ℎ</sub></sup>, where 𝑑 ≫ 𝑑<sub>ℎ</sub>. During the encoding phase,
we also introduce non-linearity by applying a non-linear activation function immediately after
the fully connected layer. Finally, to reduce overfitting and improve model generalization,
we also adopt a dropout layer [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ].
      </p>
      <p>Decoder: this module reconstructs the output data starting from the latent representation.
In most settings, it is designed to complement the encoder, aiming to reconstruct
the input sample 𝑥 from the latent representation ℎ. However, in our experiments, we chose
to expand the output dimension 𝑑<sub>out</sub> to include the features of the patients at follow-up time,
𝑑<sub>out</sub> = 𝑑 + 𝑑<sub>𝑓</sub>. In this configuration, the encoder needs to map the input features to a latent
representation that not only compresses the data but also retains sufficient information for
accurate reconstruction of the follow-up details during the decoding phase. As a result, the
model can potentially learn patterns that span the hospitalization and follow-up time.
Lastly, to address the task of classifying patients with or without long-COVID syndrome, we
incorporate a classification head into the model.</p>
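To make the encoder, expanded decoder, and classification head concrete, the following NumPy sketch runs a single forward pass under the dimensions stated in the paper (38 baseline + 14 hospitalization input features, 27 additional follow-up features, latent size 16). The weights are random stand-ins for trained parameters, and all function and variable names are ours, not the authors':

```python
import numpy as np

rng = np.random.default_rng(0)

# Dimensions from the paper: 38 baseline + 14 hospitalization features form
# the input space; the output space adds the 27 follow-up features.
d_in = 38 + 14          # d
d_out = d_in + 27       # d_out = d + d_f
d_latent = 16           # latent size

# Randomly initialized weights stand in for trained parameters.
W_enc = rng.normal(0, 0.1, (d_in, d_latent))
W_dec = rng.normal(0, 0.1, (d_latent, d_out))
W_cls = rng.normal(0, 0.1, (d_latent, 1))

def relu(z):
    return np.maximum(z, 0.0)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(x, p_drop=0.0):
    """Encoder -> (decoder, classification head) forward pass."""
    h = relu(x @ W_enc)                   # latent representation h
    if p_drop > 0.0:                      # inverted dropout (training only)
        mask = rng.random(h.shape) >= p_drop
        h = h * mask / (1.0 - p_drop)
    x_hat = h @ W_dec                     # reconstructs input + follow-up features
    y_hat = sigmoid(h @ W_cls)            # long-COVID probability
    return h, x_hat, y_hat

batch = rng.normal(size=(4, d_in))
h, x_hat, y_hat = forward(batch)
print(h.shape, x_hat.shape, y_hat.shape)  # (4, 16) (4, 79) (4, 1)
```

A trained version would learn `W_enc`, `W_dec`, and `W_cls` jointly, since the latent representation feeds both the decoder and the classification head.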
      <sec id="sec-3-1">
        <title>2.1. Dual loss function</title>
        <p>During the learning process, the model uses a loss function comprised of two separate
components. The first component, ℒ<sub>𝑟</sub>, is responsible for the reconstruction part of the learning task. In
particular, we resort to the Mean Squared Error (MSE) loss, which measures the distance
between the reconstructed data and the training samples. The second component, denoted by ℒ<sub>𝑐</sub>,
is used for the classification of long-COVID syndrome, using the Binary Cross Entropy (BCE) loss.
We chose to mix these two losses by introducing two coefficients, denoted as 𝛼 and 𝛽, enabling
us to seamlessly balance between a reconstruction and classification regimen. The complete
loss function is described as follows:</p>
        <p>ℒ = 𝛼ℒ<sub>𝑐</sub> + 𝛽ℒ<sub>𝑟</sub></p>
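The combined loss can be written as a small function, where 𝛼 weights the classification (BCE) term and 𝛽 the reconstruction (MSE) term. This is a sketch: the function name and the default coefficient values are ours, not the authors' tuned settings:

```python
import numpy as np

def dual_loss(x_hat, x_target, y_hat, y_true, alpha=1.0, beta=0.7):
    """Combined loss: alpha * BCE (classification) + beta * MSE (reconstruction).

    The alpha/beta defaults are illustrative, not the paper's tuned values.
    """
    eps = 1e-7                                  # avoid log(0)
    l_rec = np.mean((x_hat - x_target) ** 2)    # MSE over the expanded output
    y_hat = np.clip(y_hat, eps, 1.0 - eps)
    l_cls = -np.mean(y_true * np.log(y_hat)
                     + (1.0 - y_true) * np.log(1.0 - y_hat))
    return alpha * l_cls + beta * l_rec

# Perfect reconstruction and confident correct predictions give a near-zero loss.
x = np.zeros((2, 3))
y = np.array([[1.0], [0.0]])
print(dual_loss(x, x, y, y))  # close to 0
```

Setting 𝛼 = 0 recovers a plain autoencoder objective, while 𝛽 = 0 reduces the model to a pure classifier on the latent space.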
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Experiments</title>
      <p>During the experiments, we applied the model to a real-world dataset. In the subsequent
sections, we start by giving an overview of the long-COVID scenario, present dataset
statistics, and finally discuss the model configuration and performance.</p>
      <sec id="sec-4-1">
        <title>3.1. Long-COVID scenario</title>
        <p>
          Following the characterization in [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], long-COVID-19 syndrome consists of signs and symptoms
(sequelae) consistent with COVID-19 that are present beyond 12 weeks of the onset of acute
COVID-19 infection and are not ascribable to alternative causes (i.e., other diseases). We consider the
syndrome to be defined as the persistence of at least one of such symptoms, where instances
are the patients' data collected at hospitalization, and the labels are the long-COVID symptoms
persisting at follow-up.
        </p>
        <p>Concerning patient characterization, baseline data indicate 38 features describing the
demographic and medical history of the patient, while hospitalization data (14 features) refer
to the patient’s symptoms at hospitalization (acute COVID-19 onset). Baseline data are not
directly related to COVID-19 infection but are important factors to take into account in order
to make an accurate diagnosis or prediction. Features in the baseline data can be grouped in
terms of demographic characteristics (sex, age, smoking attitude, ...) and of prior comorbidities
(obesity, chronic liver disease, hypertension, anxiety and depression, ...).</p>
        <p>Hospitalization data include the patient's symptoms at COVID-19 onset (fever, cough, dyspnea,
arthralgia, ...), drugs administered (hydroxychloroquine, monoclonal antibodies, glucocorticoids,
antivirals, ...), and hospitalization information (duration, oxygen administration, ICU intubation,
...). Baseline and hospitalization data jointly form the input space.</p>
        <p>The follow-up data (27 features) contain, among others, the same symptoms recorded at
hospitalization but at a different instant (one year in the future). The follow-up set combined
with the input space forms the output space.</p>
        <p>
          The original dataset consisted of 324 entries, representing a very limited data scenario for a
deep learning architecture. To augment our dataset with additional samples, we employed the
Synthetic Minority Over-sampling Technique (SMOTE) [
          <xref ref-type="bibr" rid="ref4">4</xref>
          ], resulting in an expanded dataset
containing over 400 samples.
        </p>
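<p>As an illustration of how SMOTE produces additional samples, the sketch below implements its core interpolation step in NumPy: each synthetic point lies on the segment between a minority sample and one of its nearest minority neighbours. The function name and parameters are ours; in practice a library implementation such as imbalanced-learn's <monospace>SMOTE</monospace> would be used.</p>

```python
import numpy as np

def smote_oversample(X_min, n_new, k=5, rng=None):
    """Minimal SMOTE sketch: generate n_new synthetic minority samples by
    interpolating each chosen sample with one of its k nearest minority
    neighbours (straight-line interpolation, as in Chawla et al.)."""
    if rng is None:
        rng = np.random.default_rng(0)
    n = len(X_min)
    # pairwise squared distances within the minority class
    d2 = ((X_min[:, None, :] - X_min[None, :, :]) ** 2).sum(-1)
    np.fill_diagonal(d2, np.inf)               # exclude self-matches
    nn = np.argsort(d2, axis=1)[:, :k]         # k nearest neighbour indices
    synth = []
    for _ in range(n_new):
        i = rng.integers(n)                    # pick a minority sample
        j = nn[i, rng.integers(min(k, n - 1))] # pick one of its neighbours
        t = rng.random()                       # interpolation factor in [0, 1)
        synth.append(X_min[i] + t * (X_min[j] - X_min[i]))
    return np.vstack(synth)
```

Because every synthetic point is a convex combination of two real minority samples, the augmented data stays inside the convex hull of the original minority class.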
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Results</title>
        <p>We conducted a series of tests on the aforementioned dataset to assess whether the latent space
constructed by our model could establish a correlation between hospitalization and follow-up
while simultaneously maintaining discriminative capabilities for identifying cases of long-covid.
Detailed hyperparameters employed during the model training are provided in Table 1.
To address the limitations posed by the dataset size, we adopted a small latent space
dimension and a relatively high dropout probability. This decision is justified by the dataset's
size, as maintaining a compact representation enhances the model's generalization capacity
when limited examples are available.</p>
        <p>The average accuracy achieved in our experiments is 71% ± 0.9. We applied PCA to the latent
space embeddings while varying the 𝛼 parameter of the loss, which governs the classification
contribution. As illustrated in Figure 1, as the value of 𝛼 increases,
the explained variance of the embeddings also rises. This suggests that the clusters become
progressively more linearly separable. The result is especially promising considering the scarcity
of data and the complexity of the classification problem.</p>
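The explained-variance measurement can be reproduced on any embedding matrix with a small PCA-via-SVD helper. This is a sketch under our own naming; the paper does not specify its PCA implementation:

```python
import numpy as np

def explained_variance_ratio(H, n_components=2):
    """Fraction of total variance captured by the top principal components
    of the latent embeddings H (rows = samples)."""
    Hc = H - H.mean(axis=0)                      # centre each feature
    # singular values of the centred data give the component variances
    s = np.linalg.svd(Hc, compute_uv=False)
    var = s ** 2
    return var[:n_components].sum() / var.sum()
```

Applied to embeddings produced under increasing 𝛼, a rising ratio indicates that more of the latent variance concentrates along a few directions, consistent with clusters that are easier to separate linearly.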
        <sec id="sec-4-2-1">
          <title>3.2.1. Latent space representation</title>
          <p>
            The dual loss framework enables the creation of a latent space that takes into account the
presence of at least one of the symptoms as a discriminator. This capability is made possible by
the end-to-end architecture of our model. When visualizing the latent space in two dimensions
using t-distributed stochastic neighbor embedding (t-SNE) [
            <xref ref-type="bibr" rid="ref5">5</xref>
            ], we observe the emergence of
two distinct clusters that represent the distribution of samples within the latent space. Figure 2
illustrates these clusters within the latent space, with sample points color-coded to indicate the
presence or absence of symptoms.
          </p>
        </sec>
        <table-wrap id="tbl1">
          <label>Table 1</label>
          <caption>
            <p>Hyperparameters employed during model training.</p>
          </caption>
          <table>
            <thead>
              <tr><th>Parameter</th><th>Configuration</th></tr>
            </thead>
            <tbody>
              <tr><td>Hidden dimension</td><td>128</td></tr>
              <tr><td>Latent size</td><td>16</td></tr>
              <tr><td>𝛼</td><td>1</td></tr>
              <tr><td>𝛽</td><td>0.7</td></tr>
              <tr><td>Reconstruction Loss</td><td>MSE</td></tr>
              <tr><td>Learning rate</td><td>0.001</td></tr>
              <tr><td>Dropout</td><td>0.5</td></tr>
            </tbody>
          </table>
        </table-wrap>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Conclusion</title>
      <p>In conclusion, this paper has introduced a hybrid autoencoder architecture that leverages
both reconstruction-focused and classification-driven loss functions to enhance unsupervised
representation learning. Autoencoders have already proven their utility in various domains,
and this work extends their applicability to data with a temporal aspect. By incorporating both
reconstruction and discriminative feature learning objectives, our approach aims to provide a
comprehensive solution for a wide range of tasks.</p>
      <p>Our initial experiments on a long-COVID dataset have yielded promising results, demonstrating
the effectiveness of the proposed hybrid autoencoder in capturing meaningful representations
from sequential data. These findings pave the way for future research in utilizing autoencoders
for time-dependent data and underline the potential impact of this approach in addressing
complex real-world problems.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>M. Dossena and C. Irwin are supported by the National PhD program in Artificial Intelligence
for Healthcare and Life Sciences (Campus Bio-medico University of Rome). We want to thank
A. Chiocchetti and M. Bellan for having provided us with the long-COVID data and for several
fruitful discussions about the case study.</p>
      <p>We want to thank our tutors Luigi Portinale, Annalisa Chiocchetti, Luca Piovesan and Stefania
Montani for their support in our PhD journey.</p>
      <p>
        This work has been supported by the "Piano Riparti Piemonte", Azione n. 173 "INFRA-P.
Realizzazione, rafforzamento e ampliamento Infrastrutture di ricerca pubbliche–bando"
INFRAP2-TECHNOMED-HUB n. 378-48 [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>D. E.</given-names>
            <surname>Rumelhart</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. L.</given-names>
            <surname>McClelland</surname>
          </string-name>
          ,
          <source>Learning Internal Representations by Error Propagation</source>
          ,
          <year>1987</year>
          , pp.
          <fpage>318</fpage>
          -
          <lpage>362</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>N.</given-names>
            <surname>Srivastava</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Krizhevsky</surname>
          </string-name>
          ,
          <string-name>
            <given-names>I.</given-names>
            <surname>Sutskever</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Salakhutdinov</surname>
          </string-name>
          ,
          <article-title>Dropout: A simple way to prevent neural networks from overfitting</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>15</volume>
          (
          <year>2014</year>
          )
          <fpage>1929</fpage>
          -
          <lpage>1958</lpage>
          . URL: http://jmlr.org/papers/v15/srivastava14a.html.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>A.</given-names>
            <surname>Nalbandian</surname>
          </string-name>
          , et al.,
          <article-title>Post-acute COVID-19 syndrome</article-title>
          ,
          <source>Nature Medicine</source>
          <volume>27</volume>
          (
          <year>2021</year>
          )
          <fpage>601</fpage>
          -
          <lpage>615</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>K. W.</given-names>
            <surname>Bowyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N. V.</given-names>
            <surname>Chawla</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. O.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W. P.</given-names>
            <surname>Kegelmeyer</surname>
          </string-name>
          ,
          <article-title>SMOTE: synthetic minority over-sampling technique</article-title>
          ,
          <source>CoRR abs/1106.1813</source>
          (
          <year>2011</year>
          ). URL: http://arxiv.org/abs/1106.1813. arXiv:1106.1813.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>L. van der</given-names>
            <surname>Maaten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. E.</given-names>
            <surname>Hinton</surname>
          </string-name>
          ,
          <article-title>Visualizing data using t-SNE</article-title>
          ,
          <source>Journal of Machine Learning Research</source>
          <volume>9</volume>
          (
          <year>2008</year>
          )
          <fpage>2579</fpage>
          -
          <lpage>2605</lpage>
          . URL: https://api.semanticscholar.org/CorpusID:5855042.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <source>TECNOMED-HUB webpage</source>
          ,
          <year>2023</year>
          . URL: https://www.tecnomedhub.it.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>