<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Approach for Hybrid-AI-based Models: an Application Study for Semantic Segmentation of Medical Images</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Marco Scarfone</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Pierangela Bruno</string-name>
          <email>bruno@mat.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Francesco Calimeri</string-name>
          <email>calimeri@mat.unical.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Calabria</institution>
          ,
          <addr-line>Rende</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
<p>In the last decades, Artificial Intelligence (AI) approaches have been fruitfully employed in many tasks; for instance, Deep Learning (DL)-based methods have shown great ability in extracting meaningful features from images, providing valuable support to computer-aided diagnosis and medicine. Including prior knowledge in DL-based approaches could help make their decisions more powerful, understandable, and explainable. However, even though this combination has raised a lot of interest in the scientific community, it still remains an open problem due to several difficulties, for example, in modeling complex domains, handling missing specifications, and identifying the most suitable architecture able to properly combine the two AI worlds.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Semantic image segmentation refers to the task of segmenting an image into regions
corresponding to meaningful objects and then assigning them an object category label [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. In medical
contexts, semantic segmentation of images can be extremely useful to support clinicians in
providing proper diagnosis, identifying pathological conditions, and highlighting image regions
related to a specific disease. In this context, Deep Learning (DL)-based approaches represent
a huge breakthrough, showing a great deal of potential in extracting meaningful information
from different types of images (e.g., computed tomography (CT), magnetic resonance imaging
(MRI), endoscopic imaging). These approaches prove particularly suitable for semantic
segmentation and, in general, for supporting automated diagnosis, surgical scene understanding,
and computer-assisted interventions.
      </p>
<p>Furthermore, including methods explicitly conceived for modeling prior knowledge in the
DL-based process can improve the quality of the results, paving the way for better interpretability
and explainability of neural networks. Indeed, such approaches have been widely studied and
used in different areas of AI, such as planning, probabilistic reasoning, bioinformatics, etc. (see,
e.g., [2, 3, 4]). Also, very recently, ways of combining deductive and inductive approaches have
raised a lot of interest in the scientific community.</p>
<p>However, AI-based approaches, and DL-based ones in particular, require huge computational
time and powerful processors to perform specific operations, especially in computer vision,
where high-quality images and videos are analyzed.</p>
<p>For this reason, many scientists decided to rely on data parallelism (DP), by partitioning (and
distributing) the workload among the cluster processes along the batch dimension [5]. In this way,
neural networks can be trained in parallel and, at each batch, all processes collaborate
to modify local weights that, globally, concur to the “best” global parameters, so defining the
DL-based model [6].</p>
<p>Among the different existing methods, we made use of the Message Passing Interface (MPI) [7],
which is a standardized communication protocol designed to work on parallel computing
architectures and distributed applications. MPI can be very useful to speed up the learning
phase, which can be very slow depending, for example, on the number of images received as
input or on the complexity of the network structure [5].</p>
<p>We propose a parallelization approach to perform semantic segmentation of laryngeal
endoscopic images [8]. We make use of a hybrid-AI-based model proposed in [9] and [10]. In this
work, the authors combined different neural network architectures (i.e., DeepLab-v3, SegNet,
U-Net) and the potential coming from the declarative nature of Answer Set Programming
(ASP) to improve the overall performance via (i) an ad-hoc loss function and (ii) a proper
post-processing phase. Our approach exploits parallel computing to drastically reduce the
execution time of the baseline work [9], while keeping comparable performance.</p>
      <p>The remainder of the paper is structured as follows. In Section 2 we provide a detailed
description of our approach, which has been assessed via a careful experimental activity, which
is in turn discussed in Section 3; we analyze and discuss results in Section 4, eventually drawing
our conclusions in Section 5.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Methods</title>
<p>This approach relies on the seminal work that appeared in [9], which we use as the basis of our
parallelization. The authors of [9] proposed a framework to combine DL and ASP-based models
for performing semantic segmentation. Specifically, they used ASP to:
• drive the network’s learning and penalize the misclassification during the training phase:
the ASP-based model is used to quantify a penalty value by comparing the network’s
prediction to medical knowledge and ground truth segmentation. In this way, the approach
is able to express “how wrong” the classification is; this value takes part in defining the
loss function [9].
• improve the quality of the results via ASP-based post-processing. The approach is able
to remove noise (i.e., small “islands” of misclassified pixels) and wrong predicted classes
(i.e., classes which do not respect medical requirements). Specifically, after converting the
network’s prediction into logical rules, the ASP-based model identifies pixels that need to
be removed and, eventually, re-assigns misclassified pixels/elements to the most frequent
class in the neighborhood.</p>
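<p>The re-assignment step described above can be sketched procedurally. The following is a minimal, illustrative Python sketch (the actual approach uses an ASP model, not this code; the function name and data layout are assumptions): flagged pixels are re-assigned to the most frequent class among their 8-connected neighbors.</p>

```python
from collections import Counter

def reassign_pixels(labels, to_fix):
    """Re-assign each flagged pixel to the most frequent class among its
    8-connected neighbors, mimicking the effect of the ASP-based
    post-processing step (hypothetical procedural sketch)."""
    h, w = len(labels), len(labels[0])
    fixed = [row[:] for row in labels]  # work on a copy
    for (r, c) in to_fix:
        # collect neighbor labels, skipping out-of-bounds and flagged pixels
        neigh = [labels[r + dr][c + dc]
                 for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr, dc) != (0, 0)
                 and 0 <= r + dr < h and 0 <= c + dc < w
                 and (r + dr, c + dc) not in to_fix]
        if neigh:
            fixed[r][c] = Counter(neigh).most_common(1)[0][0]
    return fixed
```

<p>For instance, a single misclassified pixel surrounded by class 1 (a small “island”) is re-assigned to class 1.</p>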
      <p>In this paper, we rely on data decomposition to parallelize the above-described approach and,
in particular, the training process. The workflow of the approach is shown in Fig. 1.</p>
      <sec id="sec-2-1">
        <title>2.1. Parallelization</title>
<p>MPI offers two kinds of communication functions: point-to-point and collective. In
particular, collective communications perform an exchange of data involving all interested nodes; these
communications can be: (i) one-to-many, where data are sent from the root process to all the
others; (ii) many-to-one, where the root node receives data from all the processes in the communicator; (iii)
many-to-many, where no root node exists and all processes send to and receive from all the other
ones.</p>
<p>In order to allow the parallelization of the approach, the input is split among the various
processes through the scatter function, which distributes the groups of images. To better explain how it
works, we provide the following example. Given three processes (P0, P1, P2), a tensor of shape
300×224×224×3, i.e., 300 images of size 224×224 with 3 channels, the scatter method
divides the 300 images into groups of 100 and sends one group to each process. The resulting processes
P0, P1, P2 each hold a tensor of shape 100×224×224×3. This is made possible by the Single Program
Multiple Data (SPMD) paradigm, by which each process executes the same program [11]. Under
this paradigm, with three processes we are able to execute three neural networks
concurrently, each taking a different batch of images as input.</p>
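<p>The decomposition in the example above can be sketched as follows. This is a local NumPy simulation of the split, not actual MPI code; with mpi4py, the equivalent distribution would be performed by a collective scatter from the root process (the function name here is illustrative).</p>

```python
import numpy as np

def scatter_batches(images, n_procs):
    """Split a tensor of shape (N, H, W, C) into n_procs equal chunks
    along the batch axis, one chunk per process (N must be divisible
    by n_procs for np.split)."""
    return np.split(images, n_procs, axis=0)

# 300 RGB images of size 224x224, split across 3 processes P0, P1, P2
images = np.zeros((300, 224, 224, 3), dtype=np.uint8)
chunks = scatter_batches(images, 3)  # each chunk has shape (100, 224, 224, 3)
```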
<p>After the data decomposition and during the learning phase, each neural network is able to
communicate and propagate the updated learned information to the other networks.</p>
<p>The communication occurs through the function Allreduce, which is a many-to-many
communication and operates similarly to the standard reduce method. Specifically, it executes
mathematical operations on the weights calculated by each network and updates the status of
the other ones. More in detail, when a process completes its own batch or a specific epoch, it
waits until all the other processes finish and, only then, an average of all weights is computed.</p>
<p>The average is computed via a customized operation in which a reducer sums all tensors
with the same key and then averages them and sends the result to each process. In this way,
it is possible to use different neural networks in parallel. Each network operates on different
data (so, the quantity of images per network is reduced) and exchanges the information just obtained.</p>
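<p>The averaging step can be sketched as follows. This is a local simulation of what Allreduce achieves, under the assumption of one weight tensor per process; with mpi4py, the sum would be performed collectively (e.g., an Allreduce with a sum operation) and each process would then divide by the number of processes.</p>

```python
import numpy as np

def average_weights(per_process_weights):
    """Sum the weight tensors held by each process and divide by the
    number of processes; every process ends up with the same averaged
    model, as after an Allreduce (hypothetical local sketch)."""
    n = len(per_process_weights)
    avg = sum(per_process_weights) / n
    # in real MPI, each process receives the result of the collective
    return [avg.copy() for _ in range(n)]

# three processes, each holding locally updated weights
w = [np.array([1.0, 2.0]), np.array([3.0, 4.0]), np.array([5.0, 6.0])]
synced = average_weights(w)  # each process now holds [3.0, 4.0]
```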
<p>In particular, at each iteration, multiple models holding identical information are obtained; the number of these
models corresponds to the number of processes started. Afterward, on the next batch, each neural
network independently updates its weights based on its own batch of images, and,
then, the information is exchanged with the other models. An example of this process is shown
in Fig. 2.</p>
<sec id="sec-2-1-1">
<title>2.1.1. ASP</title>
<p>Our approach also provides the possibility to parallelize the ASP model, which can be executed
by each process at the same time. ASP then works on a sub-group of images in each process,
generating one loss function per batch of images and per process. These ASP-based
loss functions are then added to the loss function obtained by the neural network to define
the final loss, according to [9]. To handle parallel computing with ASP, the outputs produced by the
neural networks are stored separately and simultaneously. Therefore, each process can access
its own ASP-based output according to the specific batch of images received. This makes the
entire computing process much faster and more efficient.</p>
<p>Similarly, our approach could allow us to parallelize the ASP-based post-processing, such that
each process can simultaneously access the rule-based model and the knowledge base describing
the specific batch of images. In this way, each process is able to accurately identify wrong
classes or noise and re-assign the right locations in the image.</p>
</sec>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Experimental Activities</title>
      <p>For the experimental analysis, we used the same dataset proposed in [9]: the Laryngeal
Endoscopic Images dataset [8]. It consists of 536 manually segmented in vivo color images of
the larynx captured during two different resection surgeries. In particular, the images are
categorized into 7 classes: void, vocal folds, other tissue, glottal space, pathology, surgical tool, and
intubation, corresponding to indices 0, 1, 2, 3, 4, 5, and 6, respectively.</p>
      <p>In order to ensure a proper comparison w.r.t. the results obtained in [9], we kept the same
configuration. The dataset was split into training (80%) and testing (20%) sets; each network
was implemented in PyTorch using the SGD optimizer and cross-entropy (CE) as the loss function.
In order to complete the experiments within the available time constraints, we reduced the number of epochs
from 1000 to 300; future experiments are planned with increased limits.</p>
<p>Also, to assess the effectiveness of our approach we used the Intersection-over-Union (IoU)
evaluation metric, which is defined as IoU = TP / (TP + FP + FN), where TP, FP, and FN are the
numbers of true positive, false positive, and false negative pixels, respectively.</p>
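<p>The per-class IoU defined above can be computed as in the following sketch (the function name and flat-list layout are illustrative assumptions):</p>

```python
def iou(pred, gt, cls):
    """Per-class Intersection-over-Union: TP / (TP + FP + FN),
    computed over flat sequences of pixel labels."""
    tp = sum(p == cls and g == cls for p, g in zip(pred, gt))  # true positives
    fp = sum(p == cls and g != cls for p, g in zip(pred, gt))  # false positives
    fn = sum(p != cls and g == cls for p, g in zip(pred, gt))  # false negatives
    denom = tp + fp + fn
    return tp / denom if denom else 0.0
```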
      <p>We point out that, at present, the results refer to parallel training without the inclusion of
ASP in the learning phase. Furthermore, even if we designed the parallel workflow, currently
the presented post-processing results are obtained without parallelization. Full experiments are
being carried out at the time of writing, and the updated results will be released in the near future.</p>
      <sec id="sec-3-1">
        <title>3.1. Parallelization metrics</title>
        <p>Speed-up is an important factor to consider when evaluating a parallel approach. It is computed
as follows:
 −  =
 
 
Where   is the execution time of the sequential algorithm, while   is the parallel time. Two
kinds of speed-up are existing, absolute and relative: relative indicates the speed-up where   is
the time of the algorithm with one process; absolute indicates the speed-up where   is the time
of the best sequential algorithm. For example, if a sequential algorithm needs 10 minutes of
calculation time and a correspondent parallel algorithm needs 2 minutes, we can say that the
speed increases by 5 times.</p>
<p>Since speed-up measures how fast a parallel algorithm goes, the ideal speed-up is equal to
the number of processes in use; in this case, we talk about linear speed-up. It can happen that
the speed-up stops growing, or its growth drastically slows, when the number of processes increases.
This behavior can be explained by Amdahl’s law [12], which evaluates the maximum value that
the speed-up can reach for a given algorithm as follows:
Speed-up(p) = 1 / ((1 − f) + f/p)
where f indicates the portion of parallel code, 1 − f indicates the remaining sequential part, and
p is the number of processes. In other words, the maximum speed-up depends exclusively on the
sequential portion, independently of the number of processes utilized. In addition, keeping
the number of processes constant, a parallel part f &lt; 1, however big it is, yields a speed-up that
has the number of processes as an upper bound; linear speed-up occurs when Speed-up(p) = p.</p>
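<p>Amdahl’s law can be sketched as a one-line function, useful for checking expected speed-ups before running experiments (the function name is illustrative):</p>

```python
def amdahl_speedup(f, p):
    """Maximum speed-up predicted by Amdahl's law for a parallel
    fraction f (0 <= f <= 1) and p processes:
    Speed-up(p) = 1 / ((1 - f) + f/p)."""
    return 1.0 / ((1.0 - f) + f / p)
```

<p>For example, with a 50% parallel fraction, four processes yield a speed-up of only 1.6, which mirrors the diminishing returns discussed above.</p>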
<p>Another reason why acceleration does not grow linearly is overhead. Overhead is
extra work due to different factors:
• Time to start the activity
• Synchronization between processes
• Communication of data
• Libraries, operating system overload, etc.
• Time to conclude the activity.</p>
<p>The overhead can be calculated via the following formula:
Overhead(p) = p ⋅ Tp − Ts
At last, another parameter to be considered is efficiency. Efficiency is a value between 0 and 1
and it is computed as:
E(p) = Speed-up(p) / p
where p is the number of processes and Speed-up(p) is the speed-up with p processes. This relation
indicates the fraction of time in which each processing element is really utilized. In particular:
• If E(p) &lt; 1, then the algorithm suffers a slowdown
• If E(p) = 1, then we have a linear speed-up (very difficult to achieve)
The efficiency metric is useful to quantify scalability. Specifically, if the efficiency remains
constant as the number of processes varies, we obtain linear scaling. Actually,
by increasing the number of processes p while fixing the problem dimension n, the efficiency
decreases (see Fig. 3 (a)); on the contrary, fixing the number of processes and increasing the
problem dimension, the efficiency increases (see Fig. 3 (b)). The objective is therefore to maintain,
as previously described, a constant efficiency; consequently, we need to increase the number of
processes while also increasing the problem dimension, which brings us to the iso-efficiency
concept [13].</p>
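<p>The three parallelization metrics above can be computed from measured timings as in this small sketch (function names are illustrative; Ts and Tp are the sequential and parallel execution times):</p>

```python
def speedup(t_seq, t_par):
    """Speed-up = Ts / Tp."""
    return t_seq / t_par

def efficiency(t_seq, t_par, p):
    """E(p) = Speed-up(p) / p, a value between 0 and 1."""
    return speedup(t_seq, t_par) / p

def overhead(t_seq, t_par, p):
    """Overhead(p) = p * Tp - Ts: total extra work summed over
    all p processes with respect to the sequential run."""
    return p * t_par - t_seq
```

<p>With the 10-minute/2-minute example from the previous section and 5 processes, the speed-up is 5, the efficiency is 1, and the overhead is 0 (the ideal case).</p>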
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Discussion</title>
      <p>Table 4 shows the results achieved using parallel training. We can see that the IoU of neural
networks follows the same trend as the sequential approach used in [9]. However, the class
pathology (i.e., 4), which is considered the most difficult due to its lower occurrence in the
dataset [8], achieved a very low IoU value. This is most likely caused by the reduced number of
epochs, which negatively affects the performance of the network in recognizing this class and,
consequently, the overall averaged IoU value.</p>
<p>In general, SegNet and DeepLab-v3 show better performance than U-Net, and the
post-processing is able to slightly but systematically improve the image quality in almost all classes.</p>
<p>A visual example of the results is shown in Fig. 5. These results, which are graphically
compared with raw images and ground truth (GT) segmentation, show the capability of our
approach in assigning the right class to each pixel and removing misclassification errors via
ASP-based post-processing.</p>
<sec id="sec-4-1">
<title>4.1. Parallelization performance</title>
<p>Figure 6 shows the execution time per epoch required to train each network according to
the number of processes used. We can notice that SegNet is the heaviest network, taking
about 21 minutes to complete an epoch. However, thanks to parallel processing, the execution
time is reduced to ∼7 minutes using 4 processes.</p>
<p>Figure 7 shows the relative speed-up achieved using the three neural networks. We can notice
that the speed-up for two processes reaches a value close to two, and similarly for three processes,
meaning that the network’s learning goes two or almost three times faster. Instead, when
using four processes (on the same machine), there are no huge improvements; this can be
explained via Amdahl’s law described in Sec. 3.1.</p>
<p>We also computed the efficiency trend for the different models achieved using the neural
networks, as shown in Fig. 8. The SegNet network shows an efficiency value of 92% using two
processes but, when the number of processes increases, at the same problem dimension, the
efficiency starts to decrease. However, the performance of the approach still remains comparable,
showing good results.</p>
</sec>
</sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>In this work, we proposed a parallel AI-based approach to perform semantic segmentation of
medical images. We used an existing framework as a baseline for our approach. This framework
combines Neural Networks and ASP to define a novel loss function and a post-processing phase.
We performed a thorough experimental analysis; our proposal reduced the execution time and
achieved comparable results w.r.t. the baseline approach.</p>
<p>Actually, the reported results achieved via the parallel approach refer to an experimental
activity performed without the inclusion of the ASP-based model in the training phase. As for
future work, we aim to refine the experimental analysis by including ASP-based knowledge in the
parallel training phase, investigate misclassification errors, and improve the generalization
capability of the model, as well as the overall performance.</p>
<p>[2] C. Dodaro, G. Galatà, A. Grioni, M. Maratea, M. Mochi, I. Porro, An ASP-based solution to the
chemotherapy treatment scheduling problem, Theory and Practice of Logic Programming
21 (2021) 835–851.
[3] C. Dodaro, D. Ilardi, L. Oneto, F. Ricca, Deep learning for the generation of heuristics in
answer set programming: A case study of graph coloring, in: International Conference on
Logic Programming and Nonmonotonic Reasoning, Springer, 2022, pp. 145–158.
[4] E. Di Rosa, E. Giunchiglia, M. Maratea, A new approach for solving satisfiability problems
with qualitative preferences, in: ECAI 2008, IOS Press, 2008, pp. 510–514.
[5] A. Castelló, E. S. Quintana-Ortí, J. Duato, Accelerating distributed deep neural network
training with pipelined MPI allreduce, Cluster Computing 24 (2021) 3797–3813.
[6] T. Ben-Nun, T. Hoefler, Demystifying parallel and distributed deep learning: An in-depth
concurrency analysis, ACM Computing Surveys (CSUR) 52 (2019) 1–43.
[7] M. Snir, S. Otto, S. Huss-Lederman, D. Walker, J. Dongarra, MPI: The Complete Reference,
The MIT Press, Cambridge, Massachusetts (1996).
[8] M.-H. Laves, J. Bicker, L. A. Kahrs, T. Ortmaier, A dataset of laryngeal endoscopic images
with comparative study on convolution neural network-based semantic segmentation,
International Journal of Computer Assisted Radiology and Surgery 14 (2019) 483–492.
[9] P. Bruno, F. Calimeri, C. Marte, M. Manna, Combining deep learning and ASP-based models
for the semantic segmentation of medical images, in: International Joint Conference on
Rules and Reasoning, Springer, 2021, pp. 95–110.
[10] P. Bruno, F. Calimeri, C. Marte, DeduDeep: An extensible framework for combining deep
learning and ASP-based models, in: International Conference on Logic Programming and
Nonmonotonic Reasoning, Springer, 2022, pp. 505–510.
[11] P. Czarnul, Parallel Programming for Modern High Performance Computing Systems, CRC
Press, 2018.
[12] G. M. Amdahl, Validity of the single processor approach to achieving large scale computing
capabilities, in: Proceedings of the April 18-20, 1967, Spring Joint Computer Conference,
1967, pp. 483–485.
[13] S. Kumar, Introduction to Parallel Programming, Cambridge University Press, 2022.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Mo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liao</surname>
          </string-name>
          ,
          <article-title>Review the state-of-the-art technologies of semantic segmentation based on deep learning</article-title>
          ,
          <source>Neurocomputing</source>
          <volume>493</volume>
          (
          <year>2022</year>
          )
          <fpage>626</fpage>
          -
          <lpage>646</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>