<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Conference (March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Optimizing DICOM data management with NSGA-G</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Queries Q</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Detail Freq SELECT UID</institution>
          ,
          <addr-line>GeneralTags, GeneralVRs, GeneralNames</addr-line>
          ,
          <institution>Gener- 100 alValues FROM GeneralInfoTable SELECT GeneralTags, count(GeneralValues) FROM GeneralIn- 100 foTable GROUP BY GeneralTags SELECT UID, GeneralNames FROM GeneralInfoTable WHERE 100 GeneralNames = 'Modality' SELECT UID, GeneralVRs FROM GeneralInfoTable WHERE 100 GeneralVRs = 'DA'</institution>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Table 1: Frequency of Queries in Workload W</institution>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Table 2: Attribute Usage Matrix of GeneralInfoTable</institution>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Trung-Dung Le Univ Rennes CNRS</institution>
          ,
          <addr-line>IRISA Lannion</addr-line>
          ,
          <country country="FR">France</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>Verena Kantere University of Ottawa School of Electrical Engineering and Computer Science Ottawa</institution>
          ,
          <country country="CA">Canada</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>26</volume>
      <issue>2019</issue>
      <abstract>
        <p>Cloud-based systems enable to manage ever-increasing medical data. The Digital Imaging and Communication in Medicine (DICOM) standard has been widely accepted to store and transfer the medical data, which uses single (row/column) or hybrid data storage technique (row-column). In particular, hybrid systems leverage the advantages of both techniques and allow to take into account various kinds of queries from full records retrieval (online transaction processing) to analytics (online analytical processing) queries. Additionally, the pay-as-you-go model and elasticity of cloud computing raise an important issue regarding to Multiple Objective Optimization (MOO) to find a data congfiuration according to users preferences such as storage space, processing response time, monetary cost, quality, etc. In such a context, the considerable space of solutions in MOO leads to generation of Pareto-optimal front with high complexity. Pareto-dominated based Multiple Objective Evolutionary Algorithms are often used as an alternative solution, e.g., Non-dominated Sorting Genetic Algorithms (NSGA) which provide less computational complexity. This paper presents NSGA-G, an NSGA based on Grid Partitioning to improve the complexity and quality of current NSGAs and to obtain efficient storage and querying of DICOM hybrid data. Experimental results on DTLZ test problems [10] and DICOM hybrid data prove the relevance of the proposed algorithm.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        A widely international standard between various vendors to
transmit, store, retrieve, print, process and display medical imaging
information is Digital Imaging and Communications in Medicine
(DICOM). Cloud computing makes it possible to manage a
tremendous growth medical data volume. In particular, DICOM data
is also deployed in a cloud by traditional (row/column) [
        <xref ref-type="bibr" rid="ref2 ref27 ref30 ref35">2, 27,
30, 35</xref>
        ] or hybrid (row-column) [
        <xref ref-type="bibr" rid="ref11 ref14 ref29">11, 14, 29</xref>
        ] data storage
technique. The hybrid stores take advantage of both techniques and
take into account various kinds of queries, including Online
analytical processing (OLAP) and Online transaction processing
(OLTP) queries. Some recent works [
        <xref ref-type="bibr" rid="ref11 ref14 ref28 ref29">11, 14, 28, 29</xref>
        ] have been
proposed to optimize the hybrid data configuration. However,
HYRISE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and SAP HANA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] do not consider the high
volume and sparsity of DICOM data. Besides, the
pay-as-yougo model of DICOM leads to Multiple Objective Optimization
(MOO) problem to find a data configuration according to users
preferences regarding storage space, processing response time,
monetary cost, quality, etc. Moreover, an automatic approach
producing data storage configurations for DICOM data is also
presented in [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]. Authors claimed that the space of candidate
solutions in MOO is large, but did not give any method to find
the optimal hybrid data configurations. The vast space of data
      </p>
      <p>GeneralTags
(a1)
1
1
0
0</p>
      <p>GeneralVRs
(a2)
1
0
0
1</p>
      <p>GeneralNames
(a3)
1
0
1
0</p>
      <p>GeneralValues
(a4)
1
1
0
0</p>
      <p>Conf Typical candidate
data storage
configuration
C1 {UID, a1, a2 , a3, a4} 81,135,145</p>
      <p>=&gt; row store
C2 {UID, a1, a2 , a3, a4} 81,135,145
=&gt; column store</p>
      <p>No. of stored
data cells</p>
      <p>Null No.
ratio of</p>
      <p>
        joins
3.49% 0
configuration candidates in hybrid store system leverages an
alternative solution to find a Pareto-optimal. Evolutionary
Multiobjective Optimization (EMO) [
        <xref ref-type="bibr" rid="ref18 ref22 ref34 ref41 ref8 ref9">8, 9, 18, 22, 34, 41</xref>
        ] based on
Pareto dominance techniques is an approximations approach for
MOO. Among EMO approaches, Non-dominated Sorting
Algorithms (NSGAs) [
        <xref ref-type="bibr" rid="ref40 ref41 ref6 ref9">6, 9, 40, 41</xref>
        ] are potential solutions. However,
the diversity, convergence and computational quality of NSGAs
still need to be improved.
      </p>
      <p>For example, GeneralInfoTable table of DICOM data is the
largest entity table in terms of storage space size for a given
medical dataset. GeneralInfoTable, comprising 16,226,762 tuples
and 4,845,042 MB, is often processed by a workfload W, as
shown in Table 1. The Attribute Usage Matrix of this table is
shown in Table 2. The statistic of null value ratios corresponding
to the attributes in GeneralInfoTable table is described as follows:
GeneralTags (0.0 %), GeneralVRs (0.0 %), GeneralNames (0.0
%), GeneralValues (13.97 %).</p>
      <p>Table 3 shows two original candidates of data configuration of
GenealInTable. Besides, many other candidates can decompose
this table into sub-tables and can be stored in row or column stores
corresponding to four different objective values: null ratio, number
of joins, number of scanned data cells and execution time.</p>
      <p>
        To solve the multi-objective problems above, the problems
are often solved by turning the problem into a single-objective
problem first and then solving that problem. However,
singleobjective problems cannot adequately represent multi-objective
problems [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This approach may significantly changes the
problem nature. In some cases, the problem becomes harder to solve
or certain optimal solutions are not found anymore [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. In
general, Multi-Objective Optimization problem is more complex than
single-objective optimization problem. Moreover, large space of
candidates leads to the necessity of finding a Pareto set of data
configurations in MOO. Besides, generating Pareto-optimal front
is often infeasible due to high complexity [
        <xref ref-type="bibr" rid="ref42">42</xref>
        ]. Therefore, in the
context of hybrid DICOM data storage in clouds, a challenging
problem is how to optimize the hybrid data storage with an ef-fi
cient algorithm.
      </p>
      <p>
        Meanwhile, Evolutionary Algorithms, an alternative to the
Pareto-optimal, look for approximations (set of solutions close to
the optimal front). For example, EMO approaches [
        <xref ref-type="bibr" rid="ref18 ref22 ref34 ref41 ref8 ref9">8, 9, 18, 22, 34,
41</xref>
        ] have been developed based on Pareto dominance techniques.
      </p>
      <p>
        Among EMO approaches, [
        <xref ref-type="bibr" rid="ref6 ref9">6, 9</xref>
        ] proposed Non-dominated
Sorting Algorithms (NSGAs) to decrease the computational
complexity while maintaining the diversity among solutions. The crowding
distance operators are used to maintain the diversity in
NSGAII [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and SPEA-II [
        <xref ref-type="bibr" rid="ref41">41</xref>
        ]. However, the crowding distance operators
need to be replaced because of high complexity and not
unsuitability for the problems of more than two objectives [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
Furthermore, MOEA/D maintains the diversity with more than three
objectives problem [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ]. This algorithm uses an approach based
on decomposition to divide a multiple objectives problem into
various single objective optimization sub-problems. Nevertheless,
MOEA/D can only solves up to four objectives [
        <xref ref-type="bibr" rid="ref33">33</xref>
        ]. Meanwhile,
Deb and Jain [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] proposed a set of reference directions to guide the
search process in NSGA-III. In spite of good quality, NSGA-III
has the highest computational complexity among NSGAs.
      </p>
      <p>
        This paper presents Non-dominated Sorting Algorithm based
on Grid Partitioning (NSGA-G) [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] to improve both quality and
computational efficiency of NSGAs, and also provides an
alternative Pareto-optimal for MOO problem of DICOM hybrid store.
NSGA-G maintains the convergence by keeping the original
generation process and the diversity by randomly selecting solutions
in a Pareto set in sub-groups. A solution is selected by
comparing members in a group, which is created by a Grid Partitioning
in the space of solutions, instead of all members in the space.
NSGA-G improves both quality and computation time to solve
MOO, while inheriting the superior characteristics of NSGAs in
terms of computational complexity. NSGA-G is validated through
experiments on DTLZ problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] in Generational Distance
(GD) [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], Inverted Generational Distance (IDG) and Maximum
Pareto Front Error (MPFE) statistic [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ], comparing with other
NSGAs, such as, NSGA-II, NSGA-III, etc. Furthermore,
NSGAG is also experimented in finding the Pareto-optimal of DICOM
hybrid data configuration.
      </p>
      <p>The remaining of this paper is organized as follows. Section 2
presents the background of our research. NSGA-G is presented in
Section 3, while Sections 4 and 5 present experiments to validate
NSGA-G to DTLZ problems and hybrid DICOM data storage,
respectively. Finally, conclusions and perspectives are presented
in Section 6.
2
2.1</p>
    </sec>
    <sec id="sec-2">
      <title>BACKGROUND</title>
    </sec>
    <sec id="sec-3">
      <title>DICOM</title>
      <p>The international standard of medical data, DICOM, to transfer,
store and display medical imaging information was firstly released
in 1980 to make inter-operable between different manufacturers.</p>
      <p>
        Besides characteristic of BigData, such as volume, variety and
velocity [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ], DICOM has been accessed by various OLAP, OLTP
and mixed workloads. Row stores data associated with a row
together has the advantage of adding/modifying a row and efficiently
reading many columns of a single row at the same moment. This
strategy is suitable for OLTP workload, but wastes I/O costs for
a query which requires few attributes of a table [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ]. In contrast,
column stores (e.g. MonetDB [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] and C-Store [
        <xref ref-type="bibr" rid="ref35">35</xref>
        ]) organize data
by column. A column contains data for a single attribute of a
tuple and stores sequentially on disk. The column stores allow
to read only relevant attributes and efficiently aggregating over
many rows, but only for a few attributes. Although, the column
stores are suitable for read-intensive (OLAP) workloads, their
tuple reconstruction cost in OLTP workloads is higher than row
stores. To improve performance of storing and querying in OLAP,
OLTP and mixed workloads, DICOM data needs to be stored in a
row-column store, called hybrid data storage.
2.2
      </p>
    </sec>
    <sec id="sec-4">
      <title>Hybrid data configuration</title>
      <p>
        Hybrid stores (e.g., HYRISE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], SAP HANA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], HYTORMO
[
        <xref ref-type="bibr" rid="ref28">28</xref>
        ]) are proposed to optimize the performance of both OLAP
and OLTP workloads. The hybrid store has two processes in
optimizing storage and query.
      </p>
      <p>Data Storage Strategy. The first strategy aims to optimize
query performance and storage space over a mixed OLTP and
OLAP workload by extracting, organizing and storing data in a
manner to reduce space, tuple construction and I/O cost. The data
are organized into entity tables. The tables are decomposed into
multiple sub-tables, which are stored in row or column stores of
the hybrid store. A group of attributes classified as
frequentlyaccessed-together attributes can be stored in a row table. Other
groups are classified as optional attributes and stored in a column
store. Each attribute belongs to one group except that it is used
to join the tables together. This strategy removes the null rows in
tables.</p>
      <p>Query Processing Strategy. In order to improve
performance of query processing in a distributed file system of a cloud
environment, the hybrid store needs to modify sub-tables to reduce
the left-outer joins and irrelevant tuples in the input tables of join
operations. When a query needs attributes from many sub-tables,
the hybrid store should change data configuration to have efficient
query processing in joining operators between sub-tables. The
query performance is negatively impacted if the query execution
needs attributes by joining many tables. The hybrid store needs
to reconstruct result tuples and the storage space will increase to
store surrogate attributes.</p>
      <p>In general, based on a given workload and data specific
information, a large number of candidates of data storage configuration
can be created for a given table. The number of candidates depends
on the attributes, null values in tables, the number of database
engines, etc.
2.3</p>
    </sec>
    <sec id="sec-5">
      <title>Non-dominated Sorting Genetic Algorithms</title>
      <p>
        NSGAs are often used with low computational complexity of
nondominated sorting. At the beginning, a population P0 consisting
of N solutions is initialized. In hybrid data optimization problem,
a population represents a set of candidates of hybrid data
configuration. The space of all candidates is larger than the size of P0.
Each solution belongs to only one non-dominated level (there is
no candidate dominating any solution in level 1, each candidate in
level 2 is dominated by at least one solution in level 1 and so on).
Non-dominated Sorting
Pt
Qt
The binary tournament selection and mutation operators [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]
generate N solutions for the offspring population Q0. After that, 2N
solutions in R0 = P0 ∪ Q0 are selected to multiple sub populations
with different rank or non-dominated level. The next generation
P1 includes N candidates from R0. The first domination principle
is based on non-dominated sorting [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]. A population R0 is
classiifed into different non-domination ranks ( F1, F2 and so on). As a
consequence, N solutions in R0 from rank 1 to k are selected to
prepare the parent population for next-generation P1 and so on.
      </p>
      <p>
        Algorithm 1 shows the population generation in NSGA-II [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]
and NSGA-III [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. At the t th generation, a population Rt = Pt ∪
Qt is formed by a parent Pt and offspring Rt population. Then, 2N
solution in Rt are sorted in ranks F1, F2, etc . The non-dominated
F1 is the best front for the next generation Pt +1. All solutions in
F1 are moved to Pt +1 if the size of the first front F1 is smaller
than N . Thus, all candidates in the next front F2 are moved if
the size of the second front is smaller than N − | F1 | and so on.
At level l , if front Fl cannot be fitted in Pt +1, the process selects
N − Plj−=11 | Fj | remaining solutions in Fl . The procedure is
illustrated in Figure 1.
      </p>
      <p>
        The difference among NSGA-II, NSGA-III and other NSGAs
is the way to select members in the last level Fl . The
crowding distance operator [
        <xref ref-type="bibr" rid="ref41 ref9">9, 41</xref>
        ] is used to select solutions in last
front. However, the crowding distance operator should be replaced
for better performance [
        <xref ref-type="bibr" rid="ref17 ref23">17, 23</xref>
        ] in MOO problems. In particular,
NSGA-II prefers selecting the solutions in low-density area and
rejecting the candidates in high-density area. For example, when
Group
      </p>
      <p>Grid Max Point
Time
the number of solutions needs to be selected for the next
generation is 10, NSGA-II focuses on rejecting solutions in the square
near (1.0, 0.0), as shown in Figure 2.</p>
      <p>
        In a different way, MOEA/D [
        <xref ref-type="bibr" rid="ref40">40</xref>
        ] generates various scalar
optimization subproblems, instead of solving a multiple objectives
problem. The diversity of solutions depends on the way to choose
the scalar objectives. However, the number of neighborhoods
should be defined at the beginning. Furthermore, authors do not
mention the way to estimate good neighborhoods. The diversity
is considered as the selected solution associated with these
different sub-problems. Various versions of MOEA/D approaches are
presented in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. However, they fail to maintain the diversity of
solutions.
      </p>
      <p>
        To keep the diversity, an Evolutionary Many-Objective
Optimization Algorithm Using Reference-point Based Non-Dominated
Sorting Approach [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] (NSGA-III) uses various directions. The
crowding distance operator is replaced by comparing solutions.
NSGA-III generates multiple reference points and each solution
is associated with one of them. However, comparing solutions and
building reference points impact the execution time. NSGA-III
has the better diversity, but the execution time is longer than other
NSGAs. For example, in the problem of two objectives and two
divisions, NSGA-II creates three reference points, (0.0,1.0), (1.0,0.0)
and (0.5,0.5), as shown in Figure 2. After the selection process, the
selected solutions are closed to these three reference points. The
diversity of the population is improved by this approach. However,
comparing all solutions associated with reference points leads to
the high execution time of this algorithm.
      </p>
      <p>
        Furthermore, all approaches compare all candidates in Fl to
move good candidates to the next generation Pt +1. Hence, the
execution time for calculating and comparing becomes significant
when the number of solutions in the last front is huge.
Optimizing data configuration based on queries has been
addressed by systems like HYRISE [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and SAP HANA [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. In
particular, HYRISE can be applied to Customer Relationship
Management (CRM) and SAP HANA uses TPC-H [
        <xref ref-type="bibr" rid="ref36">36</xref>
        ] to experiment
the approach. However, they do not consider the high volume
and sparsity of DICOM data. Besides, HYTORMO [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ] uses
data storage strategy (α , β ∈ [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ]) and query processing
strategy (θ , λ ∈ [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ]) to automatically generate a data configuration
corresponding to these parameters, but do not provide the
optimal algorithm to choose the best hybrid model or a Pareto data
configuration set.
      </p>
      <p>
        In the problem of the hybrid data configuration, HYTORMO
concerns at least four objectives. In some cases, some objectives
are homogeneous. In the reason of the homogeneity between the
multi-objectives functions, removing an objective do not affect to
the final results of MOO problem. In other cases, the objectives
may be contradictory. For example, the monetary is proportional
to the execution time in the same virtual machine configuration
in a cloud. However, cloud providers usually leases computing
resources that are typically charged based on a per time
quantum pricing scheme [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ]. The solutions represent the trade-offs
between time and money. Hence, the execution time and the
monetary cost cannot be homogeneous. As a consequence, the
multiobjective problem cannot be reduced to a mono-objective problem.
Moreover, if we want to reduce the MOO to a mono-objective
optimization, we should have a policy to group all objectives by
the Weighted Sum Model (WSM) [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, estimating the
weights corresponding to different objectives in this model is also
a multi-objective problem.
      </p>
      <p>
        In addition, MOO problems could be solved by MOO
algorithms or WSM [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. However, MOO algorithms are selected
thanks to their advantages when comparing with WSM. The
optimal solution of WSM could be unacceptable, because of an
inappropriate setting of the coefcfiients [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. Furthermore, the
research in [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] proves that a small change in weights may result
in significant changes in the objective vectors and significantly
different weights may produce nearly similar objective vectors.
Moreover, if WSM changes, a new optimization process will be
required. Hence, our system applies a Multi-objective Optimization
algorithm to find a Pareto-optimal solution.
      </p>
      <p>
        As consequence, this paper proposes an approach to find a
Pareto data configuration set of hybrid store for DICOM using
Non-dominated Sorting Genetic Algorithm based on Grid
Partitioning [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ].
3
      </p>
    </sec>
    <sec id="sec-6">
      <title>NSGA-G AND OPTIMIZING DICOM DATA</title>
    </sec>
    <sec id="sec-7">
      <title>MANAGEMENT</title>
      <p>
        NSGA-G [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] is used to improve both diversity and convergence
while having an efficient computation time by reducing the space
of selected good solutions in the truncating process.
      </p>
      <p>At the t th generation of Non-dominated Sorting Algorithms,
Pt represents the parent population with size N and Qt is offspring
population with N members created by Pt . Rt = Pt ∪Qt is a group
in which N members will be selected for Pt +1.
3.1</p>
    </sec>
    <sec id="sec-8">
      <title>NSGA-G</title>
      <p>
        NSGA-G generates the grid points and classifies the solutions in
groups by the nearest smaller and bigger grid points. For instance,
a two-objective problem and grid points are shown in Figure 2. In
this example, the unit of grid point is 0.25. The closest smaller
point of the solution [0.35, 0.45] is [0.25, 0.5] and the nearest
bigger point is [0.5, 0.5]. Grid Min Point and Grid Max Point divide
the solutions in a front into various small groups, as shown in
Figure 2. This division aims to avoid comparing and calculating
multiple objective cost values of all solutions in the last front. All
solutions in a group have the same Grid Min Point and Grid Max
Point. To keep the diversity, a group is selected randomly. A
solution is compared with the others in a group to reduce the execution
time. In this way, only solutions in a group need to be calculated
and compared to select the best candidate, instead of all members
in the last front Fl , as shown in Figure 2. Moreover, randomly
choosing groups maintains the diversity of the population in the
Algorithm 2 Filter front in NSGA-G. [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]
1: function FILTER(Fl , M = N − Plj−=11 Fj )
2: updateIdealPoint()
3: updateIdealMaxPoint()
4: translateByIdealPoint()
5: normalizeByMinMax()
6: createGroups
7: while | Fl |&gt; M do
8: selectRandomGroup()
9: removeMaxSolutionInGroup()
10: end while
11: return Fl
12: end function
removing process. N − Plj−=11 Fj solutions in Fl are moved to the
next generation following this strategy, as shown in Algorithm 2
      </p>
      <p>
        The new origin coordinate is defined in the second line in
Algorithm 2. The maximum objective values are determined in
the third line. All solutions in the space are normalized in range
of [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], as shown in lines 4 and 5. After that, depending on
the grid points, the solutions are divided into different groups.
Randomly selecting a group is the most important characteristic
of the algorithm. This selection helps to avoid comparing and
calculating all solutions in fronts.
      </p>
      <p>Three qualities are used including convergence, diversity and
execution time to estimate the quality of proposed algorithm.</p>
      <p>
        Convergence. The proposed algorithm keeps the convergence
of NSGAs by following the steps of generation process, as shown
in Figure 1. Moreover, the convergence is also improved and
better than the original NSGAs. The experiments of GD [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ] and
IGD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] showing the advantages of the proposed algorithm will
be presented in Session 4.
      </p>
      <p>
        Diversity. NSGAs keep the next generation solutions distributed
in the space of solutions. The proposed approach also guarantees
the diversity by using Grid Partitioning. Assuming that the
problem has N objectives, N ≥ 4, and the last front needs to remove
k solutions. After normalizing all solutions in the last front in
range of [
        <xref ref-type="bibr" rid="ref1">0, 1</xref>
        ], each axis coordinate is divided by n, i.e., the
number of grid, in that range. Thus, the space in that range will have
nN groups. We choose the number of groups in the last front be
nN −1. The diversity of the genetic algorithm is kept by generating
k groups and removing k solutions. The worst solution in each
group is removed by determining the longest distance to the
minimum grid point. Hence, the parameter n of the proposed algorithm
is n = ⌈k1/(N −1) ⌉, where ⌈.⌉ is a ceiling operator.
      </p>
      <p>Computation. In this paper, the proposed algorithm aims to
reduce the computation of selecting good solution by dividing all
solutions in the last front into small groups. A good solution is
selected in a small group, instead of the last front. The selection
process is accelerated by this division in comparison with other
approaches scanning all solutions.
3.2</p>
    </sec>
    <sec id="sec-9">
      <title>Optimizing hybrid data configuration</title>
      <p>A workload W = (A, Q, AU M, F ) comprises four elements
including: a query set Q = {qi | i = 1, ..., m} in workload W executed
over T; an attribute set A = {aj | j = 1, ..., n} of table T; an
Attribute Usage Matrix AU M with size of m×n, where AU M[i, j] = 1
Algorithm 3 Find a data configuration for a table
computing.</p>
      <p>1: function BESTDATACONFIGURATION(Q, W, T, S, B)
2: // Find a Pareto data configuration set of table T and</p>
      <p>Workload W with weight sum model S and Constraint B
3: α ∈ {0; 1} //weight of similarity
4: β ∈ {0; 1} //clustering threshold
5: θ ∈ {0; 1} //merging threshold
6: λ ∈ {0; 1} //data layout threshold
7: AU M ← AttributeU saдeMatrix (W )
8: F ← QueryFrequencies (W )
9: I ← DataSpeci f ic (T )
10: P ← N SGA − G (α , β, θ , λ, AU M, I , F )
11: //Return best candidate in P with weight sum model
12: return BestInPareto(P, S, B)
13: end function</p>
      <sec id="sec-9-1">
        <title>Algorithm 4 Select the best data configuration in S and constraints B.</title>
        <p>P for weights
1: function BESTINPARETO(P, S, B)
2: PB ← p ∈ P |∀n ≤ |B| : cn (p) ≤ Bn
3: if PB , ∅ then
4: return p ∈ PB |C (p) = min(W eiдhtSum(PB , S))
5: else
6: return p ∈ P |C (p) = min(W eiдhtSum(P, S))
7: end if
8: end function
if qi accesses to attribute aj , otherwise AU M[i, j] = 0; a
frequencies set F = { fk | k = 1, ..., m}, where fk is total frequencies
count of qk in workload W.</p>
        <p>
          Vertical partitioning approaches, including affinity-based
algorithm [
          <xref ref-type="bibr" rid="ref32">32</xref>
          ], are widely used in the traditional database. Especially,
affinity-based algorithms use Attribute Usage Matrix and
Frequencies matrices to optimize data in Distributed Database system.
This approach is also used in the hybrid data store. In particular,
HYRISE [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ] and SAP HANA [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] use the Attribute Usage
Matrix of a table and Frequency of queries in a workload to optimize
the hybrid data configuration.
        </p>
        <p>
          However, HYTORMO [
          <xref ref-type="bibr" rid="ref29">29</xref>
          ] concerns more about data specific
information, a matrix containing the null values of a table. Data
specific information does not appear in the traditional system.
Besides, HYRISE and SAP HANA do not concern the high
volume and sparsity of DICOM data (the null values). HYTORMO
concerns the high volume and sparsity of DICOM and mixed
OLTP/OLAP workloads in the automatic generating hybrid data
configuration.
        </p>
        <p>
          The data specific information is a matrix containing the null
values of table T. The hybrid data configuration is formed by four
parameters including weight of similarity α , clustering threshold
β , merging threshold θ and data layout threshold λ. Depending
on these four parameters, HYTORMO automatic creates a data
configuration of hybrid store. However, the authors did not
optimize the space of solutions of data configuration. Hence, in the
space of four parameters in [
          <xref ref-type="bibr" rid="ref1">0, 1</xref>
          ], we use NSGA-G to look for a
Pareto set of data configuration. Algorithm 3 finds the best data
configuration for a table T. Line 10 generates a Pateto set of data
configuration. After that, the line 12 uses Algorithm 4 to return
the best solution in this set with the weight sum model S and the
constraint B [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-10">
      <title>VALIDATION ON DTLZ TEST PROBLEMS</title>
      <p>
        Many studies on Multi-objective Evolutionary Algorithms (MOEAs)
present test problems, but most of them are either simple or not
scalable. Among them, DTLZ test problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] are useful in
various research activities on MOEAs, such as testing the
performance of a new MOEA, comparing different MOEAs and better
understanding of the working principles of MOEAs. The proposed
algorithm is experimented on DTLZ test problems with other
famous NSGAs to show advantages in convergence, diversity and
execution time.
4.1
      </p>
    </sec>
    <sec id="sec-11">
      <title>Environment</title>
      <p>
        For fair comparison and evaluation, the same parameters are
used, such as simulated binary crossover (30), polynomial
mutation (20), max evaluations (10000) and populations (100), for
eMOEA[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], NSGA-II, MOEA/D[
        <xref ref-type="bibr" rid="ref40">40</xref>
        ], NSGA-III and NSGA-G1.
All algorithms are experimented with the same population size
N = 100 and the maximum evaluation M = 10000. Two types
of problems in DTLZ test problems [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], DTLZ1 and DTLZ3,
with m objectives, m ∈ [
        <xref ref-type="bibr" rid="ref10 ref5">5, 10</xref>
        ], in MOEA framework [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], are
used with 50 independent runnings. All experiments are run in
Open JDK Java 1.8 and on a machine with following parameters:
Intel(R) core(TM) i7-6600U CPU @ 2.60GHz × 4, 16GB RAM.
4.2
      </p>
    </sec>
    <sec id="sec-12">
      <title>Results</title>
      <p>
        To estimate the qualities of algorithms, GD [
        <xref ref-type="bibr" rid="ref37">37</xref>
        ], IGD [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and
MPFE [
        <xref ref-type="bibr" rid="ref38">38</xref>
        ] are applied. GD measures the distance from the
evolved solution to the true Pareto front [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. The quality
measuring both the convergence and diversity is IDG. It estimates
the approximation quality of the Pareto front obtained by MOO
algorithms [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. The most significant distance between the
individuals in Pareto front and the solutions in the approximation front is
showed in MPFE [
        <xref ref-type="bibr" rid="ref39">39</xref>
        ]. In three experiments, the better quality is
shown by the lower value.
      </p>
      <p>The advantage of NSGA-G, comparing to other NSGAs in
both diversity and convergence, is shown by dividing the space of
solutions into multiple partitions and selecting groups randomly.
The advantages of NSGA-G are presented not only on the diversity
and convergence in GD and IGD, as shown in Tables 4, 6, but
also on the distance between the individuals in Pareto front and
the solutions in the approximated front experiment, i.e., MPFE, as
presented in Table 8. The convergence and diversity of NSGA-G
are often the most or second quality in the tests.</p>
      <p>In high computational problems, NSGA-G outperforms in
forms of the computation time. It is explained by the
comparison among solutions in a group, instead of in the whole space. It
can be seen that NSGA-G has shorter computation time than the
others in the large objective experiments, as shown in Tables 5, 7
and 9.
5</p>
    </sec>
    <sec id="sec-13">
      <title>VALIDATION WITH DICOM DATA</title>
      <p>
        In this session, the proposed algorithm is applied to DICOM
dataset to look for a Pareto data configuration set. The dataset
containing the DICOM files in the white paper by Oracle [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] is
created by six different digital imaging modalities. Its total size is
about 2 terabytes, including 2.4 million images of 20,080 studies.
In particular, DICOM text files are used in [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ], as shown in
Table 11. They are extracted from real DICOM dataset, as shown
      </p>
      <sec id="sec-13-1">
        <title>1https://github.com/dungltr/MOEA</title>
        <p>19.183 MB. Workload WS accessing Study table is shown in
attributes, given by:
Moreover, Table 18 shows the advantage of NSGA-G among vfie</p>
      </sec>
      <sec id="sec-13-2">
        <title>NSGAs in execution times.</title>
        <p>
          Finally, to select the optimal data configuration, the weighted
sum model [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ] can also be applied to Pareto data configuration
set.
        </p>
        <p>In conclusion, despite the best quality algorithm in the case
of DICOM hybrid store, the computation time of NSGA-III is
too long. In contrast, in spite of the second good algorithm, the
execution time of NSGA-G is shorter than the others.
6</p>
      </sec>
    </sec>
    <sec id="sec-14">
      <title>CONCLUSION</title>
      <p>This paper introduced our solution to optimize the storage and
query processing of DICOM files in a hybrid (row-column) store.
Our proposed algorithm, NSGA-G, finds an approximation of
Pareto-optimal with a good trade-off between diversity and
performance. Experiments on DTLZ test problems show the advantages
of NSGA-G. Preliminary experiments on DICOM files in a hybrid
store prove that NSGA-G also provides the best processing time
with interesting results in both diversity and convergence.</p>
      <p>In future work, our approach will be experimented on other
datasets, such as CRM, TPC-H benchmark, etc., to evaluate the
suitability of the proposed algorithm to all kinds of data stored
in row-column store. The solution will also be extended so as
to address medical data management in a cloud federation, with
various cloud providers.</p>
    </sec>
    <sec id="sec-15">
      <title>ACKNOWLEDGMENTS</title>
      <p>The authors would like to thank members of SHAMAN team at
Univ Rennes, CNRS, IRISA and University of Ottawa School
of Electrical Engineering and Computer Science for insightful
comments.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Leonardo</surname>
            <given-names>C. T.</given-names>
          </string-name>
          <string-name>
            <surname>Bezerra</surname>
            , Manuel López-Ibáñez, and
            <given-names>Thomas</given-names>
          </string-name>
          <string-name>
            <surname>Stützle</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>An Empirical Assessment of the Properties of Inverted Generational Distance on Multi-</article-title>
          and
          <string-name>
            <surname>Many-Objective Optimization</surname>
          </string-name>
          .
          <source>In Evolutionary Multi-Criterion Optimization</source>
          . Springer International Publishing, Cham,
          <fpage>31</fpage>
          -
          <lpage>45</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Peter</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Boncz</surname>
            , Marcin Zukowski, and
            <given-names>Niels</given-names>
          </string-name>
          <string-name>
            <surname>Nes</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>MonetDB/X100: Hyper-Pipelining Query Execution</article-title>
          .
          <source>In CIDR 2005, Second Biennial Conference on Innovative Data Systems Research</source>
          . Asilomar, CA, USA,
          <fpage>225</fpage>
          -
          <lpage>237</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>V.</given-names>
            <surname>Chankong</surname>
          </string-name>
          and
          <string-name>
            <given-names>Y.Y.</given-names>
            <surname>Haimes</surname>
          </string-name>
          .
          <year>1983</year>
          .
          <article-title>Multiobjective decision making: theory and methodology</article-title>
          . North Holland.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>Carlos</surname>
            <given-names>A. Coello</given-names>
          </string-name>
          <string-name>
            <surname>Coello</surname>
          </string-name>
          and Nareli Cruz Cortés.
          <year>2005</year>
          .
          <article-title>Solving Multiobjective Optimization Problems Using an Artificial Immune System</article-title>
          .
          <source>Genetic Programming and Evolvable Machines</source>
          <volume>6</volume>
          (
          <issue>Jun</issue>
          .
          <year>2005</year>
          ),
          <fpage>163</fpage>
          -
          <lpage>190</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Carlos</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Coello</surname>
            <given-names>Coello</given-names>
          </string-name>
          ,
          <string-name>
            <given-names>David A.</given-names>
            <surname>Van Veldhuizen</surname>
          </string-name>
          , and
          <string-name>
            <surname>Gary</surname>
            <given-names>B.</given-names>
          </string-name>
          <string-name>
            <surname>Lamont</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>Evolutionary Algorithms for Solving Multi-Objective Problems (Genetic and Evolutionary Computation)</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Multi-objective optimization using evolutionary algorithms: an introduction</article-title>
          .
          <source>KanGAL Report</source>
          (
          <year>2011</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>24</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          and Ram Bhushan Agrawal.
          <year>1994</year>
          .
          <article-title>Simulated Binary Crossover for Continuous Search Space</article-title>
          .
          <source>Complex Systems</source>
          <volume>9</volume>
          (
          <year>1994</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>34</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          and
          <string-name>
            <given-names>Himanshu</given-names>
            <surname>Jain</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>An Evolutionary Many-Objective Optimization Algorithm Using Reference-point Based Non-dominated Sorting Approach, Part I: Solving Problems with Box Constraints</article-title>
          .
          <source>IEEEXplore</source>
          <volume>18</volume>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          , Amrit Pratap, Sameer Agarwal, and
          <string-name>
            <given-names>T.</given-names>
            <surname>Meyarivan</surname>
          </string-name>
          .
          <year>2002</year>
          .
          <article-title>A fast and elitist multiobjective genetic algorithm: NSGA-II</article-title>
          .
          <source>IEEE Trans. Evol. Comput</source>
          .
          <volume>6</volume>
          (
          <issue>2002</issue>
          ),
          <fpage>182</fpage>
          -
          <lpage>197</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Kalyanmoy</surname>
            <given-names>Deb</given-names>
          </string-name>
          , Lothar Thiele, Marco Laumanns, and
          <string-name>
            <given-names>Eckart</given-names>
            <surname>Zitzler</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>Scalable Test Problems for Evolutionary Multiobjective Optimization</article-title>
          .
          <source>Evolutionary Multiobjective Optimization. Theoretical Advances and Applications</source>
          (
          <year>2005</year>
          ),
          <fpage>105</fpage>
          -
          <lpage>145</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Franz</surname>
            <given-names>Färber</given-names>
          </string-name>
          , Sang Kyun Cha, Jürgen Primsch, Christof Bornhövd, Stefan Sigg, and
          <string-name>
            <given-names>Wolfgang</given-names>
            <surname>Lehner</surname>
          </string-name>
          .
          <year>2012</year>
          .
          <article-title>SAP HANA Database: Data Management for Modern Business Applications</article-title>
          .
          <source>SIGMOD Rec</source>
          .
          <volume>40</volume>
          (
          <issue>Jan</issue>
          .
          <year>2012</year>
          ),
          <fpage>45</fpage>
          -
          <lpage>51</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <surname>C. M. Fonseca</surname>
            and
            <given-names>P. J.</given-names>
          </string-name>
          <string-name>
            <surname>Fleming</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>An Overview of Evolutionary Algorithms in Multiobjective Optimization</article-title>
          .
          <source>Evolutionary Computation</source>
          <volume>3</volume>
          ,
          <issue>1</issue>
          (Mar.
          <year>1995</year>
          ),
          <fpage>1</fpage>
          -
          <lpage>16</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>Christian</given-names>
            <surname>Glaßer</surname>
          </string-name>
          , Christian Reitwießner, Heinz Schmitz, and
          <string-name>
            <given-names>Maximilian</given-names>
            <surname>Witek</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Approximability and Hardness in Multi-objective Optimization</article-title>
          . In Programs, Proofs, Processes, Fernando Ferreira, Benedikt Löwe, Elvira Mayordomo, and Luís Mendes Gomes (Eds.). Springer Berlin Heidelberg, Berlin, Heidelberg,
          <fpage>180</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Martin</surname>
            <given-names>Grund</given-names>
          </string-name>
          , Jens Krüger, Hasso Plattner, Alexander Zeier, Philippe CudreMauroux, and
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Madden</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>HYRISE: A Main Memory Hybrid Storage Engine</article-title>
          .
          <source>VLDB Endow</source>
          .
          <volume>4</volume>
          (
          <issue>Nov</issue>
          .
          <year>2010</year>
          ),
          <fpage>105</fpage>
          -
          <lpage>116</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Stavros</surname>
            <given-names>Harizopoulos</given-names>
          </string-name>
          , Daniel J. Abadi, and
          <string-name>
            <given-names>Samuel</given-names>
            <surname>Madden</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Performance Tradeoffs in Read-Optimized Databases</article-title>
          .
          <source>VLDB</source>
          (
          <year>2006</year>
          ),
          <fpage>487</fpage>
          -
          <lpage>498</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Florian</given-names>
            <surname>Helff</surname>
          </string-name>
          and
          <string-name>
            <given-names>Laurent</given-names>
            <surname>Orazio</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Weighted Sum Model for MultiObjective Query Optimization for Mobile-Cloud Database Environments</article-title>
          . In EDBT/ICDT Workshops.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>H.</given-names>
            <surname>Ishibuchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Masuda</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Y.</given-names>
            <surname>Nojima</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Sensitivity of performance evaluation results by inverted generational distance to reference points</article-title>
          .
          <source>In 2016 IEEE Congress on Evolutionary Computation (CEC)</source>
          .
          <volume>1107</volume>
          -
          <fpage>1114</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>Himanshu</given-names>
            <surname>Jain</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>An evolutionary many-objective optimization algorithm using reference-point based nondominated sorting approach</article-title>
          ,
          <string-name>
            <surname>Part</surname>
            <given-names>II</given-names>
          </string-name>
          :
          <article-title>Handling constraints and extending to an adaptive approach</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>18</volume>
          (
          <year>2014</year>
          ),
          <fpage>602</fpage>
          -
          <lpage>622</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Salman</surname>
            <given-names>A.</given-names>
          </string-name>
          <string-name>
            <surname>Khan</surname>
            and
            <given-names>Shafiqur</given-names>
          </string-name>
          <string-name>
            <surname>Rehman</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Iterative non-deterministic algorithms in on-shore wind farm design: A brief survey</article-title>
          .
          <source>Renewable and Sustainable Energy Reviews</source>
          <volume>19</volume>
          (
          <year>2013</year>
          ),
          <fpage>370</fpage>
          -
          <lpage>384</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>V.</given-names>
            <surname>Khare</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Yao</surname>
          </string-name>
          , and
          <string-name>
            <given-names>K.</given-names>
            <surname>Deb</surname>
          </string-name>
          .
          <year>2003</year>
          .
          <article-title>Performance Scaling of Multi-objective Evolutionary Algorithms</article-title>
          . In Evolutionary
          <string-name>
            <surname>Multi-Criterion Optimization</surname>
          </string-name>
          . Berlin, Heidelberg,
          <fpage>376</fpage>
          -
          <lpage>390</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <surname>Herald</surname>
            <given-names>Kllapi</given-names>
          </string-name>
          , Eva Sitaridi,
          <string-name>
            <surname>Manolis M. Tsangaris</surname>
            , and
            <given-names>Yannis</given-names>
          </string-name>
          <string-name>
            <surname>Ioannidis</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Schedule optimization for data processing flows on the cloud</article-title>
          .
          <source>Proceedings of the 2011 international conference on Management of data - SIGMOD '11</source>
          (
          <year>2011</year>
          ),
          <fpage>289</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>J.</given-names>
            <surname>Knowles</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Corne</surname>
          </string-name>
          .
          <year>1999</year>
          .
          <article-title>The Pareto archived evolution strategy: a new baseline algorithm for Pareto multiobjective optimisation</article-title>
          .
          <source>In 1999 Congress on Evolutionary Computation-CEC99 (Cat. No. 99TH8406)</source>
          , Vol.
          <volume>1</volume>
          .
          <fpage>98</fpage>
          -
          <lpage>105</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Mario</given-names>
            <surname>Köppen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Kaori</given-names>
            <surname>Yoshida</surname>
          </string-name>
          .
          <year>2006</year>
          .
          <article-title>Substitute Distance Assignments in NSGA-II for Handling Many-objective Optimization Problems</article-title>
          . (Jan.
          <year>2006</year>
          ),
          <fpage>727</fpage>
          -
          <lpage>741</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Douglas</given-names>
            <surname>Laney</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>3D Data Management: Controlling Data Volume, Velocity, and Variety</article-title>
          .
          <source>Application Delivery Strategies</source>
          <volume>949</volume>
          (
          <issue>Feb</issue>
          .
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <surname>Trung-Dung</surname>
            <given-names>Le</given-names>
          </string-name>
          ,
          <source>Verena Kantere, and Laurent d'Orazio</source>
          .
          <year>2018</year>
          .
          <article-title>An efcfiient multi-objective genetic algorithm for cloud computing: NSGA-G</article-title>
          .
          <source>In IEEE International Conference on Big Data, Big Data</source>
          <year>2018</year>
          , Seattle, WA, USA, December
          <volume>10</volume>
          -
          <issue>13</issue>
          ,
          <year>2018</year>
          .
          <fpage>3883</fpage>
          -
          <lpage>3888</lpage>
          . https://doi.org/10.1109/BigData.
          <year>2018</year>
          . 8622148
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <surname>MOEA</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>The MOEA Website</article-title>
          .
          <article-title>(</article-title>
          <year>2018</year>
          ). http://moeaframework.org/
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <surname>MySQL</surname>
          </string-name>
          <year>2018</year>
          .
          <article-title>The SQL Website</article-title>
          .
          <article-title>(</article-title>
          <year>2018</year>
          ). https://www.mysql.com/
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <surname>Cong-Danh NGUYEN</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Workload- and Data-based Automated Design for a Hybrid Row-column Storage Model and Bloom Filter-based Query Processing for Large-scale DICOM Data Management</article-title>
          .
          <source>Ph.D. Dissertation</source>
          . Clermont Auvergne University.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Danh</given-names>
            <surname>Nguyen-Cong</surname>
          </string-name>
          , Laurent d'Orazio,
          <string-name>
            <given-names>Nga</given-names>
            <surname>Tran</surname>
          </string-name>
          , and
          <string-name>
            <surname>Mohand-Said Hacid</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Storing and Querying DICOM Data with HYTORMO</article-title>
          .
          <source>In Data Management and Analytics for Medicine and Healthcare</source>
          . Springer International Publishing, Cham,
          <fpage>43</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          <source>[30] Oracle</source>
          <year>2018</year>
          .
          <article-title>The Oracle Website</article-title>
          . (
          <year>2018</year>
          ). https://www.oracle.com/
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>An</given-names>
            <surname>Oracle</surname>
          </string-name>
          and
          <string-name>
            <given-names>White</given-names>
            <surname>Paper</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Performance Evaluation of Storage and Retrieval of DICOM Image Content in Oracle Database 11g Using HP Blade Servers</article-title>
          and
          <string-name>
            <given-names>Intel</given-names>
            <surname>Processors</surname>
          </string-name>
          .
          <source>January</source>
          (
          <year>2010</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>M. Tamer</given-names>
            <surname>Özsu</surname>
          </string-name>
          and
          <string-name>
            <given-names>Patrick</given-names>
            <surname>Valduriez</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Principles of distributed database systems, third edition</article-title>
          .
        </mixed-citation>
      </ref>
      <ref id="ref33">
        <mixed-citation>
          [33]
          <string-name>
            <surname>Haitham</surname>
            <given-names>Seada</given-names>
          </string-name>
          , Mohamed Abouhawwash, and
          <string-name>
            <given-names>Kalyanmoy</given-names>
            <surname>Deb</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Towards a Better Balance of Diversity and Convergence in NSGA-III: First Results</article-title>
          .
          <source>In Evolutionary Multi-Criterion Optimization</source>
          . Springer International Publishing, Cham,
          <fpage>545</fpage>
          -
          <lpage>559</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref34">
        <mixed-citation>
          [34]
          <string-name>
            <given-names>N.</given-names>
            <surname>Srinivas</surname>
          </string-name>
          and
          <string-name>
            <given-names>K.</given-names>
            <surname>Deb</surname>
          </string-name>
          .
          <year>1994</year>
          .
          <article-title>Muiltiobjective Optimization Using Nondominated Sorting in Genetic Algorithms</article-title>
          .
          <source>Evolutionary Computation 2 (Sept</source>
          <year>1994</year>
          ),
          <fpage>221</fpage>
          -
          <lpage>248</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref35">
        <mixed-citation>
          [35]
          <string-name>
            <surname>Mike</surname>
            <given-names>Stonebraker</given-names>
          </string-name>
          , Daniel J. Abadi, Adam Batkin, Xuedong Chen, Mitch Cherniack, Miguel Ferreira, Edmond Lau, Amerson Lin, Sam Madden,
          <string-name>
            <surname>Elizabeth O'Neil</surname>
          </string-name>
          ,
          <string-name>
            <surname>Pat O'Neil</surname>
            , Alex Rasin, Nga Tran, and
            <given-names>Stan</given-names>
          </string-name>
          <string-name>
            <surname>Zdonik</surname>
          </string-name>
          .
          <year>2005</year>
          .
          <article-title>C-store: A Column-oriented DBMS</article-title>
          .
          <source>In International Conference on Very Large Data Bases (VLDB '05)</source>
          . Trondheim, Norway,
          <fpage>553</fpage>
          -
          <lpage>564</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref36">
        <mixed-citation>
          [36]
          <string-name>
            <surname>TPC-H 2018. The TPC-H Website</surname>
          </string-name>
          .
          <article-title>(</article-title>
          <year>2018</year>
          ). http://www.tpc.org/tpch/
        </mixed-citation>
      </ref>
      <ref id="ref37">
        <mixed-citation>
          [37]
          <string-name>
            <surname>David</surname>
            <given-names>A. Van Veldhuizen</given-names>
          </string-name>
          and
          <string-name>
            <surname>Gary B. Lamont</surname>
          </string-name>
          .
          <year>1998</year>
          .
          <article-title>Evolutionary Computation and Convergence to a Pareto Front</article-title>
          .
          <source>Late Breaking Papers at the Genetic Programming 1998 Conference</source>
          (
          <year>1998</year>
          ),
          <fpage>221</fpage>
          -
          <lpage>228</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref38">
        <mixed-citation>
          [38]
          <string-name>
            <surname>David</surname>
            <given-names>A. Van Veldhuizen and David A. Van Veldhuizen. 1999. Multiobjective</given-names>
          </string-name>
          <string-name>
            <surname>Evolutionary</surname>
            <given-names>Algorithms</given-names>
          </string-name>
          : Classifications, Analyses, and New Innovations .
          <source>Technical Report</source>
          . Evolutionary Computation.
        </mixed-citation>
      </ref>
      <ref id="ref39">
        <mixed-citation>
          [39]
          <string-name>
            <given-names>G.G.</given-names>
            <surname>Yen</surname>
          </string-name>
          and
          <string-name>
            <given-names>Z.</given-names>
            <surname>He</surname>
          </string-name>
          .
          <year>2013</year>
          .
          <article-title>Performance Metrics Ensemble for Multiobjective Evolutionary Algorithms</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          (
          <year>2013</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref40">
        <mixed-citation>
          [40]
          <string-name>
            <given-names>Q.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Li</surname>
          </string-name>
          .
          <year>2007</year>
          .
          <article-title>MOEA/D: A Multiobjective Evolutionary Algorithm Based on Decomposition</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>11</volume>
          (
          <year>2007</year>
          ),
          <fpage>712</fpage>
          -
          <lpage>731</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref41">
        <mixed-citation>
          [41]
          <string-name>
            <surname>Eckart</surname>
            <given-names>Zitzler</given-names>
          </string-name>
          , Marco Laumanns, and
          <string-name>
            <given-names>Lothar</given-names>
            <surname>Thiele</surname>
          </string-name>
          .
          <year>2001</year>
          .
          <article-title>SPEA2: Improving the strength Pareto evolutionary algorithm</article-title>
          .
          <source>TIK-report 103</source>
          (
          <year>2001</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref42">
        <mixed-citation>
          [42]
          <string-name>
            <given-names>E.</given-names>
            <surname>Zitzler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Thiele</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Laumanns</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Fonseca</surname>
          </string-name>
          , and V. G. da Fonseca.
          <year>2003</year>
          .
          <article-title>Performance assessment of multiobjective optimizers: an analysis and review</article-title>
          .
          <source>IEEE Transactions on Evolutionary Computation</source>
          <volume>7</volume>
          (
          <issue>Apr</issue>
          .
          <year>2003</year>
          ),
          <fpage>117</fpage>
          -
          <lpage>132</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>