Analysis of Centralized Splitting Fork-Join
    Queueing Systems with Heterogeneous Servers
           and Threshold Control Policy

                    Oleg Osipov      and Ekaterina Rogachko

     Saratov State University, 83 Astrakhanskaya Street, Saratov 410012, Russia
             oleg.alex.osipov@gmail.com, rogachkoes@info.sgu.ru


       Abstract. In this paper, we consider a Markovian centralized splitting
       fork-join queueing system with heterogeneous servers and an infinite ca-
       pacity queue. Jobs arrive at the system according to a Poisson process,
       a job consists of multiple homogeneous tasks. Tasks of a job are served
       independently, when all the tasks associated with the job are finished,
       the job service is completed. The system operates under a threshold con-
       trol policy. The control consists in sending tasks to one of idle servers
       or to the queue. The threshold policy uses the slow servers only when
       the queue length reaches certain threshold levels. We perform an exact
       analysis which is based on the matrix-geometric method and obtain ex-
       pressions for the performance measures. Some numerical examples are
       presented. The results can be used for the performance analysis of mul-
       tiprocessor systems and other modern distributed systems.

       Keywords: Fork-Join Queueing Systems · Heterogeneous Servers · Thresh-
       old Control Policy · Matrix Geometric Method · Performance Measures
       · Distributed Computing System


1    Introduction

Fork-join queueing systems are widely used to model parallel and distributed
systems. Some examples of these systems include RAID, distributed web appli-
cations, MapReduce, GRID, multipath routing networks, multiprocessor systems
and etc. In these systems, arriving jobs are split for service by numerous servers
and joined before departure.
    In papers [1,2,3], fork-join queueing systems are applied to the analysis of
MapReduce clusters and multipath routing. Results on modeling of RAID arrays
and distributed databases are provided in [4,5]. Papers [6,7] describe an applica-
tion of fork-join queueing systems to the analysis of multiprocessor systems and
parallel algorithms. Fork-join queueing systems also arise in the performance
analysis of automated manufacturing systems. In these systems, parallelism can
be found in processes such as product assembly and logistic operations that
involve multiple suppliers.


Copyright © 2020 for this paper by its authors. Use permitted under
Creative Commons License Attribution 4.0 International (CC BY 4.0).
   Paper [4] presents an overview of the main results for fork-join and related
queueing models. Most of the papers consider parallel-server fork-join queueing
systems [8,9,10,11,12].
    The parallel-server fork-join queueing system consists of homogeneous servers
[9,10,13] or heterogeneous servers [1,2,8,11,12,14]. An arriving job is split into
a number of tasks, one for each server. Previous research in this area reports three
basic types of parallel-server fork-join queueing systems [15], such as the central-
ized splitting model, the distributed splitting model, and the split-merge model.
    In [11] upon arrival, a job of size S bringing tasks to S ≤ K of K servers is im-
mediately split so that each of its S tasks is allocated to exactly one server. In [14]
each job is split into a number of tasks, one for each free server. Paper [2] con-
siders both deterministic and probabilistic scheduling strategies. Within proba-
bilistic strategy, each task chooses a server with some probability. Instead, the
present paper considers the allocation of tasks between the heterogeneous servers
according to a threshold control policy. Under a threshold policy, a task is ac-
cepted at server i if and only if the number of tasks in queue is at or above
a given threshold qi before acceptance.
    The threshold control policy is used as optimal solution for the job routing
problem named the slow server problem [16,17]. This problem focuses on routing
jobs to heterogeneous servers so as to minimize the average response time of jobs
or to minimize the average number of jobs in the system. As it shown in [16], the
optimal policy which minimizes the average response time of jobs in the system
with two heterogeneous servers is of threshold type, i.e., the fast server should
always be used (as long as there are jobs in the queue), and that the slower server
should only be used if the number of jobs in queue exceeds a certain threshold
value. The generalization of this rule was obtained in [17,18] for the case of multi-
server queueing system with heterogeneous servers. The optimal control policy
has the monotonicity and the threshold property, i.e., there exists a queue length
threshold such that a new server (with the minimal service cost and with the
maximal service rate) should be used only after the queue reaches this threshold.
Later, the results were extended to retrial queueing systems with heterogeneous
servers [19], systems with unreliable servers [20] and with heterogeneous servers
in speed and in quality of resolution [21].
   In this paper, we consider a centralized splitting fork-join queueing model [15]
with heterogeneous servers and a threshold control policy. We apply the matrix-
geometric approach to analyze the considered system. A similar approach was
used in [10,14].
    The remainder of the paper is organized as follows. Section 2 describes the
fork-join queueing system under consideration. In Sections 3 and 4 we obtain the
stationary distribution and the main performance measures. Section 5 provides
an illustration of the above results, we also discuss some control policies and
compare them. Finally, a section of conclusions commenting the main research
contributions of this paper is presented.
2       The Model
We consider a system which consists of M heterogeneous servers Si , i = 1, . . . , M ,
and a FCFS infinite capacity queue as shown in Fig. 1. Jobs arrive at the system
according to a Poisson process with rate λ. Each job is split into d homogeneous
tasks, d ≤ M . An arriving task is placed in the queue. The service times at
server Si have exponential distribution with parameter µi , i = 1, . . . , M , and
µ1 > µ2 > · · · > µM .
    Let b denote the number of waiting tasks in the queue (queue length). At the
arrival times of new tasks the control consists in sending tasks from the queue
to the fastest idle server or leaving them in the queue. A task from the front
of the queue is assigned to a fastest idle server Si , i ∈ {1, . . . , M }, if the queue
length immediately after the arrival is not less than threshold level qi , i.e., qi ≤
b + 1, (1 = q1 ≤ q2 ≤ · · · ≤ qM < qM +1 = ∞).


                                                           µ1


    λ                   ...                                µ2

                                                           ...

                                                          µM

 Fig. 1. Centralized splitting fork-join queueing system with heterogeneous servers.


    When a task finishes its service at server Si , a task from the queue will be
assigned to server Si if the queue length satisfies b ≥ qi . Otherwise server Si will
be idle. Tasks of a job are served independently, when all the tasks associated
with the job are finished, the job service is completed.

3       The Stationary Distribution
The system state is described by a vector x = (r, n0 , n1 , . . . , nM ), where r =
r(b) = b div d denotes the number of waiting jobs in the queue, n0 = n0 (b) =
b mod d is the number of tasks for the served job in the queue,
                             (
                               0, if server Si is idle,
                        ni =
                               1, if server Si is busy.

   The process {x(t), t ≥ 0} is a (M + 2)-dimensional continuous-time Markov
chain on the state space X .
    For each state x, we denote by F(x) and F̄(x) the sets of idle and busy
server indices, respectively,

              F(x) = {i > 0 : ni = 0},              F̄(x) = {i > 0 : ni = 1}.

   Denote by γ(b) the mandatory (minimal) number of busy servers for queue
length b. We can write
                          
                          0,
                                            b = 0,
                    γ(b) = M,                b ≥ qM ,
                          
                            max{i : qi ≤ b}, otherwise.
                          

   Define ϕ(x) by
                    (
                     the minimal element of F(x), if F(x) 6= ∅,
             ϕ(x) =
                     ∞,                           otherwise.
If there are idle servers in the system, a task will be assigned to server Sϕ(x) .
    Let us define now the transition rate q(x, x0 ) from state x ∈ X to state x0 ∈
X . Each transition is associated with an event in the system, there are two types
of events.

1. A job arrives and splits into d tasks
   (a) if b ≥ qM ,

                               q (x, (r + 1, n0 , n1 , . . . , nM )) = λ,                 (1)
   (b) if b < qM ,

                                             q(x, x0 ) = λ,                               (2)
                 0     (d)
        where x = x can be obtained from the following recurrence relation
                 (                               (i)             (i)
                                                                      
                       (i)             (i)
                    r(b    + 1), n0 (b     + 1), n1   , . . . , n M ,    ϕ(x(i) ) > γ(b(i) + 1),
        x(i+1) =
                  x(i) + e2+ϕ(x(i) ) ,                                  otherwise,
                                 (0)
       with the initial
                      value x = x, i = 0, . . . , d − 1.
                             (i)      (i)                  (i)
       Here, x = r , n0 , . . . , nM and b(i) = dr(i) + n0 ; ej denotes the
                (i)     (i)

       vector of the appropriate dimension with all components equal to 0,
       except the jth, which is 1.
2. A task finishes its service at server Si (ni = 1)
   (a) if b < qi ,

                      q (x, (r, n0 , n1 , . . . , ni−1 , 0, ni+1 , . . . , nM )) = µi ,   (3)

   (b) if n0 > 0, {i ∈ F̄(x) : b ≥ qi } =
                                        6 ∅,
                                                                       X
                       q (x, (r, n0 − 1, n1 , . . . , nM )) =                      µi ,   (4)
                                                                  i∈F̄ (x): b≥qi
    (c) if r > 0, n0 = 0, {i ∈ F̄(x) : b ≥ qi } =
                                                6 ∅,
                                                                                   X
                            q(x, (r − 1, d − 1, n1 , . . . , nM )) =                            µi .           (5)
                                                                               i∈F̄ (x): b≥qi

   Let state space X be lexicographically ordered. The subset Xl is called level l,
l = 0, 1, . . . , and defined as

                             Xl = {(r, n0 , n1 , . . . , nM ) ∈ X : r = l}.

The cardinality L of X0 is

        L = 2M + (q2 − q1 )2M −1 + (q3 − q2 )2M −2 + · · · + (d − qσ0 )2M −σ0 ,

where σ0 = γ(d − 1).
    Let r̄ be the minimal number r ∈ {1, 2, . . . } such that rd ≥ qM . The cardi-
nalities of Xl , l ≥ r̄ are d.
    The cardinalities of Xl , 0 < l < r̄ are
                −                                 −
  Kl = 2M −σl (qσ− +1 − ld) + 2M −σl −1 (qσ− +2 − qσ− +1 ) + · · · +
                        l                                         l        l

                                          M −σl+ +1                                     +
                                     +2                (qσ+ − qσ+ −1 ) + 2M −σl (ld + d − qσ+ ),
                                                              l       l                                    l


where
                               σl− = γ(ld),         σl+ = γ(ld + d − 1).
   The system states are presented in Table 1.


                                       Table 1. System states.


                     States (r, n0 , n1 , . . . , nM )
                                                                                   Number of states
        r                n0                        n1 , . . . , nM
                      0                              n1 , . . . , nM                       2M
                 q1 ≤ n0 < q2                      1, n2 , . . . , nM                (q2 − q1 )2M −1
        0
                     ...                                   ...                             ...
               qσ0 ≤ n0 ≤ d − 1            1, . . . , 1, nσ0 +1 , . . . , nM        (d − qσ0 )2M −σ0
                                                                                        −
             0 ≤ n0 < qσ− +1 − ld 1, . . . , 1, nσ− +1 , . . . , nM    2M −σl (qσ− +1 − ld)
                           l                      l                                 l
              qσ− +1 − ld ≤ n0 ,
  0 < l < r̄                                                         M −σl− −1
                l
              n0 < qσ− +2 − ld    1, . . . , 1, nσ− +2 , . . . , nM 2          (qσ− +2 − qσ− +1 )
                                                  l                               l        l
                      l
                       ...                       ...                             ...
                                                                             +
              qσ+ − ld ≤ n0 < d 1, . . . , 1, nσ+ +1 , . . . , nM     2M −σl (ld + d − qσ+ )
                 l                                        l                                            l
    l ≥ r̄      0 ≤ n0 ≤ d − 1                        1, . . . , 1                              d


  It is easily shown that for Xl , l > 0, there are transitions to Xl−1 and Xl+1 .
Thus, the Markov chain {x(t), t ≥ 0} is a quasi-birth-and-death (QBD) process,
the generator matrix Q of the process has the following form
                                                                         
    B00 B01 0 0 · · · 0      0      0         0     0 0 0 ···
   B10 B11 B12 0 · · · 0    0      0         0     0 0 0 · · ·
                                                                         
    0 B21 B22 B23 · · · 0   0      0         0     0 0 0 · · ·
                                                                         
   ··· ··· ··· ··· ··· ··· ···    ···      · · · · · · · · · · · · · · ·
                                                                         
    0 0 0 0 · · · 0 Br̄−1,r̄−2 Br̄−1,r̄−1 Br̄−1,r̄ 0 0 0 · · ·
                                                                         
    0 0 0 0 ··· 0
 Q=                         0   Br̄,r̄−1 Br̄,r̄ A0 0 0 · · ·            ,
    0 0 0 0 ··· 0           0      0       A    2 A  1 A   0   0   · · · 
                                                                         
    0 0 0 0 ··· 0           0      0         0    A  2 A   1 A   0 · · · 
                                                                         
   
    0 0 0 0 ··· 0                                                  . . .
                             0      0         0     0   A   2 A   1
                                                                          
     ..  ..  ..  ..  .. ..   ..     ..        ..    .. .. . . . .
                                                                         
      .   .   .   .   . .     .      .         .     . .          . .

where blocks A2 , A1 , A0 are square matrices of order d, B00 is the square
matrix of order L, blocks B01 , B10 are L × K1 , K1 × L matrices respectively,
Bl,l , l = 1, . . . , r̄, are square matrices of order Kl (we set Kr̄ = d), blocks
Bl,l−1 , l = 2, . . . , r̄, are Kl × Kl−1 matrices, and blocks Bl,l+1 , l = 1, . . . , r̄ − 1,
are Kl × Kl+1 matrices.
    The elements of matrices B01 , Bl,l+1 , l = 1, . . . , r̄ −1, and A0 are determined
by (1) and (2). The elements of matrices B10 , Bl,l−1 , l = 2, . . . , r̄, and A2 are
determined by (5). Note that A0 = λI, where I is an identity matrix of the
appropriate order; A2 is the square matrix of order d, where the (1, d)th element
                                 PM
of the matrix is equal to i=1 µi , and other elements are zero.
    The non-diagonal elements of matrices B00 , Bl,l , l = 1, . . . , r̄, and A1 are
determined by (2)–(4); the diagonal elements of these matrices are determined
from conditions (we set Br̄,r̄+1 = A0 ):

                B00 1 + B01 1 = 0,       Bl,l−1 1 + Bl,l 1 + Bl,l+1 1 = 0 ,
                                 A0 1 + A1 1 + A2 1 = 0,

where 1 is an ones column vector of the appropriate order. Note that A1 is the
bidiagonal square matrix of order d, where the (j, j)th element of the matrix is
               PM                                                 PM
equal to −(λ + i=1 µi ), and the (j, j − 1)th element is equal to i=1 µi .
   It is known [22], that the QBD process is ergodic if and only if πA A0 1 <
πA A2 1, where πA A = 0, πA 1 = 1, A = A0 + A1 + A2 . The condition implies

                                  λd < µ1 + · · · + µM .

    The stationary distribution π = (π0 , π1 , . . . ) of the QBD process can be com-
puted using the matrix-geometric method [22]. The row vector πl , l = 0, 1, . . . ,
defines probabilities of states for level l, according to the lexicographic order,
πl = (π(x) : x ∈ Xl ).
    We solve linear system πQ = 0 and π1 = 1 to find the stationary probabil-
ities. Taking the advantage of the block structure in Q, we obtain
                                 π0 B00 + π1 B10 = 0;
                            π0 B01 + π1 B11 + π2 B21 = 0;
                            π1 B12 + π2 B22 + π3 B32 = 0;
                                     ..................
                     πi−1 Bi−1,i + πi Bi,i + πi+1 Bi+1,i = 0;                             (6)
                                     ..................
                πr̄−2 Br̄−2,r̄−1 + πr̄−1 Br̄−1,r̄−1 + πr̄ Br̄,r̄−1 = 0;
                      πr̄−1 Br̄−1,r̄ + πr̄ Br̄,r̄ + πr̄+1 A2 = 0;
             πl−1 A0 + πl A1 + πl+1 A2 = 0,                 l = r̄ + 1, r̄ + 2, . . . .

    We know [22], the solution of (6) has the matrix-geometric form given by

                             πr̄+l = πr̄ Rl ,        l = 1, 2, . . . ,

where R is the minimal non-negative solution to equation A0 +RA1 +R2 A2 = 0,
vectors π0 , π1 , . . . , πr̄ satisfy
                                                             
                          B00 B01 0 · · ·    0           0
                         B10 B11 B12 · · ·  0           0    
                                                             
                          0 B21 B22 · · ·   0           0    
     (π0 , . . . , πr̄ ) 
                         ··· ··· ··· ···
                                                               = (0, . . . , 0),
                                           ···         ···   
                                                              
                          0 0 0 · · · Br̄−1,r̄−1   Br̄−1,r̄ 
                            0 0 0 · · · Br̄,r̄−1 Br̄,r̄ + RA2

                π0 1 + π1 1 + · · · + πr̄−1 1 + πr̄ (I − R)−1 1 = 1.


4    Performance Measures
Once stationary distribution π is computed, a variety of other performance mea-
sures may be obtained.
   The average number B of jobs in the queue

                     ∞
                     X               r̄
                                     X               ∞
                                                     X
               B=          lπl 1 =         lπl 1 +     (r̄ + l)πr̄ Rl 1 =
                     l=1             l=1             l=1
                     r̄
                     X
                           lπl 1 + πr̄ R r̄(I − R)−1 + (I − R)−2 1.
                                                               
                 =
                     l=1

   The average job waiting time W in the queue (the average time that the job
waits in the queue before the first of its tasks is assigned to a server)

                                           W = B/λ.
   Denote by T the average response time, then
                                       T = W + V,
where V is the average job service time, which is defined as the average total
time required to process all of the tasks associated with the job once the first
task is assigned to a server.
   The average number N of jobs in the system is
                                        N = λT.
    To obtain the average job service time, we consider the service process of all
d tasks of an arbitrarily chosen “tagged” job. There are two cases considered.
1. We assume that when the first task is assigned, the queueing system is in
   a state x such that r > r̄. Then all servers are busy, and tasks of the tagged
   job sequentially occupy available servers.
   The service process of the tagged job is described by an absorbing Markov
   chain ν = {ν(t), t ≥ 0}. Denote the state of chain ν by (k, S), where k
   denotes the number of tasks of the tagged job in the queue, S denotes the
   set of indices for busy servers that process tasks of the tagged job. State
   (0, ∅) is absorbing. The state space N of Markov chain ν is given by
                                        d
                                        [           [
                                 N =          Nj     {(0, ∅)},
                                        j=1

    where
                     N j = {(k, S) : k ∈ {0, 1, . . . , d − j}, S ∈ S j },
    S j denotes the set of all j-element subsets of {1, . . . , M },
                                       d            
                                       X         M
                              |N | =     j             + 1.
                                       j=1
                                               d−j+1

    The initial probability distribution defined on state space N is given by
    the vector (α0 , α), where α0 = α0 (0, ∅) = 0, α = (α(k, S) : (k, S) ∈ N \
    {(0, ∅)}),
                                          µi
                        α (d − 1, {i}) = M    , i = 1, . . . , M,
                                         P
                                           µi
                                              i=1
    and other initial probabilities are zero.
    Markov chain ν is determined by generator matrix Qν with non-diagonal
    elements qν ((k, S), (k 0 , S 0 ))

                                           if k 0 = k = 0, S 0 = S \ {i}, i ∈ S,
                                
                                µi ,
                                
                                           if k 0 = k − 1 ≥ 0, S 0 = S {i}, i ∈
                                
                                µ ,                                   S
                     0   0            i                                         / S,
       qν ((k, S), (k , S )) = P                0                0
                                 i∈S µi , if k = k − 1 ≥ 0, S = S,
                                
                                
                                    0,     otherwise.
                                
   We denote by T the square matrix of order |N | − 1 which is a submatrix of
   matrix Qν and contains transition rates among the transient states.
   The absorption time ξ for Markov chain ν is a random variable which has
   a PH-distribution with PH-representation (α, T ) [22]. The expected absorp-
   tion time is
                               E[ξ] = −αT −1 1.
2. We assume that when the first task is assigned, the queueing system is in
   a state x such that 0 ≤ r ≤ r̄. Then service of the tagged job starts right
   after the events:
     – the job arrives,
     – a task finishes and leaves the system,
     – a job arrives and this leads to a task of the job is assigned to an idle
       server.
   In this case, the service process of the tagged job is described by an absorbing
   Markov chain ζ = {ζ(t), t ≥ 0}. Denote the state of chain ζ by (k, S, x),
   where k denotes the number of tasks of the tagged job in the queue, S
   denotes the set of indices for busy servers that process tasks of the tagged
   job, x denotes the system state, x ∈ X̂ , X̂ = {x ∈ X : r ≤ r̄}. States
   (0, ∅, x), x ∈ X̂ are absorbing. The state space Z of Markov chain ζ is given
   by
                                           d−1
                                            [
                                      Z=       Zk,
                                                k=0

   where                            [
                          Z0 =            {(0, S, x) : S ∈ S(x, 0)} ,
                                   x∈X̂
                      [
           Zk =                  {(k, S, x) : S ∈ S(x, k)} ,     k = 1, . . . , d − 1,
                  x∈X̂ : n0 =k

                                 
                        S(x, k) = S ⊆ F̄(x) : |S| + k ≤ d .
   The initial probability distribution defined on state space Z is given by the
   vector (β0 , β), where β0 = (β0 (0, ∅, x) : x ∈ X̂ ) = 0, β = (β(k, S, x0 ) :
   (k, S, x0 ) ∈ Z \ Z 0 ):
                                                               
                                X            XM       X
              β(k, S, x0 ) = λ       π(x) +     µi         π(x) G−1 ,
                                  x∈A(x0 )            i=1   x∈Di (x0 )


   x ∈ X presents states from which the system moves to state x0 as a result
   of a job arrival (x ∈ A(x0 )), or as a result of a task completion at server Si
   (x ∈ Di (x0 )), i = 1, . . . , M ; G denotes the normalizing constant.
   The Markov chain ζ is determined by generator matrix Qζ with non-diagonal
   elements qζ ((k, S, x), (k 0 , S 0 , x0 )).
   Suppose there is a transition from x to x0 as a result of an event de-
   scribed in Section 3. So we have the corresponding transition from (k, S, x)
   to (k 0 , S 0 , x0 ), where k 0 and S 0 will be described below.
   Assume the system moves from state x to state x0 as a result of a job
   arrival, we denote by H(x, x0 ) = F̄(x0 ) \ F̄(x) the index set of idle servers
   which will be busy right after the event. Let H(x, x0 ) = {i1 , i2 , . . . , im },
   i1 < i2 < · · · < im , then Hj (x, x0 ) ⊆ H(x, x0 ) denotes the j-element set

                                 Hj (x, x0 ) = {i1 , . . . , ij }.

   Thus, we can write
   (a) A job arrives
                              qζ ((k, S, x), (k 0 , S 0 , x0 )) = λ,

       where k 0 = max {k − |H(x, x0 )|, 0}, S 0 = S Hk−k0 (x, x0 ),
                                                            S

   (b) A task finishes its service at Si
        i. if k > 0, n0i = 1, i ∈
                                / S,

                             qζ ((k, S, x), (k − 1, S ∪ {i}, x0 )) = µi ,

        ii. if k > 0,
                                                                       X
                           qζ ((k, S, x), (k − 1, S, x0 )) =                      µi ,
                                                                     i∈S:n0i =1


       iii. if k > 0, n0i = 0,

                                 qζ ((k, S, x), (k, S \ {i}, x0 )) = µi ,

       iv. if k = 0,

                                 qζ ((0, S, x), (0, S \ {i}, x0 )) = µi .

   We denote by η the absorption time for Markov chain ζ. The expected ab-
   sorption time E[η] is computed similarly to E[ξ].

   Finally, we have
                                       E[ξ]c1 + E[η]c2
                                 V =                   ,
                                           c1 + c2
where c1 = ω1 , c2 = β1G,

                                                                        M
                                                                       X
               (ω1 , . . . , ωd ) = πr̄ (I − R)−1 − πr̄ − πr̄+1                µi .
                                                                         i=1
5         Numerical Results
Consider a fork-join queueing system which consists of M = 3 heterogeneous
servers, where µ1 = 5, µ2 = 2, µ3 = 1.
    The performance measures depend on threshold levels qi , i = 2, . . . , M . Let qi∗
be the optimal threshold levels with respect to the minimization of the average
response time T . They can be obtained by the exhaustive search method. Assume
that λ = 1 and d = 3, then the optimal threshold levels are q2∗ = 2, q3∗ = 8, and
T = 0.838.
    To illustrate the advantages of the threshold control policy, we consider the
fastest free server policy with qi = 1, i = 1, . . . , M . Figure 2 presents the average
response times for the optimal threshold control policy and the fastest free server
policy for several values of the arrival rate λ.


    1.5      Threshold control policy                       Threshold control policy
             Fastest free server policy                 6   Fastest free server policy


     1
                                                        4
T


                                                    T


    0.5
                                                        2


     0                                                  0
             0.5      1             1.5   2   2.5           0.5      1         1.5       2   2.5
                                λ                                          λ
                          (a)                                            (b)

              Fig. 2. The average response times for (a) d = 2 and (b) d = 3.


    It may be seen that the optimal threshold control policy provides lower values
of the average response time T . When the arrival rate λ is quite large, the differ-
ence between the policies can be neglected. As λ increases, the threshold levels
decrease that leads to the fastest free server policy is near optimal one in this
case.


6         Conclusions
We studied a Markovian fork-join queueing system with heterogeneous servers
and threshold control policy. Applying a matrix-geometric approach, we obtained
the stationary distribution of the system and performance measures. The results
can be used for the performance analysis of multiprocessor systems and other
modern distributed systems. An interesting topic for future research is to ob-
tain the optimal threshold control policy which minimizes the long-run average
number of jobs in the system or the average response time.
References
1. Rizk, A., Poloczek, F., Ciucu, F.: Computable bounds in fork-join queue-
   ing systems. SIGMETRICS Perform. Eval. Rev. 43(1), 335–346 (2015).
   https://doi.org/10.1145/2796314.2745859
2. KhudaBukhsh, W.R., Rizk, A., Frömmgen, A., Koeppl, H.: Optimizing
   stochastic scheduling in fork-join queueing models: bounds and applica-
   tions. In: IEEE INFOCOM 2017, pp. 1–9. IEEE, New York (2017).
   https://doi.org/10.1109/INFOCOM.2017.8057013
3. Glushkova, D., Jovanovic, P., Abello, A.: MapReduce performance model for Hadoop
   2.x. Information Systems 79, 32–43 (2019). https://doi.org/10.1016/j.is.2017.11.006
4. Thomassian, A.: Analysis of fork/join and related queueing systems. ACM Com-
   puting Surveys 47(2), 17:1–17:71 (2014). https://doi.org/10.1145/2628913
5. Joshi, G., Liu, Y., Soljanin, E.: On the delay-storage trade-off in content download
   from coded distributed storage. IEEE Journal on Selected Areas on Communications
   32(5), 989–997 (2014). https://doi.org/10.1109/JSAC.2014.140518
6. Heidelberger, P., Trivedi, K.S.: Analytic queueing models for programs
   with internal concurrency. IEEE Trans. Comp. C-3(1), 73–82 (1983).
   https://doi.org/10.1109/TC.1983.1676125
7. Gorbunova, A.V., Zaryadov, I.S., Matyushenko, S.I., Samouylov, K.E.,
   Shorgin, S.Ya.: The approximation of response time of a cloud comput-
   ing system. Informatics and Applications 9(3), 31–38 (2015) (in Russian).
   https://doi.org/10.14357/19922264150304
8. Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two de-
   mands I. SIAM Journal of Applied Mathematics 44(5), 1041–1053 (1984).
   https://doi.org/10.1137/0144074
9. Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchro-
   nization in parallel queues. IEEE Trans. Comp. 37(6), 739–743 (1988).
   https://doi.org/10.1109/12.2213
10. Squillante, M.S., Zhang, Y., Sivasubramaniam, A., Gautam, N.: Generalized
   parallel-server fork-join queues with dynamic task scheduling. Annals of Operations
   Research 160(1), 227–255 (2008). https://doi.org/10.1007/s10479-008-0312-7
11. Baccelli, F., Makowski, A.M., Shwartz, A.: The fork-join queue and related systems
   with synchronization constraints: stochastic ordering and computable bounds. Adv.
   Appl. Probab. 21(3), 629–660 (1989). https://doi.org/10.1017/S0001867800018851
12. Ko, S.-S., Serfozo, R.F.: Sojourn times in G/M/1 fork-join networks. Naval Re-
   search Logistics (NRL) 55(5), 432–443 (2008). https://doi.org/10.1002/nav.20294
13. Kumar, A., Shorey, R.: Performance analysis and scheduling of stochastic fork-join
   jobs in a multicomputer system. IEEE Transactions on Parallel and Distributed
   Systems 10(4), 1147–1164 (1993). https://doi.org/10.1109/71.246075
14. Osipov, O.A.: A heterogeneous fork-join queueing system in which each job occupy
   all free servers. RUDN Journal of Mathematics, Information Sciences and Physics
   26(1), 28–38 (2018) (in Russian). https://doi.org/10.22363/2312-9735-2018-26-1-
   28-38
15. Narahari, Y., Sundarrajan, P.: Performability analysis of fork-join queueing sys-
   tems. Journal of the Operational Research Society 46(10), 1237–1249 (1995).
   https://doi.org/10.1057/jors.1995.171
16. Lin, W., Kumar, P.: Optimal control of a queueing system with two heteroge-
   neous servers. IEEE Transactions on Automatic Control 29(8), 696–703 (1984).
   https://doi.org/10.1109/TAC.1984.1103637
17. Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Automation and Remote
   Control 70(12), 2013–2023 (2009). https://doi.org/10.1134/S0005117909120091
18. Rykov, V.: Monotone control of queueing systems with heterogeneous servers.
   Queueing Sys. 37(4), 391–403 (2001). https://doi.org/10.1023/A:1010893501581
19. Efrosinin, D., Breuer, L.: Threshold policies for controlled retrial queues
   with heterogeneous servers. Ann. Oper. Res. 141, 139–162 (2006).
   https://doi.org/10.1007/s10479-006-5297-5
20. Özkan, E., Kharoufeh, J.: Optimal control of a two-server queueing system with
   failures. Probability in the Engineering and Informational Sciences 28(10), 489–527
   (2014). https://doi.org/10.1017/S0269964814000114
21. Legros, B., Jouini, O.: Routing in a queueing system with two heterogeneous servers
   in speed and in quality of resolution. Stochastic Models 33(3), 392–410 (2017).
   https://doi.org/10.1080/15326349.2017.1303615
22. He, Q.-M.: Fundamentals of Matrix-Analytic Methods. Springer, New York (2014).
   https://doi.org/10.1007/978-1-4614-7330-5