Analysis of Centralized Splitting Fork-Join Queueing Systems with Heterogeneous Servers and Threshold Control Policy Oleg Osipov and Ekaterina Rogachko Saratov State University, 83 Astrakhanskaya Street, Saratov 410012, Russia oleg.alex.osipov@gmail.com, rogachkoes@info.sgu.ru Abstract. In this paper, we consider a Markovian centralized splitting fork-join queueing system with heterogeneous servers and an infinite ca- pacity queue. Jobs arrive at the system according to a Poisson process, a job consists of multiple homogeneous tasks. Tasks of a job are served independently, when all the tasks associated with the job are finished, the job service is completed. The system operates under a threshold con- trol policy. The control consists in sending tasks to one of idle servers or to the queue. The threshold policy uses the slow servers only when the queue length reaches certain threshold levels. We perform an exact analysis which is based on the matrix-geometric method and obtain ex- pressions for the performance measures. Some numerical examples are presented. The results can be used for the performance analysis of mul- tiprocessor systems and other modern distributed systems. Keywords: Fork-Join Queueing Systems · Heterogeneous Servers · Thresh- old Control Policy · Matrix Geometric Method · Performance Measures · Distributed Computing System 1 Introduction Fork-join queueing systems are widely used to model parallel and distributed systems. Some examples of these systems include RAID, distributed web appli- cations, MapReduce, GRID, multipath routing networks, multiprocessor systems and etc. In these systems, arriving jobs are split for service by numerous servers and joined before departure. In papers [1,2,3], fork-join queueing systems are applied to the analysis of MapReduce clusters and multipath routing. Results on modeling of RAID arrays and distributed databases are provided in [4,5]. Papers [6,7] describe an applica- tion of fork-join queueing systems to the analysis of multiprocessor systems and parallel algorithms. Fork-join queueing systems also arise in the performance analysis of automated manufacturing systems. In these systems, parallelism can be found in processes such as product assembly and logistic operations that involve multiple suppliers. Copyright © 2020 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). Paper [4] presents an overview of the main results for fork-join and related queueing models. Most of the papers consider parallel-server fork-join queueing systems [8,9,10,11,12]. The parallel-server fork-join queueing system consists of homogeneous servers [9,10,13] or heterogeneous servers [1,2,8,11,12,14]. An arriving job is split into a number of tasks, one for each server. Previous research in this area reports three basic types of parallel-server fork-join queueing systems [15], such as the central- ized splitting model, the distributed splitting model, and the split-merge model. In [11] upon arrival, a job of size S bringing tasks to S ≤ K of K servers is im- mediately split so that each of its S tasks is allocated to exactly one server. In [14] each job is split into a number of tasks, one for each free server. Paper [2] con- siders both deterministic and probabilistic scheduling strategies. Within proba- bilistic strategy, each task chooses a server with some probability. Instead, the present paper considers the allocation of tasks between the heterogeneous servers according to a threshold control policy. Under a threshold policy, a task is ac- cepted at server i if and only if the number of tasks in queue is at or above a given threshold qi before acceptance. The threshold control policy is used as optimal solution for the job routing problem named the slow server problem [16,17]. This problem focuses on routing jobs to heterogeneous servers so as to minimize the average response time of jobs or to minimize the average number of jobs in the system. As it shown in [16], the optimal policy which minimizes the average response time of jobs in the system with two heterogeneous servers is of threshold type, i.e., the fast server should always be used (as long as there are jobs in the queue), and that the slower server should only be used if the number of jobs in queue exceeds a certain threshold value. The generalization of this rule was obtained in [17,18] for the case of multi- server queueing system with heterogeneous servers. The optimal control policy has the monotonicity and the threshold property, i.e., there exists a queue length threshold such that a new server (with the minimal service cost and with the maximal service rate) should be used only after the queue reaches this threshold. Later, the results were extended to retrial queueing systems with heterogeneous servers [19], systems with unreliable servers [20] and with heterogeneous servers in speed and in quality of resolution [21]. In this paper, we consider a centralized splitting fork-join queueing model [15] with heterogeneous servers and a threshold control policy. We apply the matrix- geometric approach to analyze the considered system. A similar approach was used in [10,14]. The remainder of the paper is organized as follows. Section 2 describes the fork-join queueing system under consideration. In Sections 3 and 4 we obtain the stationary distribution and the main performance measures. Section 5 provides an illustration of the above results, we also discuss some control policies and compare them. Finally, a section of conclusions commenting the main research contributions of this paper is presented. 2 The Model We consider a system which consists of M heterogeneous servers Si , i = 1, . . . , M , and a FCFS infinite capacity queue as shown in Fig. 1. Jobs arrive at the system according to a Poisson process with rate λ. Each job is split into d homogeneous tasks, d ≤ M . An arriving task is placed in the queue. The service times at server Si have exponential distribution with parameter µi , i = 1, . . . , M , and µ1 > µ2 > · · · > µM . Let b denote the number of waiting tasks in the queue (queue length). At the arrival times of new tasks the control consists in sending tasks from the queue to the fastest idle server or leaving them in the queue. A task from the front of the queue is assigned to a fastest idle server Si , i ∈ {1, . . . , M }, if the queue length immediately after the arrival is not less than threshold level qi , i.e., qi ≤ b + 1, (1 = q1 ≤ q2 ≤ · · · ≤ qM < qM +1 = ∞). µ1 λ ... µ2 ... µM Fig. 1. Centralized splitting fork-join queueing system with heterogeneous servers. When a task finishes its service at server Si , a task from the queue will be assigned to server Si if the queue length satisfies b ≥ qi . Otherwise server Si will be idle. Tasks of a job are served independently, when all the tasks associated with the job are finished, the job service is completed. 3 The Stationary Distribution The system state is described by a vector x = (r, n0 , n1 , . . . , nM ), where r = r(b) = b div d denotes the number of waiting jobs in the queue, n0 = n0 (b) = b mod d is the number of tasks for the served job in the queue, ( 0, if server Si is idle, ni = 1, if server Si is busy. The process {x(t), t ≥ 0} is a (M + 2)-dimensional continuous-time Markov chain on the state space X . For each state x, we denote by F(x) and F̄(x) the sets of idle and busy server indices, respectively, F(x) = {i > 0 : ni = 0}, F̄(x) = {i > 0 : ni = 1}. Denote by γ(b) the mandatory (minimal) number of busy servers for queue length b. We can write  0,  b = 0, γ(b) = M, b ≥ qM ,  max{i : qi ≤ b}, otherwise.  Define ϕ(x) by ( the minimal element of F(x), if F(x) 6= ∅, ϕ(x) = ∞, otherwise. If there are idle servers in the system, a task will be assigned to server Sϕ(x) . Let us define now the transition rate q(x, x0 ) from state x ∈ X to state x0 ∈ X . Each transition is associated with an event in the system, there are two types of events. 1. A job arrives and splits into d tasks (a) if b ≥ qM , q (x, (r + 1, n0 , n1 , . . . , nM )) = λ, (1) (b) if b < qM , q(x, x0 ) = λ, (2) 0 (d) where x = x can be obtained from the following recurrence relation ( (i) (i)  (i) (i) r(b + 1), n0 (b + 1), n1 , . . . , n M , ϕ(x(i) ) > γ(b(i) + 1), x(i+1) = x(i) + e2+ϕ(x(i) ) , otherwise, (0) with the initial  value x = x, i = 0, . . . , d − 1. (i) (i) (i) Here, x = r , n0 , . . . , nM and b(i) = dr(i) + n0 ; ej denotes the (i) (i) vector of the appropriate dimension with all components equal to 0, except the jth, which is 1. 2. A task finishes its service at server Si (ni = 1) (a) if b < qi , q (x, (r, n0 , n1 , . . . , ni−1 , 0, ni+1 , . . . , nM )) = µi , (3) (b) if n0 > 0, {i ∈ F̄(x) : b ≥ qi } = 6 ∅, X q (x, (r, n0 − 1, n1 , . . . , nM )) = µi , (4) i∈F̄ (x): b≥qi (c) if r > 0, n0 = 0, {i ∈ F̄(x) : b ≥ qi } = 6 ∅, X q(x, (r − 1, d − 1, n1 , . . . , nM )) = µi . (5) i∈F̄ (x): b≥qi Let state space X be lexicographically ordered. The subset Xl is called level l, l = 0, 1, . . . , and defined as Xl = {(r, n0 , n1 , . . . , nM ) ∈ X : r = l}. The cardinality L of X0 is L = 2M + (q2 − q1 )2M −1 + (q3 − q2 )2M −2 + · · · + (d − qσ0 )2M −σ0 , where σ0 = γ(d − 1). Let r̄ be the minimal number r ∈ {1, 2, . . . } such that rd ≥ qM . The cardi- nalities of Xl , l ≥ r̄ are d. The cardinalities of Xl , 0 < l < r̄ are − − Kl = 2M −σl (qσ− +1 − ld) + 2M −σl −1 (qσ− +2 − qσ− +1 ) + · · · + l l l M −σl+ +1 + +2 (qσ+ − qσ+ −1 ) + 2M −σl (ld + d − qσ+ ), l l l where σl− = γ(ld), σl+ = γ(ld + d − 1). The system states are presented in Table 1. Table 1. System states. States (r, n0 , n1 , . . . , nM ) Number of states r n0 n1 , . . . , nM 0 n1 , . . . , nM 2M q1 ≤ n0 < q2 1, n2 , . . . , nM (q2 − q1 )2M −1 0 ... ... ... qσ0 ≤ n0 ≤ d − 1 1, . . . , 1, nσ0 +1 , . . . , nM (d − qσ0 )2M −σ0 − 0 ≤ n0 < qσ− +1 − ld 1, . . . , 1, nσ− +1 , . . . , nM 2M −σl (qσ− +1 − ld) l l l qσ− +1 − ld ≤ n0 , 0 < l < r̄ M −σl− −1 l n0 < qσ− +2 − ld 1, . . . , 1, nσ− +2 , . . . , nM 2 (qσ− +2 − qσ− +1 ) l l l l ... ... ... + qσ+ − ld ≤ n0 < d 1, . . . , 1, nσ+ +1 , . . . , nM 2M −σl (ld + d − qσ+ ) l l l l ≥ r̄ 0 ≤ n0 ≤ d − 1 1, . . . , 1 d It is easily shown that for Xl , l > 0, there are transitions to Xl−1 and Xl+1 . Thus, the Markov chain {x(t), t ≥ 0} is a quasi-birth-and-death (QBD) process, the generator matrix Q of the process has the following form   B00 B01 0 0 · · · 0 0 0 0 0 0 0 ··· B10 B11 B12 0 · · · 0 0 0 0 0 0 0 · · ·    0 B21 B22 B23 · · · 0 0 0 0 0 0 0 · · ·   ··· ··· ··· ··· ··· ··· ··· ··· · · · · · · · · · · · · · · ·    0 0 0 0 · · · 0 Br̄−1,r̄−2 Br̄−1,r̄−1 Br̄−1,r̄ 0 0 0 · · ·    0 0 0 0 ··· 0 Q= 0 Br̄,r̄−1 Br̄,r̄ A0 0 0 · · · ,  0 0 0 0 ··· 0 0 0 A 2 A 1 A 0 0 · · ·     0 0 0 0 ··· 0 0 0 0 A 2 A 1 A 0 · · ·      0 0 0 0 ··· 0 . . . 0 0 0 0 A 2 A 1  .. .. .. .. .. .. .. .. .. .. .. . . . .   . . . . . . . . . . . . . where blocks A2 , A1 , A0 are square matrices of order d, B00 is the square matrix of order L, blocks B01 , B10 are L × K1 , K1 × L matrices respectively, Bl,l , l = 1, . . . , r̄, are square matrices of order Kl (we set Kr̄ = d), blocks Bl,l−1 , l = 2, . . . , r̄, are Kl × Kl−1 matrices, and blocks Bl,l+1 , l = 1, . . . , r̄ − 1, are Kl × Kl+1 matrices. The elements of matrices B01 , Bl,l+1 , l = 1, . . . , r̄ −1, and A0 are determined by (1) and (2). The elements of matrices B10 , Bl,l−1 , l = 2, . . . , r̄, and A2 are determined by (5). Note that A0 = λI, where I is an identity matrix of the appropriate order; A2 is the square matrix of order d, where the (1, d)th element PM of the matrix is equal to i=1 µi , and other elements are zero. The non-diagonal elements of matrices B00 , Bl,l , l = 1, . . . , r̄, and A1 are determined by (2)–(4); the diagonal elements of these matrices are determined from conditions (we set Br̄,r̄+1 = A0 ): B00 1 + B01 1 = 0, Bl,l−1 1 + Bl,l 1 + Bl,l+1 1 = 0 , A0 1 + A1 1 + A2 1 = 0, where 1 is an ones column vector of the appropriate order. Note that A1 is the bidiagonal square matrix of order d, where the (j, j)th element of the matrix is PM PM equal to −(λ + i=1 µi ), and the (j, j − 1)th element is equal to i=1 µi . It is known [22], that the QBD process is ergodic if and only if πA A0 1 < πA A2 1, where πA A = 0, πA 1 = 1, A = A0 + A1 + A2 . The condition implies λd < µ1 + · · · + µM . The stationary distribution π = (π0 , π1 , . . . ) of the QBD process can be com- puted using the matrix-geometric method [22]. The row vector πl , l = 0, 1, . . . , defines probabilities of states for level l, according to the lexicographic order, πl = (π(x) : x ∈ Xl ). We solve linear system πQ = 0 and π1 = 1 to find the stationary probabil- ities. Taking the advantage of the block structure in Q, we obtain π0 B00 + π1 B10 = 0; π0 B01 + π1 B11 + π2 B21 = 0; π1 B12 + π2 B22 + π3 B32 = 0; .................. πi−1 Bi−1,i + πi Bi,i + πi+1 Bi+1,i = 0; (6) .................. πr̄−2 Br̄−2,r̄−1 + πr̄−1 Br̄−1,r̄−1 + πr̄ Br̄,r̄−1 = 0; πr̄−1 Br̄−1,r̄ + πr̄ Br̄,r̄ + πr̄+1 A2 = 0; πl−1 A0 + πl A1 + πl+1 A2 = 0, l = r̄ + 1, r̄ + 2, . . . . We know [22], the solution of (6) has the matrix-geometric form given by πr̄+l = πr̄ Rl , l = 1, 2, . . . , where R is the minimal non-negative solution to equation A0 +RA1 +R2 A2 = 0, vectors π0 , π1 , . . . , πr̄ satisfy   B00 B01 0 · · · 0 0 B10 B11 B12 · · · 0 0     0 B21 B22 · · · 0 0  (π0 , . . . , πr̄ )  ··· ··· ··· ···  = (0, . . . , 0),  ··· ···    0 0 0 · · · Br̄−1,r̄−1 Br̄−1,r̄  0 0 0 · · · Br̄,r̄−1 Br̄,r̄ + RA2 π0 1 + π1 1 + · · · + πr̄−1 1 + πr̄ (I − R)−1 1 = 1. 4 Performance Measures Once stationary distribution π is computed, a variety of other performance mea- sures may be obtained. The average number B of jobs in the queue ∞ X r̄ X ∞ X B= lπl 1 = lπl 1 + (r̄ + l)πr̄ Rl 1 = l=1 l=1 l=1 r̄ X lπl 1 + πr̄ R r̄(I − R)−1 + (I − R)−2 1.   = l=1 The average job waiting time W in the queue (the average time that the job waits in the queue before the first of its tasks is assigned to a server) W = B/λ. Denote by T the average response time, then T = W + V, where V is the average job service time, which is defined as the average total time required to process all of the tasks associated with the job once the first task is assigned to a server. The average number N of jobs in the system is N = λT. To obtain the average job service time, we consider the service process of all d tasks of an arbitrarily chosen “tagged” job. There are two cases considered. 1. We assume that when the first task is assigned, the queueing system is in a state x such that r > r̄. Then all servers are busy, and tasks of the tagged job sequentially occupy available servers. The service process of the tagged job is described by an absorbing Markov chain ν = {ν(t), t ≥ 0}. Denote the state of chain ν by (k, S), where k denotes the number of tasks of the tagged job in the queue, S denotes the set of indices for busy servers that process tasks of the tagged job. State (0, ∅) is absorbing. The state space N of Markov chain ν is given by d [ [ N = Nj {(0, ∅)}, j=1 where N j = {(k, S) : k ∈ {0, 1, . . . , d − j}, S ∈ S j }, S j denotes the set of all j-element subsets of {1, . . . , M }, d   X M |N | = j + 1. j=1 d−j+1 The initial probability distribution defined on state space N is given by the vector (α0 , α), where α0 = α0 (0, ∅) = 0, α = (α(k, S) : (k, S) ∈ N \ {(0, ∅)}), µi α (d − 1, {i}) = M , i = 1, . . . , M, P µi i=1 and other initial probabilities are zero. Markov chain ν is determined by generator matrix Qν with non-diagonal elements qν ((k, S), (k 0 , S 0 )) if k 0 = k = 0, S 0 = S \ {i}, i ∈ S,  µi ,  if k 0 = k − 1 ≥ 0, S 0 = S {i}, i ∈  µ , S 0 0 i / S, qν ((k, S), (k , S )) = P 0 0  i∈S µi , if k = k − 1 ≥ 0, S = S,   0, otherwise.  We denote by T the square matrix of order |N | − 1 which is a submatrix of matrix Qν and contains transition rates among the transient states. The absorption time ξ for Markov chain ν is a random variable which has a PH-distribution with PH-representation (α, T ) [22]. The expected absorp- tion time is E[ξ] = −αT −1 1. 2. We assume that when the first task is assigned, the queueing system is in a state x such that 0 ≤ r ≤ r̄. Then service of the tagged job starts right after the events: – the job arrives, – a task finishes and leaves the system, – a job arrives and this leads to a task of the job is assigned to an idle server. In this case, the service process of the tagged job is described by an absorbing Markov chain ζ = {ζ(t), t ≥ 0}. Denote the state of chain ζ by (k, S, x), where k denotes the number of tasks of the tagged job in the queue, S denotes the set of indices for busy servers that process tasks of the tagged job, x denotes the system state, x ∈ X̂ , X̂ = {x ∈ X : r ≤ r̄}. States (0, ∅, x), x ∈ X̂ are absorbing. The state space Z of Markov chain ζ is given by d−1 [ Z= Zk, k=0 where [ Z0 = {(0, S, x) : S ∈ S(x, 0)} , x∈X̂ [ Zk = {(k, S, x) : S ∈ S(x, k)} , k = 1, . . . , d − 1, x∈X̂ : n0 =k  S(x, k) = S ⊆ F̄(x) : |S| + k ≤ d . The initial probability distribution defined on state space Z is given by the vector (β0 , β), where β0 = (β0 (0, ∅, x) : x ∈ X̂ ) = 0, β = (β(k, S, x0 ) : (k, S, x0 ) ∈ Z \ Z 0 ):   X XM X β(k, S, x0 ) = λ π(x) + µi π(x) G−1 , x∈A(x0 ) i=1 x∈Di (x0 ) x ∈ X presents states from which the system moves to state x0 as a result of a job arrival (x ∈ A(x0 )), or as a result of a task completion at server Si (x ∈ Di (x0 )), i = 1, . . . , M ; G denotes the normalizing constant. The Markov chain ζ is determined by generator matrix Qζ with non-diagonal elements qζ ((k, S, x), (k 0 , S 0 , x0 )). Suppose there is a transition from x to x0 as a result of an event de- scribed in Section 3. So we have the corresponding transition from (k, S, x) to (k 0 , S 0 , x0 ), where k 0 and S 0 will be described below. Assume the system moves from state x to state x0 as a result of a job arrival, we denote by H(x, x0 ) = F̄(x0 ) \ F̄(x) the index set of idle servers which will be busy right after the event. Let H(x, x0 ) = {i1 , i2 , . . . , im }, i1 < i2 < · · · < im , then Hj (x, x0 ) ⊆ H(x, x0 ) denotes the j-element set Hj (x, x0 ) = {i1 , . . . , ij }. Thus, we can write (a) A job arrives qζ ((k, S, x), (k 0 , S 0 , x0 )) = λ, where k 0 = max {k − |H(x, x0 )|, 0}, S 0 = S Hk−k0 (x, x0 ), S (b) A task finishes its service at Si i. if k > 0, n0i = 1, i ∈ / S, qζ ((k, S, x), (k − 1, S ∪ {i}, x0 )) = µi , ii. if k > 0, X qζ ((k, S, x), (k − 1, S, x0 )) = µi , i∈S:n0i =1 iii. if k > 0, n0i = 0, qζ ((k, S, x), (k, S \ {i}, x0 )) = µi , iv. if k = 0, qζ ((0, S, x), (0, S \ {i}, x0 )) = µi . We denote by η the absorption time for Markov chain ζ. The expected ab- sorption time E[η] is computed similarly to E[ξ]. Finally, we have E[ξ]c1 + E[η]c2 V = , c1 + c2 where c1 = ω1 , c2 = β1G, M X (ω1 , . . . , ωd ) = πr̄ (I − R)−1 − πr̄ − πr̄+1 µi . i=1 5 Numerical Results Consider a fork-join queueing system which consists of M = 3 heterogeneous servers, where µ1 = 5, µ2 = 2, µ3 = 1. The performance measures depend on threshold levels qi , i = 2, . . . , M . Let qi∗ be the optimal threshold levels with respect to the minimization of the average response time T . They can be obtained by the exhaustive search method. Assume that λ = 1 and d = 3, then the optimal threshold levels are q2∗ = 2, q3∗ = 8, and T = 0.838. To illustrate the advantages of the threshold control policy, we consider the fastest free server policy with qi = 1, i = 1, . . . , M . Figure 2 presents the average response times for the optimal threshold control policy and the fastest free server policy for several values of the arrival rate λ. 1.5 Threshold control policy Threshold control policy Fastest free server policy 6 Fastest free server policy 1 4 T T 0.5 2 0 0 0.5 1 1.5 2 2.5 0.5 1 1.5 2 2.5 λ λ (a) (b) Fig. 2. The average response times for (a) d = 2 and (b) d = 3. It may be seen that the optimal threshold control policy provides lower values of the average response time T . When the arrival rate λ is quite large, the differ- ence between the policies can be neglected. As λ increases, the threshold levels decrease that leads to the fastest free server policy is near optimal one in this case. 6 Conclusions We studied a Markovian fork-join queueing system with heterogeneous servers and threshold control policy. Applying a matrix-geometric approach, we obtained the stationary distribution of the system and performance measures. The results can be used for the performance analysis of multiprocessor systems and other modern distributed systems. An interesting topic for future research is to ob- tain the optimal threshold control policy which minimizes the long-run average number of jobs in the system or the average response time. References 1. Rizk, A., Poloczek, F., Ciucu, F.: Computable bounds in fork-join queue- ing systems. SIGMETRICS Perform. Eval. Rev. 43(1), 335–346 (2015). https://doi.org/10.1145/2796314.2745859 2. KhudaBukhsh, W.R., Rizk, A., Frömmgen, A., Koeppl, H.: Optimizing stochastic scheduling in fork-join queueing models: bounds and applica- tions. In: IEEE INFOCOM 2017, pp. 1–9. IEEE, New York (2017). https://doi.org/10.1109/INFOCOM.2017.8057013 3. Glushkova, D., Jovanovic, P., Abello, A.: MapReduce performance model for Hadoop 2.x. Information Systems 79, 32–43 (2019). https://doi.org/10.1016/j.is.2017.11.006 4. Thomassian, A.: Analysis of fork/join and related queueing systems. ACM Com- puting Surveys 47(2), 17:1–17:71 (2014). https://doi.org/10.1145/2628913 5. Joshi, G., Liu, Y., Soljanin, E.: On the delay-storage trade-off in content download from coded distributed storage. IEEE Journal on Selected Areas on Communications 32(5), 989–997 (2014). https://doi.org/10.1109/JSAC.2014.140518 6. Heidelberger, P., Trivedi, K.S.: Analytic queueing models for programs with internal concurrency. IEEE Trans. Comp. C-3(1), 73–82 (1983). https://doi.org/10.1109/TC.1983.1676125 7. Gorbunova, A.V., Zaryadov, I.S., Matyushenko, S.I., Samouylov, K.E., Shorgin, S.Ya.: The approximation of response time of a cloud comput- ing system. Informatics and Applications 9(3), 31–38 (2015) (in Russian). https://doi.org/10.14357/19922264150304 8. Flatto, L., Hahn, S.: Two parallel queues created by arrivals with two de- mands I. SIAM Journal of Applied Mathematics 44(5), 1041–1053 (1984). https://doi.org/10.1137/0144074 9. Nelson, R., Tantawi, A.N.: Approximate analysis of fork/join synchro- nization in parallel queues. IEEE Trans. Comp. 37(6), 739–743 (1988). https://doi.org/10.1109/12.2213 10. Squillante, M.S., Zhang, Y., Sivasubramaniam, A., Gautam, N.: Generalized parallel-server fork-join queues with dynamic task scheduling. Annals of Operations Research 160(1), 227–255 (2008). https://doi.org/10.1007/s10479-008-0312-7 11. Baccelli, F., Makowski, A.M., Shwartz, A.: The fork-join queue and related systems with synchronization constraints: stochastic ordering and computable bounds. Adv. Appl. Probab. 21(3), 629–660 (1989). https://doi.org/10.1017/S0001867800018851 12. Ko, S.-S., Serfozo, R.F.: Sojourn times in G/M/1 fork-join networks. Naval Re- search Logistics (NRL) 55(5), 432–443 (2008). https://doi.org/10.1002/nav.20294 13. Kumar, A., Shorey, R.: Performance analysis and scheduling of stochastic fork-join jobs in a multicomputer system. IEEE Transactions on Parallel and Distributed Systems 10(4), 1147–1164 (1993). https://doi.org/10.1109/71.246075 14. Osipov, O.A.: A heterogeneous fork-join queueing system in which each job occupy all free servers. RUDN Journal of Mathematics, Information Sciences and Physics 26(1), 28–38 (2018) (in Russian). https://doi.org/10.22363/2312-9735-2018-26-1- 28-38 15. Narahari, Y., Sundarrajan, P.: Performability analysis of fork-join queueing sys- tems. Journal of the Operational Research Society 46(10), 1237–1249 (1995). https://doi.org/10.1057/jors.1995.171 16. Lin, W., Kumar, P.: Optimal control of a queueing system with two heteroge- neous servers. IEEE Transactions on Automatic Control 29(8), 696–703 (1984). https://doi.org/10.1109/TAC.1984.1103637 17. Rykov, V.V., Efrosinin, D.V.: On the slow server problem. Automation and Remote Control 70(12), 2013–2023 (2009). https://doi.org/10.1134/S0005117909120091 18. Rykov, V.: Monotone control of queueing systems with heterogeneous servers. Queueing Sys. 37(4), 391–403 (2001). https://doi.org/10.1023/A:1010893501581 19. Efrosinin, D., Breuer, L.: Threshold policies for controlled retrial queues with heterogeneous servers. Ann. Oper. Res. 141, 139–162 (2006). https://doi.org/10.1007/s10479-006-5297-5 20. Özkan, E., Kharoufeh, J.: Optimal control of a two-server queueing system with failures. Probability in the Engineering and Informational Sciences 28(10), 489–527 (2014). https://doi.org/10.1017/S0269964814000114 21. Legros, B., Jouini, O.: Routing in a queueing system with two heterogeneous servers in speed and in quality of resolution. Stochastic Models 33(3), 392–410 (2017). https://doi.org/10.1080/15326349.2017.1303615 22. He, Q.-M.: Fundamentals of Matrix-Analytic Methods. Springer, New York (2014). https://doi.org/10.1007/978-1-4614-7330-5