<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Journal of Machine Learning Research 6 (2005) 363</journal-title>
      </journal-title-group>
      <issn pub-type="ppub">2374-8486</issn>
    </journal-meta>
    <article-meta>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Oksana Pichugina</string-name>
          <email>oksanapichugina1@gmail.com</email>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olha Matsiy</string-name>
          <email>olga.matsiy@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Kharkiv National Automobile and Highway University</institution>
          ,
          <addr-line>Str.Yaroslava Mudrogo 25, 61002 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>National Aerospace University "Kharkiv Aviation Institute"</institution>
          ,
          <addr-line>17 Chkalova Street, 61070 Kharkiv</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2015</year>
      </pub-date>
      <volume>904</volume>
      <fpage>0000</fpage>
      <lpage>0002</lpage>
      <abstract>
        <p>Hybrid approaches (Approach 1 and Approach 2) combining Supervised Learning with Linear Combinatorial Optimization are offered to solve the problem of minimizing the total delivery expenditure of supplying customers with batches of rocks. Approach 1 applies multi-class classification to assess the customers' delivery expenditures, while Approach 2 solves an additional regression prediction problem. Numeric characteristics utilizing all available information are proposed, allowing the effectiveness of applying the approaches to be evaluated. The presented way of dealing with classification problems under additional constraints can be extended to a wide class of real-world problems.</p>
      </abstract>
      <kwd-group>
        <kwd>Machine Learning</kwd>
        <kwd>Supervised Learning</kwd>
        <kwd>Combinatorial Optimization</kwd>
        <kwd>Linear Assignment Problem</kwd>
        <kwd>classification</kwd>
        <kwd>regression</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Combinatorial optimization problems (CO problems, COPs) arise in almost every area of practice where decisions are made based on the choice of indivisible objects [1]. Since the absolute majority of real COPs are NP-hard, difficulties in their exact solution arise already in the development of pseudopolynomial algorithms [2, 3, 4, 5, 6, 7, 8, 9]. On the other hand, the presence of additional constraints does not allow developing effective heuristics. Conditionally, the classical problems of Machine Learning, such as classification and clustering, can be included in the class of COPs. At the same time, the main Supervised and Unsupervised Learning methods are heuristic. This is due to the fact that, in these problems, there are no restrictions preventing one from forming an arbitrarily large sample of feasible solutions and applying heuristics to them. Indeed, each sample can be assigned to any class in a classification problem and to an arbitrary cluster in a clustering problem. However, passing from the classical formulations to a practical problem, we invariably face natural constraints imposed on single elements or their combinations. These can be restrictions on the number of elements in some classes, or constraints from above and below on the distance between elements of the same or different classes, etc. In addition, some extra information is often known, such as misclassification penalties and correct-classification rewards. Evidently, it is highly desirable to take it into account in the solution, but this is not always possible when using classical methods of Classification and Cluster Analysis. Meanwhile, having formalized such COPs as discrete or continuous optimization problems, there appears a real opportunity to solve them exactly, or with an accuracy assessment, in a reasonable time by suitable methods. In this paper, we consider a practical problem that can be effectively solved by combining Machine Learning methods with classical CO methods.</p>
      <sec id="sec-1-1">
        <title>Consider the following practical problem.</title>
        <p>2021 Copyright for this paper by its authors.</p>
        <p>Problem statement. Rocks are mined in several quarries and supplied to consumers in batches. It is necessary to develop a delivery plan for a certain set of such batches in order to fulfil the customers' demand and minimize the total delivery expenditure, consisting of delivery costs, expenses for the price of rocks exceeding the customer requests, and penalties for supplying rock batches of lower quality than provided for by the contract with the consumers.</p>
        <p>This problem can be thought of as a classification problem, where the samples are the batches and the classes are the consumers. On the one hand, additional constraints are imposed on the delivery volume to particular consumers; on the other hand, a target function appears expressing the total delivery expenditure.</p>
        <p>The presence of these two essential components removes the problem from the class of standard classification problems and, respectively, does not allow solving it by standard classification methods. Therefore, we need new approaches to solving this problem, and this is what this work is about. This paper offers two hybrid approaches (Approach 1 and Approach 2) combining Supervised Machine Learning with classical Combinatorial Optimization methods, dealing with a generalized Linear Assignment Problem that directly maps the batches to the set of consumers. At the first, Machine Learning, stage, Approach 1 solves one more classification problem, which identifies the class of each rock batch, while Approach 2 solves a regression problem instead, in which the cost of a batch of rocks is predicted.</p>
        <p>Supposedly, the proposed way will allow, on the one hand, using historical data, namely applying Machine Learning to batches of rocks previously mined in these quarries, and, on the other hand, implementing the linear programming method in the second phase of the approaches.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Prerequisites</title>
    </sec>
    <sec id="sec-3">
      <title>2.1. Supervised Learning elements</title>
      <p>Practical Machine Learning (ML) mostly uses Supervised Learning (SL) [10, 11, 12, 13]. In SL, you have an input variable vector x and an output variable y, and you use an SL algorithm to learn the mapping function from the input x to the output y = f(x), where x ∈ X. The goal is to approximate the mapping function f so well that, when you have new input data x′, you can predict the output variable y′ for that data. We come to regression or classification problems depending on which values, continuous or discrete, the function f takes. Particularly, if f : X → R¹, the problem of finding it is a regression problem (RP) [10]. In contrast, if f : X → Y, where |Y| &lt; ∞, the problem under consideration is a classification problem (CP) [10].</p>
      <p>In other words, a CP [10, 14] is a problem of identifying to which class a new instance belongs based
on available class membership for instances in a training set (TS).</p>
      <p>Instances to which regression/classification is applied form a test set (TeS).</p>
      <p>In classification, a TS-instance is given by a feature tuple x and a class label y. A TeS-instance is given by a feature tuple x′, while its class label y′ is unknown and needs to be found.</p>
      <p>In regression, y is a real value representing an outcome, while the task is to predict the real value y′ for the input vector x′.</p>
      <p>x need not be a numeric vector, since the features presented in x-components characterize certain properties of an instance, which can be categorical, ordinal, integer-valued, or real-valued. If categorical or ordinal features are present, a preprocessing stage is required that maps x into Euclidean space: x → x̄ ∈ X̄ ⊆ R^L.</p>
      <p>Let
C = {C0, ..., Cs} (1)
be a set of classes. We will refer to X̄ as an instances' space and to C as a class label space.</p>
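      <p>The preprocessing step above (mapping a categorical feature into Euclidean space) can be sketched as follows; a minimal illustration in Python, where the one_hot helper and the rock-type categories are hypothetical:</p>
      <preformat>
```python
def one_hot(value, categories):
    """Map a categorical feature value into Euclidean space R^L
    as a 0/1 indicator vector of length L = len(categories)."""
    return [1.0 if value == c else 0.0 for c in categories]

# hypothetical categorical rock-type feature
categories = ["granite", "marble", "limestone"]
vector = one_hot("marble", categories)  # [0.0, 1.0, 0.0]
```
      </preformat>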
      <sec id="sec-3-1">
        <p>A class Ci0 ∈ C of cardinality |Ci0| = max_{i∈J_s^0} |Ci| is called a major class (positive class); the one of size |Ci0| = min_{i∈J_s^0} |Ci| is called a minor class (negative class). A classification algorithm (CA) is intended to train a classifier, which is a function mapping the instances' space X̄ into the class label space
Y = {0, ..., s}, (2)
i.e.
f : X̄ → Y. (3)</p>
        <p>The type of the function f, linear or not, defines two large groups of classification algorithms: linear CAs and nonlinear CAs (see the overview in [15, 11]). In addition, a wide group of highly effective methods called ensemble CAs (ECAs) exists [16, 17, 18].</p>
        <p>Popular CAs are:</p>
        <p>Y = 0,..., s
f : X → Y.</p>
        <p>
          (
          <xref ref-type="bibr" rid="ref2">2</xref>
          )
(
          <xref ref-type="bibr" rid="ref3">3</xref>
          )
1. Linear CAs:
a) Logistic Regression (generalized linear model, LR) [19];
b) Linear Support Vector Machine Classification (SVM) [20];
c) Naive Bayes (NB) [21];
d) Linear Discriminant Analysis (LDA) [22];
2. Nonlinear CAs:
a) Decision Tree (DT) [23] - C4.5 [24], CART [25];
b) k-Nearest Neighbors (KNN) [26];
c) Artificial Neural Networks (Deep Learning, DL) [27];
d) Kernel SVM (KSVM) [28];
e) Quadratic Discriminant Analysis (QDA) [22];
3. Ensemble CAs:
a) Bagged Decision Trees (BDT);
b) Random Forest (RF) [25];
c) Extra Trees (ET);
d) Adaptive Boosting (AdaB) [29];
e) Gradient Boosting Machine (GBM) [30];
f) Stochastic Gradient Boosting (SGB) [31];
g) Extreme Gradient Boosting (eXtreme Gradient Boosting,XGBoost, XGB) [30].
A regression algorithm (RA) is targeted to train a regressor, which is a function mapping an
1
instances' space X into a prediction space Z  R :
g : X → Z.
        </p>
        <p>
          (
          <xref ref-type="bibr" rid="ref4">4</xref>
          )
        </p>
        <p>In SL, most of the algorithms are CAs, and some of them are generalized to regression. Among regression algorithms (RAs) are:
1. Linear Regression (LR) [13];
2. Linear Support Vector Machine Regression (SVR) [32];
3. Decision Tree Regression (DTR) [23];
4. Deep Learning Regression (DLR) [33];
5. Random Forest Regression (RFR) [25, 34].</p>
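        <p>To illustrate the two SL tasks with one of the algorithms listed above, here is a minimal pure-Python sketch of k-Nearest Neighbors used both as a classifier and as a regressor; the toy feature tuples, labels and outcomes are invented for the example:</p>
        <preformat>
```python
def knn_predict(train, targets, x, k=3, classify=True):
    """k-Nearest Neighbors: majority vote over the k closest training
    instances (classification) or the mean of their targets (regression)."""
    order = sorted(range(len(train)),
                   key=lambda i: sum((a - b) ** 2 for a, b in zip(train[i], x)))
    nearest = [targets[i] for i in order[:k]]
    if classify:
        return max(set(nearest), key=nearest.count)  # class label y'
    return sum(nearest) / k                          # real value z'

# toy training set: feature tuples x_i, class labels y_i, outcomes z_i
X_train = [[0.2, 1.1], [0.4, 0.9], [2.5, 3.0], [2.7, 3.2]]
y_train = [0, 0, 1, 1]
z_train = [10.0, 10.5, 42.0, 41.0]

y_hat = knn_predict(X_train, y_train, [0.3, 1.0], k=2)                  # 0
z_hat = knn_predict(X_train, z_train, [0.3, 1.0], k=2, classify=False)  # 10.25
```
        </preformat>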
      </sec>
    </sec>
    <sec id="sec-4">
      <title>2.2. Linear Assignment Problem</title>
      <p>Assignment problems (APs) deal with the question of how to assign n items (jobs) to n machines (workers) in the best possible way. APs consist of two components: the assignment as an underlying combinatorial structure and an objective function describing the "best way" [35, 36, 37].</p>
      <p>Mathematically, an assignment can be seen as a bijective mapping of a finite set Jn = {1, ..., n} into itself, i.e., an n-permutation of Jn; these permutations form the set Πn. Assignments can be modelled in different ways: every permutation π ∈ Πn corresponds in a unique way to a permutation matrix X = (xij)i,j with xij = 1 for j = π(i) and xij = 0 for j ≠ π(i). If the objective function is linear in the X-components, the AP is called linear (LAP); if quadratic, it is called a quadratic AP (QAP), etc.</p>
      <p>The LAP integer programming model is [35]: find X = (xij)i,j such that
F(X) = Σ_{i,j=1..n} cij xij → min; (5)
Σ_{j=1..n} xij = 1, i ∈ Jn; (6)
Σ_{i=1..n} xij = 1, j ∈ Jn; (7)
xij ∈ {0, 1}, i, j ∈ Jn. (8)
Here, C = (cij)i,j is the cost matrix of the assignments.</p>
      <p>The LAP is polynomially solvable [36]. It is generalized in different ways, for instance by allowing multiple assignments, when multiple machines can be assigned to one worker [35, 37]. As a result, the classical LAP model (5)-(8) becomes
F(X) = Σ_{i=1..n} Σ_{j=1..m} cij xij → min; (9)
Σ_{j=1..m} xij = 1, i ∈ Jn; (10)
Σ_{i=1..n} xij = nj, j ∈ Jm; (11)
xij ∈ {0, 1}, i ∈ Jn, j ∈ Jm, (12)
where m ≤ n and n1 + ... + nm = n. The AP (9)-(12) is reducible to a LAP with columns taken with multiplicities n1, ..., nm, respectively.</p>
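      <p>As a sketch, the classical LAP above can be solved by a Hungarian-type polynomial algorithm, for example via scipy.optimize.linear_sum_assignment; the 3×3 cost matrix below is invented for the example:</p>
      <preformat>
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

# invented 3x3 cost matrix C = (c_ij)
C = np.array([[4.0, 1.0, 3.0],
              [2.0, 0.0, 5.0],
              [3.0, 2.0, 2.0]])

rows, cols = linear_sum_assignment(C)  # optimal assignment, O(n^3)
total_cost = C[rows, cols].sum()       # minimal value of the objective F(X)
```
      </preformat>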
    </sec>
    <sec id="sec-5">
      <title>3. Description of Approaches 1 and 2</title>
    </sec>
    <sec id="sec-6">
      <title>3.1. Modelling</title>
      <p>Let us build a mathematical model of the problem stated in the Introduction. Let C ∈ C be a class of rocks, such as granite, marble, limestone, etc., y = y(C) ∈ Z+ be its number (a label), and z = p(y) be the known price of a rock batch of class y.</p>
      <p>Suppose that several rocks can be mined in a quarry and that rock samples are automatically processed. Namely, their size, shape and spectral characteristics are measured, after which the type of rock is automatically determined. Based on this information, the rock class of the entire batch is predicted. Suppose all the features are numeric; thus, X̄ = X.</p>
      <p>As a training set TS, we use historical data on rock batches for which their type is known. Then xi ∈ X is a vector of the shape, size and spectral characteristics of a training batch (i ∈ Jn), while x′i ∈ X is a vector of the same parameters for a batch involved in our problem, i.e. an element of our test set TeS (i ∈ Jn). Respectively, Y = (yi)i∈Jn are the real labels of TS, and Z = (zi)i∈Jn, where zi = p(yi), are the real costs of the batches i in TS.</p>
      <p>In order to assess the effectiveness of our approaches numerically, we offer two metrics that use the actual labels of the test set. So, let Y″ = (y″i)i∈Jn be the real labels of TeS; then Z″ = (z″i)i∈Jn, where z″i = p(y″i), are the real costs of the batches i in TeS (i ∈ Jn).</p>
      <p>3.1.1. Stage 1</p>
      <p>The first stage differs for the two approaches that we offer.</p>
      <p>Approach 1. Here, the training set has the form TS = {(xi, yi)}i∈Jn, while the test set is TeS = {x′i}i∈Jn. Stage 1 consists of solving a CP on TS and applying the found classifier f to predict the labels of TeS. It results in a multiset Ŷ′ = {ŷ′i}i∈Jn, where ŷ′i = f(x′i), i ∈ Jn. Ŷ′ can be used for predicting the costs of the batches, particularly via the collection of predictions Ẑ′ = {ẑ′i}i∈Jn, where ẑ′i = p(ŷ′i), i ∈ Jn.</p>
      <p>Approach 2. At Stage 1, we set a regression problem. In this case, the training set looks like TS = {(xi, zi)}i∈Jn, while the test set is the same. The result of applying a regression algorithm to the set TS is a regressor g. Applying it to our test set TeS, we obtain a set Z′ = {z′i}i∈Jn, where z′i = g(x′i) is the predicted cost of a rock batch i obtained by Approach 2.</p>
      <p>Note that, in this step, we can use standard regression metrics to evaluate the quality of this
prediction.</p>
      <p>Now, let us move to the second, Combinatorial Optimization, stage.</p>
      <p>3.1.2. Stage 2</p>
      <p>This stage is common to both approaches. The only difference is that, in Approach 1, Ẑ′ is used as the vector of predicted costs while, in Approach 2, it is Z′.</p>
      <p>Suppose that, for the test set TeS, we also know from which quarry k out of a set JK each batch of rocks comes. Accordingly, the locations of the quarries and the distances between them are known, as are the locations of the consumers and the distances between them and the quarries. This makes it possible to find the cost of delivering a batch from each of the quarries to the consumers.</p>
      <p>Let B1, ..., Bm be the customers and A = (akj)k∈JK, j∈Jm be a cost matrix, where akj is the cost of delivery of a batch from the k-th quarry to the consumer Bj.</p>
      <p>Approach 2. With the help of A, let us build another cost matrix
A′ = (a′ij)i∈Jn, j∈Jm, (13)
where a′ij is the predicted (with the help of Approach 2) total delivery expenditure of the batch i to the consumer Bj, j ∈ Jm.</p>
      <p>Here, a′ij includes, as one component, the delivery cost aki j, where ki is the quarry where the batch i was mined.</p>
      <p>The other two components directly depend on how the real cost of the batch relates to the cost of the rocks ordered by the customer and on the penalties that the consumers stipulate for violating the delivery quality.</p>
      <p>So, let bj be the cost of the rock ordered by the customer Bj, vj be the number of ordered rock batches, and dj be the penalty (per unit) for delivering a low-quality batch.</p>
      <p>Then, if z′i &gt; bj, the mining company owners will receive less money for exceeding the ordered quality level of the rock batches. If z′i = bj, there will be no additional costs. Finally, if z′i &lt; bj, the company will have to pay (bj − z′i) dj currency units as a penalty for the delivery of poor-quality rocks. Thus
a′ij = aki j + (z′i − bj), if z′i ≥ bj;
a′ij = aki j + (bj − z′i) dj, if z′i &lt; bj. (14)</p>
      <p>Having filled the matrix A' , we can proceed to the optimization step, in which it is decided where
to send which batch in order to minimize the total delivery expenditure. This can be represented as the
need to assign the batches to the consumers subject to the constraint on fulfilling the customers' orders.</p>
      <p>The mathematical model is: find a set
T = (tij)i∈Jn, j∈Jm, where tij = 1 if the batch i goes to the customer Bj, and tij = 0 otherwise, (15)
of unknowns such that the following holds.</p>
      <sec id="sec-6-1">
        <title>The objective function is</title>
        <p>n m
F'(t ) = a'ij tij → min</p>
        <p>
          i=1 j=1
under constraints:
− the batch is either supplied or not supplied at all, which can be expressed by inequality
m
tij  1, i  Jn ,
j=1
(
          <xref ref-type="bibr" rid="ref13">13</xref>
          )
(
          <xref ref-type="bibr" rid="ref14">14</xref>
          )
(
          <xref ref-type="bibr" rid="ref15">15</xref>
          )
(
          <xref ref-type="bibr" rid="ref16">16</xref>
          )
(
          <xref ref-type="bibr" rid="ref17">17</xref>
          )
− the costumers' orders are obligatory for execution:
n '
tij  v j , j  Jm
i=i2
        </p>
        <p>The problem (15)-(18) is a generalized linear assignment problem that can be solved in time polynomial in the dimension by methods such as the Hungarian algorithm. Let T′* = (t′*ij)i∈Jn, j∈Jm be an optimal solution to this AP. For the consistency of the problem, the condition
Σ_{j=1..m} vj ≤ n (19)
must be satisfied, meaning that the number of rock batches is sufficient to meet the consumer demand.</p>
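        <p>Under this consistency condition, the generalized AP above can be reduced to a rectangular LAP by repeating each customer's column according to its demand. A minimal sketch, assuming scipy is available; the function name, cost data and demands are invented:</p>
        <preformat>
```python
import numpy as np
from scipy.optimize import linear_sum_assignment

def solve_batch_assignment(costs, demand):
    """Assign n batches to customers so that customer j receives exactly
    demand[j] batches and each batch is shipped at most once
    (assumes sum(demand) does not exceed n)."""
    costs = np.asarray(costs, dtype=float)
    # repeat column j demand[j] times, then solve a rectangular LAP
    col_of = np.repeat(np.arange(costs.shape[1]), demand)
    rows, cols = linear_sum_assignment(costs[:, col_of])
    plan = [(int(i), int(col_of[j])) for i, j in zip(rows, cols)]
    total = float(costs[:, col_of][rows, cols].sum())
    return plan, total

# invented predicted expenditures for 4 batches and 2 customers
A1 = [[3.0, 8.0],
      [1.0, 6.0],
      [9.0, 2.0],
      [7.0, 4.0]]
plan, total = solve_batch_assignment(A1, demand=[1, 2])  # total == 7.0
```
        </preformat>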
        <p>Approach 1. Here, the difference of this stage in Approach 1 compared to Approach 2 will be
outlined.</p>
        <p>The formula (13) is replaced by
Â′ = (â′ij)i∈Jn, j∈Jm, (20)
where â′ij is the predicted (with the help of Approach 1) total delivery expenditure of the batch i to the consumer Bj, j ∈ Jm.</p>
        <p>The formula (14) becomes
â′ij = aki j + (ẑ′i − bj), if ẑ′i ≥ bj;
â′ij = aki j + (bj − ẑ′i) dj, if ẑ′i &lt; bj. (21)
Respectively, the objective function (16) becomes
F̂′(t) = Σ_{i=1..n} Σ_{j=1..m} â′ij tij → min. (22)
Let a solution to the AP (22) with constraints (15), (17)-(19) be denoted T̂′* = (t̂′*ij)i∈Jn, j∈Jm.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>3.2. Comparison metrics</title>
      <p>To compare the quality of the solutions T′*, T̂′*, we solve one more AP, in which the actual costs of the test set's items are used, and compare its optimal solution with these two. This is an AP subject to the constraints (15), (17), (18) with the objective function
F″(t) = Σ_{i=1..n} Σ_{j=1..m} a″ij tij → min, (23)
where A″ = (a″ij)i∈Jn, j∈Jm is the matrix of the actual total delivery expenditures, i.e., a″ij is the total delivery expenditure for the batch i delivered to the consumer Bj, j ∈ Jm. A″ can be found similarly to (21) as
a″ij = aki j + (z″i − bj), if z″i ≥ bj;
a″ij = aki j + (bj − z″i) dj, if z″i &lt; bj. (24)</p>
      <p>Let T″* be an optimal solution for the AP (15), (17), (18), (23) and F″* = F″(T″*) be its optimal value. Substituting T′*, T̂′* into the target function (23), we get not an estimated but the actual delivery expenditure of implementing the plans T′*, T̂′* in real life. Let us denote F′* = F″(T′*), F̂′* = F″(T̂′*). If F′* = F̂′*, then Approaches 1 and 2 work equally well; if F̂′* &lt; F′*, Approach 1 showed itself better; if F̂′* &gt; F′*, then Approach 2 did.</p>
      <p>To understand how much we lose by applying the predicted costs rather than the real ones, we analyze the increments
Δ′ = F′* − F″*, Δ̂′ = F̂′* − F″*, (25)
representing the absolute errors of applying Approach 2 and Approach 1, respectively. The corresponding relative errors can be found using the already found values:
δ′ = Δ′ / F′*, (26)
δ̂′ = Δ̂′ / F̂′*. (27)</p>
      <p>In our opinion, the metrics (26), (27) highlight the specifics of the problem under consideration better than any classification metrics applicable to multi-class classification problems [14, 15, 38, 39], such as accuracy, balanced accuracy, recall, precision, F-score, G-mean, etc.</p>
    </sec>
    <sec id="sec-8">
      <title>4. Discussion of Results and Future work</title>
      <p>In the absolute majority of research literature, Combinatorial Optimization, which deals with exact solutions, and Machine Learning exist as two separate research fields. Perhaps this is because combinatorial optimization problems are highly complex, including the design of qualitative heuristics, while Machine Learning is entirely based on heuristics.</p>
      <p>The main contribution of this paper is an innovative approach to combining these two research domains. The second one is the mathematical models developed here. Together with the schemes of both approaches, they provide a road map for implementation in programming languages such as Python, where powerful Machine Learning and Integer Linear Programming libraries are available.</p>
      <p>Areas of further work are:
− generalization to a wider class of practical problems;</p>
      <p>− collecting data on the locations of Ukrainian open pits and the structure of their rocks, followed by a computational experiment using the presented approaches;</p>
      <p>− generalization to a class of stochastic programming problems, in which the predicted price is involved together with predictions of the standard deviations from the average cost.</p>
      <p>At the next stage of the research, we plan to apply a generalization of the classical single-output regression model to two outcomes: the predicted cost and the standard deviation. In this regard, this stage is supposed to be carried out using neural networks.</p>
    </sec>
    <sec id="sec-9">
      <title>5. Conclusion</title>
      <p>This paper offers two hybrid approaches to minimizing the total delivery expenditure for supplying customers with rock batches. They combine Machine Learning techniques with classical Linear Combinatorial Optimization methods. Approach 1 solves an auxiliary classification problem at the Machine Learning stage, while Approach 2 solves an additional regression problem instead. Numerical characteristics for assessing the approaches are presented. Ways to develop the contribution theoretically and experimentally are outlined.</p>
    </sec>
    <sec id="sec-10">
      <title>6. References</title>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Y. G.</given-names>
            <surname>Stoyan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <source>Theory and Methods of Euclidian Combinatorial Optimization: Current Status and Prospects, Cybernetics and Systems Analysis</source>
          <volume>56</volume>
          (
          <year>2020</year>
          )
          <fpage>366</fpage>
          -
          <lpage>379</lpage>
          . doi: 10.1007/s10559-020-00253-6.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Matsyi</surname>
          </string-name>
          , Boolean Satisfiability Problem:
          <article-title>Discrete and Continuous Reformulations with Applications</article-title>
          ,
          <source>in: 2020 IEEE 15th International Conference on Advanced Trends in Radioelectronics</source>
          , Telecommunications and Computer Engineering (TCSET),
          <year>2020</year>
          , pp.
          <fpage>623</fpage>
          -
          <lpage>627</lpage>
          . doi: 10.1109/TCSET49122.2020.235507.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Muravyova</surname>
          </string-name>
          ,
          <source>Data Batch Processing: Modelling and Applications</source>
          , in: 2020 IEEE International Conference on Problems of Infocommunications. Science and Technology (PIC S&amp;T),
          <year>2020</year>
          , pp.
          <fpage>765</fpage>
          -
          <lpage>770</lpage>
          . doi: 10.1109/PICST51311.2020.9467928.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          ,
          <article-title>On Modelling of Computer Cluster Optimization Problem With Applications</article-title>
          , in: 2020
          <source>IEEE KhPI Week on Advanced Technology (KhPIWeek)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>429</fpage>
          -
          <lpage>436</lpage>
          . doi: 10.1109/KhPIWeek51551.2020.9250145.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          , Mathematical Modeling Of Transport Logistics Problems, in: 2020
          <source>IEEE 2nd International Conference on System Analysis Intelligent Computing (SAIC)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi: 10.1109/SAIC51296.2020.9239155.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          , Diet-Menu Problem: Modelling And Applications, in: 2020
          <source>IEEE 2nd International Conference on System Analysis Intelligent Computing (SAIC)</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>5</lpage>
          . doi: 10.1109/SAIC51296.2020.9239149.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          , New Approaches to Modelling Covering Problems in Monitoring Optimization, in: 2019 IEEE International Scientific-Practical Conference Problems of Infocommunications, Science and Technology (PIC S&amp;T),
          <year>2019</year>
          , pp.
          <fpage>300</fpage>
          -
          <lpage>304</lpage>
          . doi:
          <volume>10</volume>
          .1109/PICST47496.
          <year>2019</year>
          .
          <volume>9061386</volume>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Kartashov</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Pichugina</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Korobchynskyi</surname>
          </string-name>
          ,
          <article-title>Genetic Algorithms for Solving Combinatorial Mass Balancing Problem</article-title>
          , in: 2019
          <source>IEEE 2nd Ukraine Conference on Electrical and Computer Engineering (UKRCON)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>1061</fpage>
          -
          <lpage>1064</lpage>
          . doi:10.1109/UKRCON.2019.8879938.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S. V.</given-names>
            <surname>Yakovlev</surname>
          </string-name>
          ,
          <article-title>On Some Classes of Spatial Configurations of Geometric Objects and their Formalization</article-title>
          ,
          <source>Journal of Automation and Information Sciences</source>
          <volume>50</volume>
          (
          <year>2018</year>
          )
          <fpage>38</fpage>
          -
          <lpage>50</lpage>
          . doi:10.1615/JAutomatInfScien.v50.i9.30.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>E.</given-names>
            <surname>Alpaydin</surname>
          </string-name>
          ,
          <source>Introduction to Machine Learning</source>
          , Adaptive Computation and Machine Learning series, 2nd ed., MIT Press, Cambridge, MA,
          <year>2010</year>
          . OCLC: ocn317698631.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C. M.</given-names>
            <surname>Bishop</surname>
          </string-name>
          ,
          <source>Pattern Recognition and Machine Learning</source>
          , 1st ed.
          <year>2006</year>
          , corr. 2nd printing 2011 ed., Springer, New York,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kubat</surname>
          </string-name>
          ,
          <source>An Introduction to Machine Learning</source>
          , 2nd ed., Springer International Publishing,
          <year>2017</year>
          . doi:10.1007/978-3-319-63913-0.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>D.</given-names>
            <surname>Forsyth</surname>
          </string-name>
          ,
          <source>Applied Machine Learning</source>
          , Springer International Publishing,
          <year>2019</year>
          . URL: https://www.springer.com/gp/book/9783030181130. doi:10.1007/978-3-030-18114-7.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>C.</given-names>
            <surname>Drummond</surname>
          </string-name>
          , Classification, in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Sammut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          (Eds.),
          <source>Encyclopedia of Machine Learning</source>
          , Springer US, Boston, MA,
          <year>2010</year>
          , pp.
          <fpage>168</fpage>
          -
          <lpage>171</lpage>
          . doi:10.1007/978-0-387-30164-8_111.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>I. H.</given-names>
            <surname>Witten</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Frank</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Hall</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. J.</given-names>
            <surname>Pal</surname>
          </string-name>
          ,
          <source>Data Mining: Practical Machine Learning Tools and Techniques</source>
          , 4th ed., Morgan Kaufmann, Amsterdam,
          <year>2016</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>T. G.</given-names>
            <surname>Dietterich</surname>
          </string-name>
          ,
          <article-title>Machine-learning research: Four current directions</article-title>
          ,
          <source>AI Magazine</source>
          <volume>18</volume>
          (
          <year>1997</year>
          )
          <fpage>97</fpage>
          -
          <lpage>137</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>G.</given-names>
            <surname>Brown</surname>
          </string-name>
          , Ensemble Learning, in:
          <string-name>
            <given-names>C.</given-names>
            <surname>Sammut</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. I.</given-names>
            <surname>Webb</surname>
          </string-name>
          (Eds.),
          <source>Encyclopedia of Machine Learning</source>
          , Springer US, Boston, MA,
          <year>2010</year>
          , pp.
          <fpage>312</fpage>
          -
          <lpage>320</lpage>
          . doi:10.1007/978-0-387-30164-8.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>W.</given-names>
            <surname>Fan</surname>
          </string-name>
          ,
          <source>Cost-Sensitive, Scalable and Adaptive Learning Using Ensemble Methods: Theory and Application</source>
          , VDM Verlag, Saarbrucken,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>P.</given-names>
            <surname>McCullagh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. A.</given-names>
            <surname>Nelder</surname>
          </string-name>
          ,
          <source>Generalized Linear Models</source>
          , 2nd ed., Chapman and Hall/CRC, Boca Raton,
          <year>1989</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>C.</given-names>
            <surname>Cortes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Vapnik</surname>
          </string-name>
          ,
          <article-title>Support-vector networks</article-title>
          ,
          <source>Machine Learning</source>
          <volume>20</volume>
          (
          <year>1995</year>
          )
          <fpage>273</fpage>
          -
          <lpage>297</lpage>
          . doi:10.1007/BF00994018.
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>P.</given-names>
            <surname>Domingos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Pazzani</surname>
          </string-name>
          ,
          <article-title>On the Optimality of the Simple Bayesian Classifier under Zero-One Loss</article-title>
          ,
          <source>Machine Learning</source>
          <volume>29</volume>
          (
          <year>1997</year>
          )
          <fpage>103</fpage>
          -
          <lpage>130</lpage>
          . doi:10.1023/A:1007413511361.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>