An Approach to User Feedback Processing in Order to Increase Clustering Results Quality
Pavel V. Dudarin and Nadezhda G. Yarushkina
Ulyanovsk State Technical University, Ulyanovsk, Russia

Abstract
Clustering a dataset can produce more than one "right" result, depending on the user's intention. For example, texts can be clustered by topic, style or author. When the results are unsatisfactory, a data scientist has to reconstruct the feature space to change them. The relation between the feature space and the result is often quite complicated, so several clustering models usually have to be built to explore useful relations. Interactive clustering with feedback aims to cope with this problem. This paper presents an approach to processing user feedback during clustering. The approach is based on end-to-end clustering and uses an autoencoder neural network. It makes it possible to adjust the computed clusters iteratively without changing the feature space.

Keywords
Clustering, Interactive Clustering, Mixed-Initiative Clustering, Constrained Clustering, Semi-Supervised Clustering, End-to-End Clustering, Learning to Cluster, Clustering with Intent, Deep Embedding, Deep Representation, Feedback, Neural Networks.

1. Introduction
Clustering methods are traditionally considered unsupervised. Unsupervised learning is possible as long as the data carries its meta information within itself. Clustering methods aim to retrieve this meta information and use it to partition or hierarchically organize user data [18]. In practice, however, data scientists usually have background knowledge, or at least a hypothesis, about the explored dataset. This is true for any domain: economics [11], data collected from sensors and industrial devices [28], or data collected by computer programs [15].
In almost every clustering task, expert assistance is vital for data correction and validation and for correcting the cluster structure or hierarchy. It can also significantly improve the clustering result by contributing expert knowledge that is not contained in the data itself. Expert assistance is of great importance in text clustering [12]. Clustering of short texts is one of the most challenging tasks [14]. It is almost impossible to obtain a useful partition without additional information about the user's intent [13]. Apart from the obvious partitioning by topic, many other partitionings could be useful: by the type of a person's story, by target audience, by legal status, or by a combination of these. It is not clear which rule should be used to construct the clusters. This problem is summarized as follows: "there is no right clustering, but there are useful ones" [2]. Expert feedback is not always available, but embedding it into the clustering process greatly improves the results and is much in demand by users [27]. It is important, however, that the expert be involved in this process seamlessly and that no understanding of the algorithm's internal details be required. A clustering algorithm should provide a clear connection between the expert knowledge and the result. The authors of [2] show a growing interest in interactive clustering, i.e., clustering with user feedback. Fig. 1 shows the number of studies and the clustering methods used.

Russian Advances in Artificial Intelligence: selected contributions to the Russian Conference on Artificial intelligence (RCAI 2020), October 10-16, 2020, Moscow, Russia
EMAIL: p.dudarin@ulstu.ru (P. Dudarin); jng@ulstu.ru (N. Yarushkina)
ORCID: 0000-0003-2354-6527 (P. Dudarin); 0000-0002-5718-8732 (N. Yarushkina)
© 2020 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org)

Figure 1: Number of developed interactive clustering methods by year.
This paper proposes an approach to clustering with expert feedback. The approach makes it possible to incorporate user feedback into a wide range of clustering methods based on neural networks, for example, into the DEC algorithm (Unsupervised Deep Embedding for Clustering Analysis) [31]. The rest of the paper is organized as follows. Section 2 reviews and analyzes related work. Section 3 gives a formal description of the problem. Section 4 introduces the proposed approach. Section 5 describes the experiments and discusses their results. Section 6 concludes the paper.

2. Related work
To determine the scope of related work, it is necessary to clarify the definitions and the classification of clustering methods. Modern studies tend to call clustering methods that use additional information not contained in the dataset semi-supervised or constrained clustering methods [6]. In most studies in this field the additional information is available a priori, as input supplied together with the explored dataset. Usually this information is given as pair-wise constraints [9], partial labels [5] or constraints on the hierarchy structure, or background information is supplied by means of transfer learning [30] (for example, as node weights of a neural network trained on a related classification problem). Moreover, all the constraints and labels can be fuzzy (for example, soft labels) [24]. There are also methods that receive additional information during the clustering process. A comprehensive review of these methods is given in [2]. They are called interactive clustering methods. One of the first methods in this area, a fuzzy one, was proposed in [26].
Depending on the character of the interaction and the type of information received, these methods can be divided into groups: active clustering, as part of the active learning field [16, 8]; reinforcement clustering, where feedback is received from a natural or artificial environment [3]; and interactive clustering with user feedback, or mixed-initiative clustering, where feedback is received from a user as a reaction to the clustering result, a correction command, or an evaluation of it. The last group makes it possible to reveal the user's latent intentions and obtain a truly useful clustering, because the user recognizes the right result on seeing it. Many authors point out that some types of studies are mistakenly regarded as interactive clustering: methods with interactive visualization of clustering results, methods for interactive choice of clustering algorithms, and some others [2]. Methods that assist users with clustering tasks should also be mentioned [7]. In these methods the leading role is played by the user, who defines the clusters and their structure, while the algorithm merely suggests candidate objects for each cluster. These methods are not widely used. Interactive clustering methods with user feedback can be grouped by the type of feedback. The first group, which includes most studies, contains methods in which a user can iteratively and interactively change the algorithm parameters, the similarity metric or the clustering features directly [20]. The second group contains methods in which a user interacts directly with the clustering result: the user can point out which clusters should be merged or split, move elements between clusters directly, or decide what to do with outliers [4, 2]. The approach proposed in this paper belongs to the second group. The user does not need to know the algorithm details to use it and can switch between similar methods without any change in the way of interacting with them. The first step of interactive clustering is, evidently, just an unsupervised clustering.
Hence, all interactive clustering methods are built on top of some unsupervised clustering algorithm by adding feedback processing to it. According to the review [2], most interactive clustering methods are based on k-means, c-means, agglomerative clustering and graph clustering. A few studies use neural networks with the SOM (Kohonen self-organizing map) architecture. The classical clustering methods are used successfully in many cases and show good results, although most of the best results are achieved by modern clustering methods. Most modern clustering methods are based on deep neural networks [22, 29, 32, 33]. Their advantage can be explained by the ability of neural networks to transfer learning and to learn to cluster, and by the complex non-linear embeddings used for feature construction (representation learning, embedding learning), which suit clustering better (for example, dimensionality reduction is a good way to increase clustering quality) [35]. But the major contribution of neural networks to clustering is the possibility of building end-to-end clustering methods. In end-to-end clustering there is no division into a feature construction phase and a partitioning phase [23, 17]. This makes it possible to learn the clustering features best suited to the current task and to perform the partitioning simultaneously. Some studies demonstrate transfer learning abilities: a neural network trained to cluster images in one domain was used for the same task in another, related domain. There are some semi-supervised methods among those based on neural networks [30, 19], but they only use labels and pair-wise constraints to tune the loss function, and these constraints cannot be changed during the clustering process.

3. Problem statement
The study [1] is dedicated to constructing a meta-schema that generalizes the taxonomy of modern clustering methods based on neural networks.
Most modern methods correspond to this schema (see Fig. 2) [10, 31, 34]. The schema shows that using feedback in the loss function makes it possible to fine-tune the latent feature space according to the user's intent and to perform the clustering itself at the same time.

Figure 2: Generalized schema of clustering methods architecture based on neural networks.

The current study aims to construct an interactive clustering method with user feedback based on the generalized schema described above. A neural-network-based method whose loss function is the Kullback-Leibler divergence can serve as the base clustering method for this task. In particular, this paper proposes a method based on DEC (Unsupervised Deep Embedding for Clustering Analysis) [31]. Feedback processing can be added to this method thanks to special properties of the loss function based on the Kullback-Leibler divergence, which can intuitively be seen as a controller of the gravity between dataset objects and cluster centers. Thus, slight changes to the gravity between them lead to controlled changes in the clustering results.

4. An approach to user feedback processing
The proposed approach is based on the idea that criticism of the clustering result is the easiest and most precise form of feedback. It therefore allows two types of feedback: "object $x_i$ should be included into cluster $C_j$" and "object $x_i$ should not be included into cluster $C_j$". One portion of feedback can contain any number of constraints. For example, swapping two elements between clusters amounts to two constraints of the first type. As mentioned above, the DEC algorithm has been chosen as the base for the proposed approach, but the same technique could be applied to many other methods, for example, to DEPICT [10]. The input dataset is $X = \{x_i \mid i \in [0, N)\}$, where $N$ is the number of objects in the dataset.
An encoder (part of a pre-trained autoencoder) projects the initial vector space of objects into a vector space of lower dimensionality: $f_\theta: X \to Z$, where $\theta$ denotes the neural network parameters (weights) and $Z$ is the latent feature space. The feature space $Z$ is called latent because it is constructed in an unsupervised way during the clustering process as a hidden layer of the neural network. In this study an encoder with the structure d-50-50-20-k is used, where d is the dimensionality of the input dataset and k is the number of clusters. The result of the clustering algorithm is a set of cluster centers in the space $Z$: $\{\mu_j \in Z \mid j \in [0, k)\}$, where k is the required number of clusters. The cluster centers are initialized by k-means clustering over the initial object representation obtained from the autoencoder. The search for cluster centers and the construction of the feature space are performed simultaneously by means of a common loss function. As the similarity measure $Q$ between an object $z_i = f_\theta(x_i)$ and a cluster center $\mu_j$, a measure based on Student's t-distribution with one degree of freedom is used:

$$q_{ij} = \frac{\left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1}}{\sum_{l=0}^{k-1} \left(1 + \lVert z_i - \mu_l \rVert^2\right)^{-1}}.$$

The objective (loss) function is the Kullback-Leibler divergence between the actual distribution $Q$ and an auxiliary target distribution $P$:

$$L = KL(P \,\|\, Q) = \sum_i \sum_j p_{ij} \log \frac{p_{ij}}{q_{ij}}.$$

The auxiliary target distribution $P$ is defined as:

$$p_{ij} = \frac{q_{ij}^2 / f_j}{\sum_{l=0}^{k-1} q_{il}^2 / f_l}, \qquad f_j = \sum_i q_{ij}.$$

This distribution has some important properties: it (1) strengthens predictions (i.e., improves cluster purity), (2) puts more emphasis on data points assigned with high confidence, and (3) normalizes the loss contribution of each centroid to prevent large clusters from distorting the hidden feature space. Minimizing the loss drives each $q_{ij}$ towards the sharpened target $p_{ij}$.
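To make these definitions concrete, here is a minimal NumPy sketch of $Q$, $P$ and $L$. The function names are hypothetical, and the latent points $z_i$ are given directly as an array instead of being produced by the encoder $f_\theta$. The sketch also computes the partial derivatives of $L$ with respect to $z_i$ and $\mu_j$ for a fixed target $P$ (as in DEC), optionally scaling every $(p_{ij} - q_{ij})$ term by an entry $t_{ij}$ of a feedback matrix $T$, which is the modification proposed in this section.

```python
import numpy as np


def soft_assign(Z, mu):
    """Q and the raw kernel w_ij = (1 + ||z_i - mu_j||^2)^-1 (Student's t, one degree of freedom)."""
    sq_dist = ((Z[:, None, :] - mu[None, :, :]) ** 2).sum(axis=2)  # (N, k)
    w = 1.0 / (1.0 + sq_dist)
    return w / w.sum(axis=1, keepdims=True), w


def target_distribution(Q):
    """p_ij = (q_ij^2 / f_j) / sum_l (q_il^2 / f_l), with soft cluster sizes f_j = sum_i q_ij."""
    weight = Q ** 2 / Q.sum(axis=0)
    return weight / weight.sum(axis=1, keepdims=True)


def kl_loss(Z, mu, P):
    """L = KL(P || Q) = sum_i sum_j p_ij log(p_ij / q_ij), with the target P held fixed."""
    Q, _ = soft_assign(Z, mu)
    return float(np.sum(P * np.log(P / Q)))


def gradients(Z, mu, P, T=None):
    """dL/dz_i and dL/dmu_j; if T is given, each (p_ij - q_ij) term is scaled by t_ij."""
    Q, w = soft_assign(Z, mu)
    coef = 2.0 * w * (P - Q)  # (N, k): 2 (1 + ||z_i - mu_j||^2)^-1 (p_ij - q_ij)
    if T is not None:
        coef = coef * T  # feedback weighting by t_ij
    weighted = coef[:, :, None] * (Z[:, None, :] - mu[None, :, :])  # (N, k, d)
    grad_Z = weighted.sum(axis=1)    # sum over clusters j
    grad_mu = -weighted.sum(axis=0)  # sum over objects i, with the minus sign
    return grad_Z, grad_mu


# Toy latent space: two equally sized clusters on a line.
Z = np.array([[0.0, 0.0], [0.2, 0.0], [1.0, 0.0], [0.8, 0.0]])
mu = np.array([[0.0, 0.0], [1.0, 0.0]])
Q, _ = soft_assign(Z, mu)
P = target_distribution(Q)  # Q[0] ≈ (2/3, 1/3) is sharpened to P[0] ≈ (0.8, 0.2)
grad_Z, grad_mu = gradients(Z, mu, P)
```

In the toy example the point at the origin has $q = (2/3, 1/3)$; because both clusters have the same soft size $f_j$, its target becomes $p = (0.8, 0.2)$, which illustrates property (1). With `T=None` the gradients agree with a central finite-difference estimate of `kl_loss`. In a full implementation the gradient with respect to $z_i$ would be back-propagated into the encoder weights $\theta$ (for example, by an autodiff framework) rather than applied to the points directly.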
The partial derivatives are:

$$\frac{\partial L}{\partial z_i} = 2 \sum_j \left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j),$$

$$\frac{\partial L}{\partial \mu_j} = -2 \sum_i \left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1} (p_{ij} - q_{ij})(z_i - \mu_j).$$

It follows that when $(p_{ij} - q_{ij}) < 0$, object $x_i$ is pushed out of cluster $C_j$, even though the corresponding term $p_{ij} \log(p_{ij}/q_{ij})$ of the objective is then negative. This loss function is used during the first iteration of the algorithm, while no feedback has been provided. Then the user, looking at the clustering result, provides feedback. To add feedback processing to the algorithm, it is proposed to use the following partial derivatives instead:

$$\frac{\partial L}{\partial z_i} = 2 \sum_j \left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1} (p_{ij} - q_{ij})\, t_{ij}\, (z_i - \mu_j),$$

$$\frac{\partial L}{\partial \mu_j} = -2 \sum_i \left(1 + \lVert z_i - \mu_j \rVert^2\right)^{-1} (p_{ij} - q_{ij})\, t_{ij}\, (z_i - \mu_j),$$

where $T = \{t_{ij}\}$ is the feedback matrix (the user's tips):

$$t_{ij} = \begin{cases} > 0, & \text{tip to include object } i \text{ into cluster } j,\\ < 0, & \text{tip to exclude object } i \text{ from cluster } j,\\ 1, & \text{otherwise.} \end{cases}$$

The absolute value of $t_{ij}$ defines the velocity (gravity force) with which objects and cluster centers move. The learning rate of the neural network also influences this velocity. In the experiments of this study $|t_{ij}| = 1000$ was used in all cases, although the experiments have shown that higher absolute values of $t_{ij}$ work better for moving an object out of a cluster. Iterations with user feedback are performed until the clustering result is satisfactory. At each iteration the user provides a new feedback matrix, and the neural network weights and cluster centers are tuned according to the received feedback.

5. Experiments description and result analysis
To demonstrate the usefulness and effectiveness of the proposed approach, two types of experiments have been carried out. First, a synthetically generated dataset was used. This dataset consists of simple one-hot vectors.
Clustering this dataset may seem trivial, but in practice any partition of one-hot vectors could be right, depending on the user's intention. It will be shown below how the user can turn a wrong partition into a right one. Second, an experiment with Fisher's Irises has been carried out. Fisher's Irises is a common dataset for classification and clustering tasks. There is no standard benchmark dataset for interactive clustering; however, clustering Fisher's Irises makes a comparison with other clustering methods possible. This experiment demonstrates the ability to increase the quality of the clustering result significantly.

5.1. Experiments with a synthetically generated dataset
A synthetic dataset of 500 noisy one-hot vectors was generated as follows:
1. Four base classes of one-hot vectors {(1,0,0,0); (0,1,0,0); (0,0,1,0); (0,0,0,1)} were defined.
2. Random noise from the uniform distribution U[0, 1/10] was added to each base vector to generate 125 variations of it, just in order to augment the data.
3. The projection of the first 12 vectors (3 samples from each class) into the latent feature space is shown as the clustering result. This keeps the figures free of uninformative details.
A clustering process aimed at producing 2 clusters in this dataset was performed. The clustering results are shown in Table 1. The final value of the autoencoder loss function was 0.000341. Figure 3 shows the distribution of the first 12 vectors of the dataset in the latent feature space on the plane. From this starting point 3 experiments were conducted, corresponding to different possible user feedback cases: vector X1 should be included into C1; vector X2 should be moved out of C1; and complex feedback with a command to swap vectors X1 and X2. All the experiments were started from the same point to reduce the number of figures, but sequential user feedback processing is also possible without any limitations.
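Steps 1 and 2 of the data generation can be sketched in NumPy as follows. The function name and seed handling are hypothetical, and the interleaved row order (row i is a variation of base class i mod 4) is an assumption that matches Table 1, where X0..X3 are variations of four different base vectors; with this order the first 12 rows contain 3 samples of each class.

```python
import numpy as np


def make_one_hot_dataset(n_classes=4, per_class=125, noise_high=0.1, seed=None):
    """Noisy variations of one-hot base vectors.

    Row i is base vector e_(i mod n_classes) plus U[0, noise_high) noise per coordinate
    (interleaved order assumed, so the first 12 rows hold 3 samples of each class).
    """
    rng = np.random.default_rng(seed)
    base = np.tile(np.eye(n_classes), (per_class, 1))  # (n_classes * per_class, n_classes)
    labels = np.tile(np.arange(n_classes), per_class)  # base class of every row
    noise = rng.uniform(0.0, noise_high, size=base.shape)
    return base + noise, labels


X, y = make_one_hot_dataset(seed=0)                        # 500 x 4, noise U[0, 1/10]
X_noisy, _ = make_one_hot_dataset(noise_high=1.0, seed=0)  # noise U[0, 1], as in the fourth experiment
first_12 = X[:12]                                          # the vectors shown in the figures
```

Because the noise stays below 0.1, the largest coordinate of every row in `X` is still its one-hot position; with `noise_high=1.0` this no longer holds, which is what makes the noisy variant harder to cluster.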
Table 1
Sample vectors of the synthetic generated dataset
X    Coordinates                                        Cluster
X0   1.0191519   0.06221088  0.04377278  0.07853585    C0
X1   0.07799758  1.0272592   0.02764643  0.08018722    C0
X2   0.09581394  0.08759326  1.0357817   0.05009951    C1
X3   0.06834629  0.07127021  0.03702508  1.0561196     C1

Figure 3: Clustering result (the first 12 vectors are shown).

The first experiment assumes that the user knows that vector X1 is semantically closer to vectors X2 and X3 than to X0. Accordingly, feedback demanding that vector X1 be included into cluster C1 was provided in the form of the feedback matrix $T[500,4] = \{t_{ij} \mid i \in [0, 500),\ j \in [0, 4)\}$, where

$$t_{ij} = \begin{cases} 1000, & i = 1,\ j = 1;\\ 1, & \text{otherwise.} \end{cases}$$

The result of 100 epochs of clustering is shown in Fig. 4. As can be seen, all 125 vectors of the second class have been moved together with X1 into C1, although feedback was provided for just one vector. It is also worth pointing out that the semantic distance between the third and the fourth classes has been preserved, as has the mutual arrangement of vectors inside each class.

Figure 4 (a,b,c,d): Clustering results of moving X1 into C1. 4a – result of the first stage of clustering, 4b – results after user feedback is provided, 4c and 4d – zoomed-in resulting clusters.

The second experiment assumes that the user knows that vector X2 semantically cannot belong to the same cluster as X3. In this case the user does not point out the cluster into which X2 should be included. Accordingly, feedback demanding that vector X2 be moved out of cluster C1 was provided in the form of the feedback matrix $T[500,4] = \{t_{ij} \mid i \in [0, 500),\ j \in [0, 4)\}$, where

$$t_{ij} = \begin{cases} -1000, & i = 2,\ j = 1;\\ 1, & \text{otherwise.} \end{cases}$$

The result of 100 epochs of clustering is shown in Fig. 5. As can be seen, vector X2 and all the vectors of the third class have been moved out of C1 and included into C0.
It is worth pointing out that, since a "push out" constraint was used, the third class ends up at the farthest possible position from C1. This experiment has also shown slower convergence than the previous one.

Figure 5 (a,b): Clustering results of moving vector X2 out of C1. 5a – the first clustering stage results, 5b – results of the iteration with user feedback.

The third experiment assumes that the user knows that vectors X1 and X2 should be swapped. Accordingly, the following feedback matrix was provided: $T[500,4] = \{t_{ij} \mid i \in [0, 500),\ j \in [0, 4)\}$, where

$$t_{ij} = \begin{cases} 1000, & i = 1,\ j = 1;\\ 1000, & i = 2,\ j = 0;\\ 1, & \text{otherwise.} \end{cases}$$

The result of 100 epochs of clustering is shown in Fig. 6. Not only have vectors X1 and X2 been swapped, but so have the whole classes 2 and 3.

Figure 6 (a,b): Clustering results of swapping two vectors (X1 and X2). 6a – the first clustering stage results, 6b – results of the iteration with user feedback.

For the fourth experiment the noise level was increased 10 times by using the uniform distribution U[0, 1] instead. This shows the general clustering abilities of the proposed method. Four samples of the input dataset are shown in Table 2. Fig. 7a shows the first 12 vectors as partitioned by the first clustering iteration. Vectors X0 and X2 were partitioned incorrectly. To correct this result, a feedback matrix was constructed: $T[500,4] = \{t_{ij} \mid i \in [0, 500),\ j \in [0, 4)\}$, where

$$t_{ij} = \begin{cases} 1000, & i = 0,\ j = 0;\\ 1000, & i = 2,\ j = 1;\\ 1, & \text{otherwise.} \end{cases}$$

The result of 100 epochs of clustering is shown in Figure 7b: only the wrongly partitioned vectors were moved, while the previously correctly partitioned vectors kept their cluster labels. This shows how user feedback gently corrects the results without dramatic changes in the partitioning itself.
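The feedback matrices in the experiments above can be assembled from a list of user tips. The sketch below assumes a hypothetical encoding of tips as `("include" | "exclude", object index, cluster index)` tuples; the default magnitude of 1000 is the value of $|t_{ij}|$ reported for the experiments. Following Section 4, a swap of two objects is expressed as two "include" tips with exchanged clusters.

```python
import numpy as np

# Hypothetical encoding of user tips: (kind, object index i, cluster index j).
INCLUDE = "include"  # "object x_i should be included into cluster C_j"
EXCLUDE = "exclude"  # "object x_i should not be included into cluster C_j"


def swap_constraints(labels, a, b):
    """Swapping objects a and b expressed as two 'include' tips with exchanged clusters."""
    return [(INCLUDE, a, int(labels[b])), (INCLUDE, b, int(labels[a]))]


def build_feedback_matrix(n_objects, n_clusters, tips, magnitude=1000.0):
    """T with t_ij = +magnitude (include), -magnitude (exclude) and 1 for all other entries."""
    T = np.ones((n_objects, n_clusters))
    for kind, i, j in tips:
        if kind == INCLUDE:
            T[i, j] = magnitude
        elif kind == EXCLUDE:
            T[i, j] = -magnitude
        else:
            raise ValueError(f"unknown tip kind: {kind!r}")
    return T


# Third experiment: X1 (in C0) and X2 (in C1) are swapped, giving t_11 = t_20 = 1000.
labels = np.array([0, 0, 1, 1])  # first-stage clusters of X0..X3, as in Table 1
T_swap = build_feedback_matrix(500, 4, swap_constraints(labels, 1, 2))
# Second experiment: X2 must leave C1, giving t_21 = -1000.
T_out = build_feedback_matrix(500, 4, [(EXCLUDE, 2, 1)])
```

The fourth experiment's matrix corresponds to the tips `[(INCLUDE, 0, 0), (INCLUDE, 2, 1)]`.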
Table 2
Vector samples with increased noise
X    Coordinates                                     Cluster
X0   1.191519   0.6221088  0.4377278  0.7853585     C0
X1   0.7799758  1.272592   0.2764643  0.8018722     C0
X2   0.9581394  0.8759326  1.357817   0.5009951     C1
X3   0.6834629  0.7127021  0.3702508  1.561196      C1

Figure 7 (a,b): Clustering results of highly noised vectors. 7a – the first clustering stage results, 7b – results of the iteration with user feedback.

5.2. Experiments with Fisher's Irises dataset
Fisher's Irises dataset is a standard dataset for classification and clustering tasks [21]. Its features are sepal length, sepal width, petal length and petal width. The dataset has 3 classes: 'setosa', 'versicolor' and 'virginica'. Table 3 shows a vector sample for each class. Since Fisher's Irises dataset has only 150 items, the data were augmented by adding slight random noise from the uniform distribution U[0, 1/10] to each vector, multiplying the dataset 4 times. In total, 600 items were obtained.

Table 3
Vector samples of the augmented Fisher's Irises dataset
X    Coordinates                                     Cluster
X0   5.1191519  3.5622108  1.4437727  0.2785358      C0 (setosa)
X1   7.0779975  3.2272592  4.7276464  1.4801872      C1 (versicolor)
X2   6.3958139  3.3875932  6.0357817  2.5500995      C2 (virginica)

Figure 8a shows the result of the first clustering iteration. The cluster for the first class, 'setosa', has been detected without any errors. However, the other two classes have 13 and 16 incorrectly partitioned flowers, respectively. Classes 'versicolor' and 'virginica' are actually close to each other, and even classification algorithms cannot distinguish them without errors. Now suppose the user knows the real classes of just two items (in this example, vectors number 7 and 50) that have been partitioned incorrectly, so they should be swapped. The feedback matrix provided by the user is $T[600,3] = \{t_{ij} \mid i \in [0, 600),\ j \in [0, 3)\}$, where

$$t_{ij} = \begin{cases} 1000, & i = 7,\ j = 2;\\ 1000, & i = 50,\ j = 1;\\ 1, & \text{otherwise.} \end{cases}$$
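The 4-fold augmentation of the 150 irises can be sketched as follows. The function name is hypothetical, and stacking the replicas block by block (all 150 items, then all 150 again, and so on) is an assumption, since the ordering of the augmented items is not stated.

```python
import numpy as np


def augment_with_uniform_noise(X, y, copies=4, noise_high=0.1, seed=None):
    """Replicate (X, y) `copies` times and add U[0, noise_high) noise to every feature.

    Replicas are stacked block by block; this ordering is an assumption.
    """
    rng = np.random.default_rng(seed)
    X_rep = np.tile(X, (copies, 1))
    y_rep = np.tile(y, copies)
    return X_rep + rng.uniform(0.0, noise_high, size=X_rep.shape), y_rep


# With scikit-learn installed, the real data can be used:
# from sklearn.datasets import load_iris
# iris = load_iris()
# X_aug, y_aug = augment_with_uniform_noise(iris.data, iris.target, seed=0)  # 600 x 4
```

Each class then contains 4 x 50 = 200 items.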
The result of 100 epochs of clustering is shown in Figure 8b (the t-SNE algorithm from the Python scikit-learn library [25] was used to project the 3-dimensional vectors from the latent feature space onto a plane). The incorrectly partitioned items 7 and 50 were moved to their classes; moreover, further errors were corrected by the clustering method in an unsupervised manner. After the 100th epoch of the second iteration, only 8 and 2 items remained incorrectly partitioned with respect to the ground truth in the 2nd and 3rd classes, respectively. The accuracy of the resulting partitioning is 0.98(3). For comparison, the average accuracy of state-of-the-art unsupervised clustering methods is 0.85, and that of the state-of-the-art classification method is 0.971. Many clustering algorithms are not able to distinguish the 2nd and 3rd classes at all [21].

Figure 8 (a,b): Clustering results for the Fisher's Irises dataset. The number of incorrectly partitioned items decreased from 26 (8a) to 10 (8b) due to the feedback provided.

6. Conclusion
This paper discusses recent methods of interactive clustering with user feedback. Many modern unsupervised clustering methods demonstrate results as good as those of state-of-the-art approaches, but only a few of them can process user feedback. To compensate for this lack of interactive methods, an approach to user feedback processing for interactive clustering was proposed. The approach is based on a neural network, and the DEC algorithm was used as a base to demonstrate its core idea. The experiments have shown the usability and effectiveness of the proposed approach, although a lack of sensitivity to the "moving out of the cluster" operation was detected. Future studies will investigate possible alternatives to the auxiliary target distribution and further explore its properties.
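The reported accuracy 0.98(3) corresponds to 10 misassigned items out of 600, i.e., 590/600 = 0.98333... How clusters are matched to classes is not specified; the sketch below assumes the common convention of taking the best one-to-one mapping between cluster ids and class labels (found here by brute force, which is adequate for k = 3). The function name is hypothetical, and the usage reproduces only the reported error counts, not the actual partition.

```python
import itertools

import numpy as np


def clustering_accuracy(y_true, y_pred, n_clusters):
    """Share of items whose cluster matches their class under the best one-to-one
    relabeling of clusters (brute force over all k! permutations; fine for small k)."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    best = 0
    for perm in itertools.permutations(range(n_clusters)):
        relabeled = np.asarray(perm)[y_pred]  # cluster c is read as class perm[c]
        best = max(best, int(np.sum(relabeled == y_true)))
    return best / y_true.size


# 600 augmented irises, 200 per class; cluster ids are an arbitrary permutation of class ids.
y_true = np.repeat(np.arange(3), 200)
y_pred = np.array([2, 0, 1])[y_true]
y_pred[200:208] = 1  # 8 errors in the 2nd class (assigned to the 3rd class's cluster)
y_pred[400:402] = 0  # 2 errors in the 3rd class (assigned to the 2nd class's cluster)
acc = clustering_accuracy(y_true, y_pred, n_clusters=3)  # 590 / 600
```

For larger k, an assignment solver such as the Hungarian method would replace the brute-force loop.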
It is also planned to add more types of feedback to the approach, for example, pair-wise feedback or the structure of the desired hierarchy.

Acknowledgements
The reported study was funded by RFBR and the government of Ulyanovsk region according to the research project No. 18-47-00019.

References
[1] Aljalbout E., Golkov V., Siddiqui Y., Strobel M., Cremers D., Clustering with Deep Learning: Taxonomy and New Methods, 2018. arXiv:1801.07648.
[2] Bae J., Helldin T., Riveiro M., Nowaczyk S., Bouguelia M.-R., Falkman G., Interactive Clustering: A Comprehensive Review, in: ACM Comput. Surv., 2020, Vol. 53, No. 1.
[3] Bagherjeiran A., Eick C. F., Chen C.-S., Vilalta R., Adaptive clustering: obtaining better clusters using feedback and past experience, in: Fifth IEEE International Conference on Data Mining (ICDM'05), Houston, TX, 2005.
[4] Balcan M.F., Blum A., Clustering with Interactive Feedback, in: Freund Y., Györfi L., Turán G., Zeugmann T. (eds) Algorithmic Learning Theory. Lecture Notes in Computer Science, vol 5254. Springer, Berlin, Heidelberg, 2008.
[5] Basu S., Banerjee A., Mooney R., Semi-supervised Clustering by Seeding, in: Proceedings of 19th International Conference on Machine Learning, 2002.
[6] Basu S., Davidson I., Wagstaff K., Constrained Clustering: Advances in Algorithms, Theory, and Applications, CRC Press, 2008.
[7] Basu S., Fisher D., Drucker S.M., Lu H., Assisting Users with Clustering Tasks by Combining Metric Learning and Classification, in: Proceedings of the Twenty-Fourth AAAI Conference on Artificial Intelligence, 2010.
[8] Dasgupta S., Ng V., Which Clustering Do You Want? Inducing Your Ideal Clustering with Minimal Feedback, 2014. arXiv:1401.5389. URL: https://arxiv.org/abs/1401.5389.
[9] Demiriz A., Bennett K.P., Embrechts M.J., A Genetic Algorithm Approach for Semi-Supervised Clustering, in: International Journal of Smart Engineering System Design, 2002, vol. 4.
[10] Dizaji K.G., Herandi A., Deng C., Cai W., Huang H., Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization, in: IEEE International Conference on Computer Vision (ICCV), Venice, 2017.
[11] Dudarin P., Pinkov A., Yarushkina N., Methodology and the algorithm for clustering economic analytics object, in: Automation of Control Processes, 2017, Vol. 47, No. 1, pp. 85-93.
[12] Dudarin P.V., Yarushkina N.G., An Approach to Fuzzy Hierarchical Clustering of Short Text Fragments Based on Fuzzy Graph Clustering, in: Proceedings of the Second International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'17). IITI 2017. Advances in Intelligent Systems and Computing, vol 679. Springer, Cham, 2018.
[13] Dudarin P., Samokhvalov M., Yarushkina N., An Approach to Feature Space Construction from Clustering Feature Tree, in: Kuznetsov S., Osipov G., Stefanuk V. (eds) Artificial Intelligence. RCAI 2018. Communications in Computer and Information Science, vol 934. Springer, Cham, 2018.
[14] Dudarin P.V., Tronin V.G., Svyatov K.V., A Technique to Pre-trained Neural Network Language Model Customization to Software Development Domain, in: Kuznetsov S., Panov A. (eds) Artificial Intelligence. RCAI 2019. Communications in Computer and Information Science, vol 1093. Springer, Cham, 2019.
[15] Dudarin P.V., Tronin V.G., Svatov K.V., Belov V.A., Shakurov R.A., Labor intensity evaluation technique in software development process based on neural networks, in: Proceedings of the Second International Scientific Conference "Intelligent Information Technologies for Industry" (IITI'19). Advances in Intelligent Systems and Computing, Springer, Cham, 2020.
[16] Fatehi K., Bozorgi A., Zahedi M.S., Asgarian E., Improving semi-supervised constrained k-means clustering method using user feedback, in: Journal of Computing and Security, 2014, Volume 1, Number 4.
[17] Greff K., van Steenkiste S., Schmidhuber J., Neural Expectation Maximization, in: Advances in Neural Information Processing Systems 30, 2017.
[18] Hastie T., Tibshirani R., Friedman J., The Elements of Statistical Learning: Data Mining, Inference, and Prediction. Second Edition, in: Springer Series in Statistics, 2009.
[19] Hoffer E., Ailon N., Deep Metric Learning Using Triplet Network, in: Feragen A., Pelillo M., Loog M. (eds) Similarity-Based Pattern Recognition. Lecture Notes in Computer Science, vol 9370. Springer, Cham, 2015.
[20] Huang Y., Mixed-Initiative Clustering, PhD thesis, Language Technologies Institute, School of Computer Science, Carnegie Mellon University, Pittsburgh, PA 15213, 2010.
[21] Leela V., Sakthipriya K., Manikandan R., Comparative Study of Clustering Techniques in Iris Data Sets, in: World Applied Sciences Journal 29 (Data Mining and Soft Computing Techniques), 2014.
[22] Li L., Kameoka H., Deep Clustering with Gated Convolutional Networks, in: IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, 2018.
[23] Meier B.B., Elezi I., Amirian M., Dürr O., Stadelmann T., Learning Neural Models for End-to-End Clustering, in: Pancioni L., Schwenker F., Trentin E. (eds) Artificial Neural Networks in Pattern Recognition. Lecture Notes in Computer Science, vol 11081. Springer, Cham, 2018.
[24] Nebu C.M., Joseph S., Semi-supervised clustering with soft labels, in: International Conference on Control Communication & Computing India (ICCC), Trivandrum, 2015.
[25] Pedregosa F., Varoquaux G., Gramfort A., Michel V., Thirion B., Grisel O., Blondel M., Prettenhofer P., Weiss R., Dubourg V., Vanderplas J., Passos A., Cournapeau D., Brucher M., Perrot M., Duchesnay É., Scikit-learn: Machine Learning in Python, in: Journal of Machine Learning Research, 2011, vol. 12.
[26] Pedrycz W., Algorithms of fuzzy clustering with partial supervision, in: Pattern Recognition Letters, Volume 3, 1985.
[27] Shelekhova N.V., Rimareva L.V., Management of Technological Processes of Production of Alcohol Products with the Application of Information Technology, in: Storage and processing of agricultural raw materials, Moscow, 2017(3).
[28] Shelekhova N.V., Polyakov V.A., Serba E.M., Shelekhova T.M., Veselovskaya O.V., Skvortzova L.I., Information technology in the analytical quality control of alcoholic beverage, in: Food Industry, Moscow, 2018(12).
[29] Suresh T., Meena Abarna K.T., LSTM Model for Semantic clustering of user-generated content using AI Geared to wearable Device, in: Semanticscholar.org Corpus ID: 212585860, 2017. URL: https://www.semanticscholar.org/paper/LSTM-Model-for-Semantic-clustering-of-content-using-Suresh-Abarna/7b72349284b78803fe2581a041e5c7a19a081bdc
[30] Wang Z., Mi H., Ittycheriah A., Semi-supervised Clustering for Short Text via Deep Representation Learning, in: Proceedings of The 20th SIGNLL Conference on Computational Natural Language Learning, Association for Computational Linguistics, Berlin, Germany, 2016.
[31] Xie J., Girshick R., Farhadi A., Unsupervised deep embedding for clustering analysis, in: ICML'16: Proceedings of the 33rd International Conference on International Conference on Machine Learning, 2016.
[32] Xu J., Xu B., Wang P., Zheng S., Tian G., Zhao J., Self-Taught Convolutional Neural Networks for Short Text Clustering, in: Neural Networks, 2017, Volume 88.
[33] Yang C., Shi X., Jie L., Han J., I Know You'll Be Back: Interpretable New User Clustering and Churn Prediction on a Mobile Social Application, in: the 24th ACM SIGKDD International Conference, 2018.
[34] Yang J., Parikh D., Batra D., Joint Unsupervised Learning of Deep Representations and Image Clusters, in: IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Las Vegas, NV, 2016.
[35] Yang B., Fu X., Sidiropoulos N.D., Hong M., Towards K-means-friendly spaces: Simultaneous deep learning and clustering, in: Proceedings of the 34th International Conference on Machine Learning, Volume 70, 2017.