
Comparison of Matrix Reordering Algorithms Based on Monotone Systems

Grete Lind

grete.lind@taltech.ee

Rein Kuusik

rein.kuusik1@taltech.ee

Tallinn University of Technology, Tallinn, Estonia


The aim of this paper is to shed light on four matrix reordering methods: scale of conformity, minus technique, plus technique and mixed technique. All of them are based on the theory of monotone systems. The presented methods are applicable both to NxN (entity-to-entity) and NxM (entity-to-attribute) data tables. All the methods can handle a larger set of discrete values (not only binary ones). Rows and columns are reordered separately. The result does not depend on the initial order of rows and columns. We compare the results of these methods through the stress measure (both in the von Neumann neighborhood and in the Moore neighborhood) using binarized representations of well-known data sets.

Keywords: Matrix Reordering · Monotone Systems · Conformity · Stress

1 Introduction

Seriation is an exploratory data analysis technique that reorders objects into a sequence along a one-dimensional continuum so that it best reveals regularity and patterning among the whole series [1]. Seriation is often called matrix reordering when applied to two-way datasets [2]. A two-way one-mode data table is an entity-to-entity (NxN) table, and a two-way two-mode data table is an entity-to-attribute (NxM) table [1].

Matrix reordering has been used in many different fields, from archaeology to operations research. A thorough historical overview is given by Liiv [1].

In this paper, we introduce little-known matrix reordering methods that are based on the theory of monotone systems (MS) created by Mullat [3–5]. These methods (scale of conformity, minus technique, plus technique) were created by Leo Võhandu at Tallinn University of Technology, Department of Informatics [6–8]. We propose a novel MS-based method by Rein Kuusik called mixed technique [8].

According to Liiv [1], the main future goal for seriation is to make it ubiquitously usable: reordering matrices should be common practice for everybody inspecting any data table. Recently, Liiv and Võhandu [2] experimented with asymmetric one-mode two-way (NxN) data tables. They propose that “seriation methods can be applied to analyze asymmetric one-mode two-way datasets as if they were two-mode two-way datasets while continuing to keep the information about entities actually belonging to one class”.

The paper is organized as follows. In the following subsections, we introduce stress measures and describe MS-based reordering methods. Section 2 is dedicated to the mixed technique. Experiments are presented in Section 3 and the conclusion in Section 4.

1.1 Stress

Stress is a dissimilarity measure; it compares the values in a matrix with their neighbors. For an n × m matrix X, the local stress measure for element $x_{ij}$ is defined for two types of neighborhood [9]:

1. in the Moore neighborhood, a square-shaped neighborhood comprising (at most) eight adjacent entries:

$$s_{ij} = \sum_{k=\max(1,\,i-1)}^{\min(n,\,i+1)} \; \sum_{l=\max(1,\,j-1)}^{\min(m,\,j+1)} (x_{ij} - x_{kl})^2 , \tag{1}$$

2. in the von Neumann neighborhood, a diamond-shaped neighborhood comprising (at most) four adjacent entries:

$$s_{ij} = \sum_{k=\max(1,\,i-1)}^{\min(n,\,i+1)} (x_{ij} - x_{kj})^2 + \sum_{l=\max(1,\,j-1)}^{\min(m,\,j+1)} (x_{ij} - x_{il})^2 . \tag{2}$$

A global stress measure for the whole matrix is the sum of the local stresses (in either neighborhood) for all entries of the table:

$$\mathrm{STRESS} = \sum_{i=1}^{n} \sum_{j=1}^{m} s_{ij} . \tag{3}$$

In case of binary data, the local stress $s_{ij}$ is just the count of the neighbors that have different values.
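The stress definitions above can be sketched in a few lines of plain Python. This is only an illustrative sketch, not the implementation used in the experiments: the function names `local_stress` and `global_stress` and the `neighborhood` parameter are our own, indices are 0-based, and the table is assumed to be a non-empty list of equal-length rows.

```python
def local_stress(x, i, j, neighborhood="moore"):
    """Local stress of entry x[i][j] (0-based indices)."""
    n, m = len(x), len(x[0])
    # Index ranges clipped to the table borders, as in the max/min bounds.
    rows = range(max(0, i - 1), min(n - 1, i + 1) + 1)
    cols = range(max(0, j - 1), min(m - 1, j + 1) + 1)
    if neighborhood == "moore":
        # Up to eight neighbors; the entry itself contributes zero.
        return sum((x[i][j] - x[k][l]) ** 2 for k in rows for l in cols)
    if neighborhood == "neumann":
        # Up to four neighbors: same column, then same row.
        vertical = sum((x[i][j] - x[k][j]) ** 2 for k in rows)
        horizontal = sum((x[i][j] - x[i][l]) ** 2 for l in cols)
        return vertical + horizontal
    raise ValueError("neighborhood must be 'moore' or 'neumann'")


def global_stress(x, neighborhood="moore"):
    """Sum of the local stresses over all entries of the table."""
    return sum(
        local_stress(x, i, j, neighborhood)
        for i in range(len(x))
        for j in range(len(x[0]))
    )
```

For the binary table `[[1, 1], [0, 0]]`, every entry has two differing neighbors in the Moore neighborhood and one in the von Neumann neighborhood, so the global stress is 8 and 4, respectively.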

1.2 Reordering Methods Based on Monotone Systems

Here we briefly introduce three methods for reordering data tables: scale of conformity, minus technique and plus technique [6–8]. Because of limited space we cannot present their algorithms and examples.

All the methods reorder rows and columns separately. Thus, the same algorithm can be applied to both rows and columns, using the transposed table for one of them. The result does not depend on the initial order of rows/columns.
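The following sketch illustrates how a single row-ordering routine can serve both rows and columns via transposition. The names `reorder_table`, `order_fn` and `by_descending_sum` are hypothetical; in particular, `by_descending_sum` is only a stand-in ordering used to make the example runnable and is not one of the methods discussed in this paper.

```python
def reorder_table(table, order_fn):
    """Reorder rows and columns separately with the same ordering routine.

    order_fn takes a list of rows and returns a permutation of row indices.
    Both permutations are computed from the original table.
    """
    row_order = order_fn(table)
    transposed = [list(col) for col in zip(*table)]
    col_order = order_fn(transposed)  # columns handled as rows of the transpose
    return [[table[i][j] for j in col_order] for i in row_order]


def by_descending_sum(rows):
    # Stand-in ordering for illustration only (not an MS-based method).
    return sorted(range(len(rows)), key=lambda i: -sum(rows[i]))


table = [[0, 1, 0],
         [1, 1, 1],
         [0, 1, 1]]
reordered = reorder_table(table, by_descending_sum)
```

Any of the MS-based orderings could be passed as `order_fn` in the same way, provided it returns a permutation of row indices.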

The scale of conformity [6] reorders the data by the typicality of objects, using the conformity measure: the conformity of a row is the sum of the frequencies of all its attribute values.
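A minimal sketch of the conformity computation, under stated assumptions: the frequency of a value is counted within its own column over the whole table, rows are listed from most to least typical (consistent with the most homogeneous group appearing at the top), and ties are broken by the original row index. The tie-breaking rule and the names `conformity_weights` and `conformity_order` are our own choices, not taken from [6].

```python
from collections import Counter


def conformity_weights(table):
    """Conformity of each row: sum of the column-wise frequencies of its values."""
    freqs = [Counter(col) for col in zip(*table)]  # one frequency table per column
    return [sum(freqs[j][v] for j, v in enumerate(row)) for row in table]


def conformity_order(table):
    """Row indices sorted by decreasing conformity (ties: original index, an assumption)."""
    weights = conformity_weights(table)
    return sorted(range(len(table)), key=lambda i: -weights[i])


toy = [[1, 1, 1],
       [1, 1, 2],
       [1, 2, 2],
       [2, 2, 2],
       [1, 1, 1]]
weights = conformity_weights(toy)  # [9, 10, 9, 6, 9]
order = conformity_order(toy)      # [1, 0, 2, 4, 3]
```

In the toy table, the second row has weight 4 + 3 + 3 = 10 because its values occur 4, 3 and 3 times in their respective columns.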

In the iterative techniques [7], namely the minus technique and the plus technique, rows/columns are removed from the table one by one, and after each removal the weights of the remaining rows/columns are recomputed. The iterative techniques use conformity as a weight function, although different weight functions (e.g. influence [6]) can be used.
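The iterative scheme can be sketched as follows. This is a hedged reading rather than the algorithms of [7]: we assume that the minus technique removes the row with the minimal current weight at each step, that the plus technique removes the row with the maximal current weight, and that ties are broken by the lowest original index. The function name `iterative_order` and the `minus` flag are hypothetical.

```python
from collections import Counter


def iterative_order(table, minus=True):
    """Order of row removal; weights are recomputed on the remaining rows."""
    n_cols = len(table[0])
    remaining = list(range(len(table)))
    order = []
    while remaining:
        # Value frequencies per column, counted over the remaining rows only.
        freqs = [Counter(table[i][j] for i in remaining) for j in range(n_cols)]
        weights = {
            i: sum(freqs[j][v] for j, v in enumerate(table[i])) for i in remaining
        }
        # Assumption: minus removes the minimal weight, plus the maximal one.
        target = min(weights.values()) if minus else max(weights.values())
        picked = next(i for i in remaining if weights[i] == target)  # lowest index on ties
        order.append(picked)
        remaining.remove(picked)
    return order


toy = [[1, 1, 1],
       [1, 1, 2],
       [1, 2, 2],
       [2, 2, 2],
       [1, 1, 1]]
minus_order = iterative_order(toy)              # [3, 2, 1, 0, 4]
plus_order = iterative_order(toy, minus=False)  # [1, 0, 2, 3, 4]
```

Under these assumptions the minus variant lists the most atypical row first, while the plus variant starts with the most typical one.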

[Algorithm: Mixed technique for matrix rows (steps S1..S13); the listing is not reproduced here.]

These reordering methods can process non-binary data, and zeros are treated the same way as other values. Using a different weight function, we can treat zeros differently.

The data table is reordered in order to better visualize the data. With the conformity scale and the plus technique, the most homogeneous group forms in the upper-left corner and the most atypical one in the lower-right corner. With the minus technique, the arrangement is reversed.

While these techniques help to find homogeneous groups, we lack a method that offers changes that are as smooth as possible. In the next section we propose such a method.

2 Mixed Technique

Here we present a novel iterative method for matrix reordering, called the mixed technique [8]. It aims to produce a gradual sequence of changes, starting from the first object/attribute. The user chooses the object/attribute to start from. If the user cannot decide, the starting object/attribute can be chosen by minimal weight (as in the minus technique). At each subsequent step, the object/attribute closest to the one just eliminated (by the number of coincidences) is chosen for removal. The weight is not based on the frequencies of the (diminishing) initial table (as in the plus and minus techniques), but on the (growing) table of already removed objects/attributes.

Assume that we have an N × M data matrix X. Every element $X_{ij}$, $i = 1, \dots, N$, $j = 1, \dots, M$, has a discrete value from the interval $[1, K]$.

The algorithm (see above) starts like the minus technique (S1..S5) and, after the first iteration, continues similarly to the plus technique (S6..S13).
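Since the algorithm listing is not available here, the sketch below shows only one possible reading of the description above and should not be taken as the authors' algorithm. Assumptions: the starting row is the one with minimal conformity on the full table unless `start` is given; the weight of a remaining row is the sum of the frequencies of its values in the table of already removed rows; the row with the maximal weight is removed next; ties are broken by the lowest original index. The name `mixed_order` and its parameters are hypothetical.

```python
from collections import Counter


def mixed_order(table, start=None):
    """One reading of the mixed technique for rows (see assumptions above)."""
    n_cols = len(table[0])
    remaining = list(range(len(table)))
    if start is None:
        # Minus-like start: minimal conformity on the full table.
        freqs = [Counter(col) for col in zip(*table)]
        conf = {i: sum(freqs[j][v] for j, v in enumerate(table[i])) for i in remaining}
        start = min(remaining, key=lambda i: conf[i])
    order = [start]
    remaining.remove(start)
    # Column-wise value frequencies of the growing table of removed rows.
    removed_freqs = [Counter([table[start][j]]) for j in range(n_cols)]
    while remaining:
        weights = {
            i: sum(removed_freqs[j][v] for j, v in enumerate(table[i]))
            for i in remaining
        }
        picked = max(remaining, key=lambda i: weights[i])  # first maximum wins ties
        order.append(picked)
        remaining.remove(picked)
        for j, v in enumerate(table[picked]):
            removed_freqs[j][v] += 1
    return order


toy = [[1, 1, 1],
       [1, 1, 2],
       [1, 2, 2],
       [2, 2, 2],
       [1, 1, 1]]
default_order = mixed_order(toy)         # [3, 2, 1, 0, 4]
chosen_order = mixed_order(toy, start=0) # [0, 4, 1, 2, 3]
```

In the first iteration the removed table holds a single row, so the weight equals the number of coincidences with that row; in this toy example consecutive rows of the resulting order differ in at most one value.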

To demonstrate the mixed technique we use the data given in Table 1 (a). The conformity, i.e. the weight, of O1 is 2+2+4+2+2=12 (count(A1 = 1) + count(A2 = 2) + count(A3 = 2) + count(A4 = 2) + count(A5 = 2)). O1 has the smallest weight among the objects, therefore it is selected first. After that, the iterative part of the algorithm starts.

Table 1 (b) shows the weights of objects (rows) during 6 iterations and the order of elimination of rows, while Table 1 (c) shows the same for columns (during 5 iterations). The reordered data table is presented in Table 1 (d). The weights of rows W(Oi) and the weights of columns W(Aj) are the weights at the moment of elimination of a row/column.

Compared to the previous techniques, the mixed technique gives different information. By maximizing the similarity between consecutive rows/columns, it reveals an “evolutionary” way of changing/developing the initial object/attribute.

3 Experiments

We compare the results of the four algorithms introduced above by two stress measures [9] (see Section 1.1) using 20 different data sets.

We use binarized representations of 20 data sets from the UCI Machine Learning Repository (http://archive.ics.uci.edu/ml/). The list of data sets, ordered by size, is given in Table 2.

Table 3 shows global stress values in both neighborhoods for all the considered data sets for all 4 reordering algorithms based on monotone systems: conformity scale (conf), plus technique (plus), minus technique (min) and mixed technique (mixed).

The smaller the stress value, the better the reordering. In each row, the smallest values for both neighborhoods are shown in bold. In all cases the mixed technique achieves the best result and the conformity scale the worst. In most cases, the minus technique gives a better result than the plus technique (16 data sets out of 20 by both measures). This agrees with the observation by Liiv ([10], p. 61): “the ‘minus’ technique outperformed the ‘plus’ technique on the average with all three measures”, although the data sets and evaluation measures used there are different.

4 Conclusion

In this paper, we have introduced three matrix reordering methods based on monotone systems and proposed a new one, called mixed technique. We have evaluated all four methods using stress measures. We have found that the novel mixed technique performs better than the other three algorithms. Future challenges include handling big data and dealing with incremental updates.

Acknowledgements. The authors would like to thank Prof. Sadok Ben Yahia for helpful comments and suggestions.

References

1. Liiv, I.: Seriation and matrix reordering methods: An historical overview. Statistical Analysis and Data Mining 3(2), 70–91 (2010)

2. Liiv, I., Vohandu, L.: Seriation and Matrix Reordering Methods for Asymmetric One-Mode Two-Way Datasets. In: Imaizumi, T., Nakayama, A., Yokoyama, S. (eds.), Advanced Studies in Behaviormetrics and Data Science: Essays in Honor of Akinori Okada, Behaviormetrics: Quantitative Approaches to Human Behavior 5, https://doi.org/10.1007/978-981-15-2700-5_10 (2020)

3. Mullat, J.E.: On the Maximum Principle for some Set Functions. Proceedings of the Tallinn Technical University, 313, 37–44 (1971). http://www.datalaundering.com/download/modular.pdf, last accessed 2016/02

4. Mullat, J.E.: Extremal Subsystems of Monotonic Systems. I. Automation and Remote Control, 37(5), 758–766 (1976). www.datalaundering.com/download/extrem01.pdf, last accessed 2016/02

5. Mullat, J.E.: Extremal Subsystems of Monotonic Systems. II. Automation and Remote Control, 37(8), 1286–1294 (1977). www.datalaundering.com/downlaods/extrem02.pdf, last accessed 2016/02

6. Vyhandu, L.: Rapid data analysis methods. Transactions of Tallinn Technical University, 464, 21–39 (1979). (in Russian)

7. Vyhandu, L.: Fast Methods in Exploratory Data Analysis. Transactions of Tallinn Technical University, 705, 3–13 (1989)

8. Võhandu, L., Kuusik, R., Lind, G.: Monotone systems and their applications. Tallinn: Tallinn University of Technology, Department of Software Science (2018). (in Estonian) issuu.com/monotsys/docs/monotoonsed_s_steemid, last accessed 2019/07

9. Hahsler, M., Hornik, K., Buchta, C.: Getting things in order: An introduction to the R package seriation, https://rdrr.io/cran/seriation/, last accessed 2019/06/27

10. Liiv, I.: Pattern Discovery Using Seriation and Matrix Reordering: A Unified View, Extensions and an Application to Inventory Management. Ph.D. Thesis, Tallinn University of Technology (2008)