=Paper=
{{Paper
|id=Vol-154/paper-8
|storemode=property
|title=Finite State Automata - a concept for data representation
|pdfUrl=https://ceur-ws.org/Vol-154/mindek.pdf
|volume=Vol-154
}}
==Finite State Automata - a concept for data representation==
Finite State Automata – a concept for data representation
© Marian Mindek, Michal Burda
Department of Computer Science, FEI, VSB - Technical University of Ostrava,
17. listopadu 15, 708 33, Ostrava-Poruba, CZ
{marian.mindek, michal.burda}@vsb.cz
Abstract More about it can be found in chapter Compression).
Tail of the paper is intended to discuss the methods
In this paper, we introduce finite automata as a of storing automata to usual data tables, matrices or
tool for matrix specification and compression. XML documents.
We also describe, how to get additional
interesting information from such automata. At 2 How to represent matrix with automaton
last, we focus on techniques for storing
resultant automata as matrices, tables of an
SQL database, or as XML document. 2.1 Elementary theory
To understand the following, we allege here the
1 Introduction necessary background only. For more about automata
theory please read [1, 6].
Finite automata can be used as a tool for efficient
In the subsequent, we work with images instead of
matrix storage with the possibility of compression.
matrices. It is due to simplicity and better
Since matrices are general purpose data structures, this
understandability of this paper. To work with matrices,
approach could be used on images, text, sound files etc.
no additional effort is needed, as one may realize. Also,
Storing suitable data as automaton brings up also the
for more about using automata as a tool for specifying
benefit of obtaining the additional interesting
image, please read [3, 4, 5, 7, 8].
information about our data [1, 3, 4, 5, 7, 8].
A digitized image of the finite resolution m x n
Such technique is especially useful when dealing
consists of m x n pixels each of which takes a Boolean
with large binary matrices. A traditional approach
value (1 for black, 0 for white) for bi-level image, or
(compression using common algorithms) solves only a
real value (practically digitized to an integer value from
part of the problem – they save a lot of space, but there
0 to 256) for a gray-scale image, or 24bit information
is no way how to make any changes on the compressed
(RGB) for true-color image.
matrix.
In the subsequent, we consider only square images
In the next chapter, we allege only the necessary
of resolution 2n x 2n. In order to facilitate the
background of the automata theory. After that, we
application of finite automata to image description, we
describe simple algorithm for compressing matrices by
can assign unique word (path through the automaton) of
creating the finite state automata.
length n over the alphabet
As one can see in the following sections, we can use
the created automaton to get additional interesting Σ={0, 1, 2, 3}
information about the compressed data, namely the to each pixel of the 2n x 2n resolution image.
patterns of similar parts of source matrix. Such patterns Each Σ’s symbol in the word represents the address
may be used in special searching algorithms to find e.g. of a sub-square of the square addressed with the
similar parts of faces, medical pictures, buildings preceding symbols of the word. We choose ε as an
tracing, parts of large sparse matrices, similar noise, address of the whole unit square.
similar trends, pieces of text, etc. Single digits, as shown in the left of figure 1,
Storing matrices as automata is not the read-only address the quadrants. Thus, the four sub-squares of a
compression. It is also possible to modify the square with address w have address w0, w1, w2 or w3,
compressed matrix. However, this process is not very respectively. The middle of fig. 1 shows addresses of all
straightforward, but there exist many variants how to pixels of a 4 x 4 image. The sub-square (pixel) with
enable update of compressed matrices: we can do the address 3203 is shown on the right of figure 1.
slow re-compression of the overall matrix to get best
compression ratio, it is also possible to re-compress the
changed part only, which is faster but saves less space.
Proceedings of the Spring Young Researcher's
Colloquium on Database and Information Systems
SYRCoDIS, St.-Petersburg, Russia, 2005
Example. Clearly, L1 = {1,2}*0 are addresses of the
infinitely many squares illustrated at the left of Fig. 4. If
we place the completely black square defined by
L2 = Σ* into all these squares we get the image specified
by the concatenation L1L2={1,2}*0Σ* which is the
triangle shown in the middle of Fig. 4.
Figure 1. The addresses of the quadrants, of the sub-
square of resolution 4 x 4, and the sub-square specified
by the string 3203.
We denote Σm the set of all words over Σ of the
length m, by Σ* the set of all words over Σ. Figure 4. The squares specified by {1,2}*0, a triangle
In order to specify a black-white image of resolution defined by {1,2}*0Σ*, and the corresponding
2m x 2m, we need to specify a language L ⊆ Σm where automaton.
the word w belongs to L iff the coresponding sub-
square on the image is black. Example. By placing the triangle L= L1L2 from the
The automaton coresponding to the given image previous example into all the squares with addresses
should be created as to recognize the language L. That L3={1,2,3}*0 we get the image L3={1,2,3}*0{1,2}*0Σ*
is, it must end in accept state iff the sub-square of given shown at the left of Fig. 5.
address is black.
To be able to compress color images (multi-valued Zooming [5] is easily implemented for images
matrices), each ending state of the automaton must represented by regular expressions and is very
contain the color (value) of each pixel (cell) in the sub- important for matrix compression shown in next
square. section.
Now, we are ready to give some examples. We
assume that the reader is familiar with the elementary
facts about finite automata and regular sets – see e.g.
[1, 6].
Example. Consider the 2 x 2 chess-board in the left of
figure 2. Its automata could be described by a regular
expression {1,2}Σ*. Please note, the regular set also
describes the 2 x 2 chess board in arbitrary resolutions Figure 5. The diminishing triangles defined by
(concretely, 2n x 2n for any positive integer n). {1,2,3}*0{1,2}*0Σ*, and the corresponding automaton.
The 8 x 8 chess-board in the right of figure 2 can be
described by the regular expression Σ2{1,2}Σ* or by
automaton A in Fig. 3. 2.2 Construction of Finite Automaton
We have just shown that a necessary condition for black
and white multi-resolution image (is evident, that the
same reads for binary matrices) to be represented by a
regular expression is that it must have only a finite
number of different sub-images in all the sub-squares
with addresses from Σ*. We will show that this
Figure 2. 2 x 2 and 8 x 8 chess-boards condition is also sufficient. Therefore, matrices that can
Notice that here we used the fact that the regular be perfectly (i.e. with infinite precision) described by
expression Σ2{1,2}Σ* is the concatenation of two regular expressions (finite automata) are images of
regular expressions Σ2 and {1,2}Σ*. regular or fractal character (matrices with many same
part). Self-similarity is a typical property of fractals.
Any image can by approximated by a regular
expression (finite automaton) however; an
approximation with a smaller error might require a
larger automaton.
Now we will give a theoretical procedure which,
given a multi-resolution image or multi-frequency
Figure 3. Finite automaton A defining the 8 x 8 chess- sound, finds a finite automaton perfectly specifying it, if
board. such an automaton exists. (Multi-resolution principle
can be applied to mathematical binary matrices, but from word w (a, b, c ∈ {0,1,2,3} - represent sub-square
only occasionally. For text is not useful!) of matrices).
Procedure Construct Automaton
For given matrix M, we denote Mw the zoomed part of
M in the square addressed w. The (sub) matrix
represented by state numbered x is denoted by ux.
Procedure Construct Automaton
i=j=0
create state 0 and assign u0 = M
assume ui = M w
loop
for k ∈ {0,1,2,3} do
if M wk = uq for some state q then Figure 6. Analysis of possible cases in constructed
create an edge labelled k automaton.
from state i to state q
else
j=j+1
uj = M wk
create an edge labelled k
from state i to the new state j
end if Figure 7. Analysis of possible cases in constructed
end for automaton from figure 6 without loop.
if i == j than
Stop (all states have been processed) The cases (ii) and (iii) in figure 6 correspond with
else case (II) in figure 7. We can depress increasing count of
i=i+1 state at approach where is not loop.
end if
end loop Procedure for reconstruction matrix from automaton
end procedure is very simple, than we describe only procedure
Reconstruct matrix for compressed matrix at next
The procedure Construct Automaton terminates if section.
there exists an automaton that perfectly specifies the
given matrix and produces a deterministic automaton 3 Compression
with the minimal number of states. Our algorithm for
non-binary matrix (e.g. grey-scale image, sound, text)
is based on this procedure, but it will use evaluated 3.1 Basic compression
finite automata (as like WFA) introduced in the next In this part of chapter, we will demonstrate in brief a
section and only replacing binary information 0/1 to a method for matrix compression / decompression
real value (e.g. 256 colour, or greyness image), no applicable on construction of matrix storage, or matrix
creating loop and add some option for set-up database. Our algorithm is based on algorithm shown in
compression. previous section. There lead 4 edges from each node at
most and they are labelled with numbers representing
Example: For the Image diminishing triangles in Fig. matrix part. Every state can store information of
5, the procedure constructs the automaton shown at the average value of given sub-square.
right-hand side of Fig. 5. First, the initial state D is The procedure Construct Automaton for
created a processed. For 0 a new state T is created, for compression terminates if there exists an automaton that
1,2 and 3 a loop to itself. Then state T is processed for 0 perfectly (or with the defined error) specifies the given
a new state S is created, for 1 and 2 a loop to T. There is matrix and produces a deterministic automaton with the
no edge labelled 3 coming out of T since the quadrant 3 minimal number of states. The number of states can be
for T (triangle) is empty. Finally the state S (square) is a little reduced or extended by changing error threshold.
processed by creating loops back to S for all 4 inputs. This principle is naturally useful only for matrices
where it is possible to apply loss-compression.
In this example we can see, that state D does not Changing the part (or only one matrix element) in
contain edge labelled 0 and state T edge labelled 0 and source matrix changes the number of states in resultant
3. If either one of them is not present several cases may automata.
happen, as shown in figure 6. The same example, but In the following, we propose one of the simpler and
without loop is shown in figure 7. Where 0 ... X, X∈ faster algorithms for reconstructing the matrix from the
{1,2,3,4} denote some state of automaton (not automata. It is recursive algorithm, but it does not dive
necessarily adjoining), and a, b, c represent symbol very deeply.
Procedure Construct Automaton for Compression
For given matrix M, we denote Mw the zoomed part of Procedure Reconstruct Matrix
M in the square addressed w. The matrix represented by qo = w = {ε} = M (matrix represented by the empty
state numbered x is denoted by ux. word)
i(qo)=1
Procedure Construct Automaton for Compression t(qo)=∅(ε) (average greyness of the matrix M)
i=j=0
create state 0 and assign u0 = M (matrix represented recursively for state q do
by empty word and define average value of M) for u = u0, u1, u2, u3do
assume ui = M w u = Mua
loop if u = t(qo) and word has shorter then
for k ∈ {0,1,2,3} do requested then
if M wk = uq (or with small error) or if RC q = uX
the matrix Mwk can be expressed as a else
part or expanded part of the matrix uq RC q = q(uY)
for some state q then end if
create an edge labelled k end for
from state i to state q if u = {ε} or u = {requested word}
else stop
j=j+1 end if
uj = M wk end recursively
create an edge labelled k
from state i to the new state j where: X denotes part of image
end if Y is a next part of image
end for ∅(x) is average value of the matrix part Mux, we can
if i == j than change to computed value, if we wont that.
Stop (all states have been processed)
else Example: Original computed word is w = {3203}, see
i=i+1 figure 1. The length of word is 4 ⇒ the number of
end if elements of matrix is 256 (clearly x2 where x is y2 where
end loop y is z2 where z is 2). If we process all words we obtain
end procedure all matrix elements, if we process only first 2 symbols
from the word we obtain only matrix with 22 elements.
The procedure Reconstruct Matrix stops for every The average value of element (if value is from interval
automata computed by a procedure Construct 0-255) with address w = {32} is approximately 16, then
Automaton (for Compression), or other similar we obtain matrix, where sub-square with address w =
algorithm. The procedure computes original matrix only {32} have every elements equal 16. This principle is
if we process every word from input alphabet and useful only for images, sounds or el. signals. Text will
alphabet was computed by loss free procedure. be destroyed. The procedure Reconstruct Matrix can
Otherwise, we obtain approximately same matrix. The reconstruct all matrix elements (same dimension as
percent similarity is in relation with length of processed original) with defined error if we use word with first 2
word and original computed word. In figure eight is symbol and append empty word w={32εε}.
depicted other approach to storing and reconstructing
the image (real evaluated matrix).
3.2 Compression and changes
Procedure Reconstruct Matrix (RC)
For given automata A, make matrix Mw the zoomed part Now we describe, how to re-compress the matrix after
of M in the square addressed w. The matrix represented some updates.
by state number x is denoted by ux. If we compress the matrix with traditional algorithm
and some element is changed, we must re-compress all
matrices every time. But if we represent matrix as a
Finite State Automata, we can change / re-compress
only corresponding part with changed element.
There exist at least three basic solutions for selection
of corresponding part of Finite State Automata or
corresponding sub-square of source matrix:
1) Re-calculating the biggest corresponding sub-
square:
Figure 8. Reconstructing image from root to the leaf
node.
This approach leads to a big quantum of data We can easily get the group of equal matrix parts. The
manipulation (as much as one quarter), but with this algorithm for assembling resultant automata together is
approach we can reach the high compression ratio. very simple.
Method is useful for all types of source matrices.
2) Re-calculating the least corresponding sub-
square:
This approach re-calculates the least quantum of
data (radix ten elements), but with this approach we
gain very small compression ratio and often changes
lead to the grow in size of the automata. Compression is
after that disutility, but if we want only obtain the
interesting information from our resultant automata, this
method is useful too. This approach is useful for all
types of source matrices.
3) Re-calculating the optimal corresponding sub-
square:
Retrieval of such sub-square may be difficult, but in
most cases, it shows that it hasn't sense to work with
sub-square greater than three or four least
corresponding sub-squares. Naturally, it greatly depends
on the character of the matrices. If we know that our
matrices contain many equal blocks (e.g. wiring gate in
microprocessor), we can state the amount of levels,
which we should still take in consideration. This choice
naturally has not influence on algorithm, but only on
machine time and resultant size of compressed matrices.
4 Resultant automata
4.1 Coalescence
There exist many methods for assembling matrices
represented by finite automata. We will depict one of Figure 9. Two sample images with common parts and
better ones, namely assembling resultant automata in the corresponding automata.
direction from leaf node to the root. Suppose a couple
of matrices and their representation by means of final
state automata computed with procedure 1, depicted in Procedure Composition Automaton for Storage
figure 9. For simplicity, black and white pictures with For given automaton A and automaton B - resultant
resolution 8 × 8 pixel were used. Moreover, the states of automaton from previously composition compute new
automata contain also the average greyness of resultant automaton B´ ∈ A∪B and combine similar
corresponding picture parts. These values are in interval parts of both.
0 − 255, where 0 represents black and 255 white colour.
Procedure Composition Automaton (Automaton A,
Example: It is clear that images in figure 9 have some Stored automaton B)
common parts, highlighted in following figure 10. Assign state qx from A to the corresponding state in
These common parts could be joined into the same state stored automaton B.
in composed resultant automata (see figure 10 on the If such case does not exists, assign a new state and
bottom). The corresponding automata with images in take qx+1 from A.
figure 9 have some common state and edge labelled end if
with same symbol. There exists some similar or the foa all state of automaton A do
equal word w depicting the way from the root to the if not exist edge from state qi labelled with
corresponding part represented by a state. same word w as edge from correspond state in
stored automaton then
This principle can be used for no-loss compression. create a new edge labelled w to a new
Additional information can be obtained from the state i
structure of the resultant automaton, for example the otherwise
information about the similarity of the stored matrices, take next edge
common lines, and alternatively equal parts in matrix. end if
if all edge from actual state is processed, take automaton has more states then the new states represent
next state the differences between input matrices. With procedure
end if Construct Automaton for Compression, we can control
end for the type of acceptable (or unacceptable) changes, e.g.
end procedure trivial differences in value, small noise, etc. With this,
we can separate interesting changes from those that are
rather to be ignored.
For example, in medicine, we can obtain many
similar images and only medical specialist can mark an
interesting parts. We can help him or her to reduce the
number of non-interesting changes and of course notice
him or her to some small differences, which can be
important, but hard to find.
With procedure Composition Automaton for
Storage, we can increase the ability to make the
difference between interesting and non-interesting parts
of the resultant automaton. We can even do it by
involving the tolerance in value (average value), or
similarity of words, etc.
If we store our source matrix in more than one
automaton, we can focus on our interesting part of the
matrix and there compute the profundity automaton. On
other part of matrix, we can compute automaton with
less number of states. For this purpose, we can use the
pattern matrix shown in table 1, where the values in
cells are the counts of profundity of automaton, which
represents that part of matrix. This principle can be used
only for loss compression (e.g. images, signals, etc.).
The part with less count of states stores much fewer
information than the part with more states.
It is clear, with this principle we can save much
more space and also preserve high information value of
our data. We can transfer only interesting part of matrix
or any nearest part and save machine time or network
capacity. It is sufficient to choose a state from resultant
Figure 10. Composite automata from figure 9 automata, which represents our interesting part of the
respecting the common parts. matrix, and operate with this as with the root. This
principle is used with the principle automata
The procedure Composition Automaton for Storage composition. Pattern may be arbitrary.
stops for every automaton computed by a procedure Now we have a background for using finite state
Construct Automaton for Compression or other similar automata as a database with included information.
algorithm. Resultant automaton is in most cases smaller
than original automata and contains interesting
information from both automata. In some cases, the 3 3 3 3 3 3
resultant automaton could be larger, but if we will store 3 5 5 5 5 3
more and more matrices, the resultant automaton will 3 5 10 10 5 3
increase less and less. On the other side we can store 3 5 10 10 5 3
two (or more) almost same matrices in automaton with 3 5 5 5 5 3
almost non-increasing number of states. It is clear, 3 3 3 3 3 3
when we are storing matrices of the same domain (e.g. Table 1: Pattern of approach with 36 automata.
medical super-sound radiograph, X-ray, pictures of
building, el. signal, similar noise, etc.) we could obtain
an interesting knowledge about stored matrices hand in 4.3 Representation
hand with saving the memory space. In this chapter we introduce ideas how to represent
Finite State Automata as a matrix, as a table for SQL
4.2 Interesting information in matrix database and as a XML structure.
If we store some similar matrices in one automaton, 1) Matrix representation
we can obtain interesting information about changes in Suppose automata as depicted in figure 5. Such
resultant automaton. Concretely, if our resultant automata could be transformed into matrix with at least
four columns. The first to four columns contain Table representation is very similar to matrix
destination for each transition. We can append fifth representation. Instead of matrix, we use common table
column with the source state. If the matrix is encoded to store all information needed to describe the
then the average value is usually stored in the sixth automaton. We store simply the source state, average
column. To support additional features we may store value, etc. in the table. Procedure Store Automaton to
subsequent information but for basic functionality, it is Table is adequate to procedure Store Automaton to
enough. The matrix itself is built using following Matrix then we describe only resultant table at Table 3.
procedure: The first to fifth columns contain information about
the state and the sixth column contains the average
Procedure Store Automaton to Matrix value (in our example it contains the average colour of
Input is resultant automaton from procedure Construct sub-square in image depicting the diminishing triangles,
Automaton for Compression or other similar algorithm. where background of image is black colour denoted
with 0 and other colour is only white - 255.
Procedure Store Automaton to Matrix (automaton A)
for all qi ∈ Q do 0 1 2 3 source a. value
create new line in matrix and assign value i in T D D D D 96
source if appended S T T T 128
for all edge ej outgoing from qi do S S S S S 255
assign value i in column j * Table 3: Table representation of automaton in fig. 5
end for
end for This representation is best for storing in SQL
end procedure database particularly if we append additional interesting
information in additional column (e.g. average value,
where: Q is se of states of automaton A state depth, most similar state, etc.) witch emphasise
j starts from 0 but 3 at most included benefit. It is very fine for indexing and
* unused transitions leave empty facilitates manipulating very large resultant structure
source - number of processed state (composed FSA).
The matrix representation of the automata presently 3) XML representation
used is straightforward encoding of states and This representation is very useful for using Finite
transitions that could be supplied with additional State Automata as means to database with added
information. interesting information (e.g. multimedia database,
database of medical images, el. signals, trends, etc.) and
For automaton in figure 5: is best for sharing resultant FSA on internet/intranet. It
state D = 1 is very easy to manipulate with only a part of the
T=2 automaton, in this structure. There are many approaches
S = 3 and S is final state of how to solve the problem and we offer the on of the
simpler. For more about XML please read [9].
2 1 1 1
3 2 2 Example: The automaton representing a diminishing
3 3 3 3 triangles in figure 5 represented by simplified XML
Table 2: Matrix representation structure.
Simple resultant matrices without additional
information contain only integral numbers from interval
(0 - count of state). To reconstruct the source matrix
from such representation is very simple and fast. We
can use procedure Reconstruct Matrix above-mentioned
or other similar procedure. This representation of FSA
is useful for direct hardware-based processing or for
running on machines with simple operating system. It is
easy to convert it to any other representation and to
manipulate with it. Computation costs no many
machine time and space. Over all this advantages, the
matrix representation is not so good for database
manipulation (e.g. searching, updating, etc.). Example: The exemplary composed automaton
For such purposes, there are some other represented by simplified XML structure.
representations.
2) Table representation
matrices (e.g. multimedia database) with the ability to
search of interesting parts of our data.
In our future work, we focus on manipulating with
... data in database (at various structures, e.g. table or
XML), describe language or a set of tools that is able to
query stored data.
...
References
[1] Alur, R. and Dill, D. L. A Theory of Timed
... Automata. In Theoretical Computer Science,
126(2):183–235, 1994.
[2] Daniela BERARDI, Fabio DE ROSA, Luca DE
SANTIS and Massimo MECELLA. Finite State
Automata as Conceptual Model for E-Services. In
Integrated Design and Process Technology, IDPT-
... 2003, June 2003.
[3] K. Culik II and J. Kari. Image compression using
17:305–313, 1993.
[4] K. Culik II and V. Valenta. Finite automata based
... compression of bi-level and simple color images. In
... Computers & Craphics, 21:61–68, 1997.
... [5] K. Culik II and J. Kari. Image compression Using
Weighted Finite Automata, in Fractal Image
Compression. In Theory a Techniques, Ed. Yuval
Fisher, Springer Verlag, pp 243-258, 1994.
[6] J.E.Hopcroft and J.D.Ullman. Introduction to
... automata theory, languages and computation. In
Addison-Wesley, 1979.
[7] Marian Mindek. Finite State Automata and Images.
In WOFEX 2004, PhD Workshop, Ed. V. Snášel, ISBN:
Where: 80-248-0596-0, 2004
X is average value of part of matrix [8] Marian Mindek. Finite State Automata and Image
wC, C∈{0,1,2,3} is sub-square (pixel), if wC is not Recognition. In DATESO 2004, Ed. V. Snášel, J.
present, then sub-square with symbol C is represented Pokorný, K. Richta, pp 132-143, ISBN: 80-248-0457-3,
itself (with the state witch represent - recursively) 2004
S is value: 0 - not present, 1 - present [9] W3C (2004) XML Protocol. XML Protocol Web
Page (link checked January, 10th 2005):
In this structure we can store every automata http://www.w3.org/XML/
computed by procedure Construct Automaton for
Compression or Composition Automaton for Storage or
other similar algorithm. Final state can obtain other
information e.g. following automaton increasing deep.
5 Conclusions and future work
In this paper we have presented a Finite State Automata
for storing and compressing data represented as a
matrix. FSA allow us to capture a large class of data
represented as matrices and obtaining interesting
information without using other algorithm to compute
it. Additionally, data stored in FSA save more space,
make it possible to access and manipulate with them
without the decompressing process. Storing data in
automaton allows us to send via network only the part
of data of our interest. We do not need to have available
the whole matrix (e.g. picture) if we are interested in a
small part of it only. Additional benefit is included.