A Comparison of Multi-agent State-space Exploration Approaches for the Verification of Mixed and Partially Observable Domains Galina Novikova[0000-0002-4949-9135] and Esteban Azofeifa[0000-0003-0604-4234] RUDN University, M oscow, Russia Abstract. Validation problems concern practically any domain theory that de- scribes the types and properties of objects, classes of states and relations, and the laws of functioning and interaction between objects. Recognition and verifi- cation of the state space associated to a determined system is key to solving management problems. Such tasks in the context of state-based systems often require a reachability analysis that displays an exponential expansion of the search space in certain domain theories. Coordinated and decentralized multi- agent exploration can offer efficient solutions to the problem, but the task be- comes intractable within certain probabilistic domains. Concretely, the associ- ated 'curse of dimensionality' of partially observable domains, in practice, is a considerable limitation that allows to tackle the recognition task only in terms of suboptimal solutions. However, the signals associated to properties in an en- vironment can be read by different kinds of sensors, which in turn can signifi- cantly influence the state exploration time in the recognition task. In this paper, we study multi-agent systems for state-space exploration in environments with partial and mixed observability, and we outline certain conditions under which interchangeability between approaches can be obtained according to efficiency goals. We performed a cost comparison of two multi-agent approaches consist- ing of automated agents without inner states, with the difference that one of them uses exact, noise immune sensors and the other utilizes unbiased sensors with a certain degree of imprecision. By representing the interpretation of a domain as a path exploration task, we obtained an equation comprising the nec- essary conditions for an equivalence between the approaches. Keywords: Verification, uncertainty, domain, multi-agent, partially observable, reachability, search space. 1 Introduction Creating tools for recognition and verification of object states is a key task in the de- velopment of intelligent control systems for complex dyn amic systems. The solution for this problem lies in the field of research and development of multi-agent systems, but the problem becomes unsolvable in certain probabilistic areas. In particular, the so-called 'curse of dimensionality' of partially observed domains can become a signif- icant limitation for the search of optimal solutions to the task. We consider that the Proceedings of the XXII International Conference “Enterprise Engineering and Knowledge M anagement” April 25-26, 2019, M oscow, Russia solution to the recognition problem should be based on a formal approach, in relation to both the definition of the concept of 'state' and the formalization of the recognition process. The concept of 'state' is fundamental in the field of control theory, since control, in a broad sense, is the definition and application of actions in order to change or pre- serve the state of an object. Thus, one of the most important tasks arising in the con- trol process is the recognition and verification of the states associated to the objects in a domain or subject area for the correct application of control actions. State changes occur according to a specified control objective. The purpose of the control system can be regulation: to guarantee that the object will function within a given state or, more precisely, within a given class of states. If we consider a dynamic system –such as an enterprise– as our subject area, the problem of regulation is solved at the level of operational management. On the other hand, the task of developing an action that will change the state of the control object in order to achieve some target state is solved at the level of strategic management [1]. In the field of corporate management, the technology known as management by exception is widely used: the Deming management cycle [2], for example, includes a stage to control and verify deviations of real values from the planned indicators (Fig. 1). This approach can be used for the purpose of management in any complex dynam- ic system, which in turn can be attributed to the class of quasi-stationary objects [3]. Fig. 1. Tasks in the Deming Cycle We understand as a quasi-stationary object a dynamic system whose behavior is de- termined by the state transitions of its elements and the system as a whole followed by stationary periods Δt, during which the state of the system is constant. It can be as- sumed that the minimal Δt is determined by the time of the minimal transaction, which nowadays is increasingly being reduced with the development of novel tech- nologies in telecommunications and other engineering -related fields. 2 State Space A description of the behavior of the system should include the possible state transition sequences of the individual elements and the system as a whole; concretely, its cur- rent status and the planned, possible, probable or objective states, among others. The state space includes various classes of states that can be selected:  The classes of valid states (Ωv).  The classes of conflicting states that require the development of control actions (Ωc).  The classes of objective states (Ωo ). Object states can be determined by observing the compliance of the instances to some boundaries on the property values of the object, e.g. 35 to 41 degrees for the property 'temperature'. Additionally, states can be distinguished by the presence of a determined property value of an object –especially if it is not quantifiable (e.g. the color red)– or by a process of change (adaptation) of this parameter. In this work, we assume that the state of the system as a whole is determined by the states of its elements. Concretely, control objects in a complex dy namic system corre- spond not only to single objects of the subject area, but also to the connections and relationships between objects. An example is the case of enterprise architecture de- velopment, where links between objects can give rise to a new object: a set of related positions in an organization forms a department, an organizational unit of a higher order. In such cases, identification of a global system state requires the recognition of both states, connections and relationships between objects. A simple connection of some kind between objects can be 'strong', 'weak', 'planned', 'undesirable'. On the other hand, states associated to relations such as 'part - whole' and 'element-set' could also be used to represent an inheritance mechanism of object states, case in which we would be talking about relationships. In particular, we consider that relationships between objects form objects of higher order, which can extend its properties and states to its constituent elements (inheritance) but can also display new properties and states based on the principle of 'emergence' [4]. For exam- ple, a 'sales' relation, which includes objects such as 'product', 'customer', 'sales chan- nel', can present states like 'low', 'high', and 'unprofitable'. The states are ident ified by the values of the properties of the object: profitability, profit, volume. The approach in [5] proposes the use of predicate calculus to describe domain the- ories in a simplified form. In detail, a formal system M can be expressed in the fol- lowing way: M   D, F , A, I , where D is the theory alphabet (a finite set of basic characters); F is a set of (counta- ble) formulas (also called well formed formulas of wff), built from elements of D using some set of syntax rules; A corresponds to a set of formulas called axioms, and I is a finite set of inference rules. Axioms constructed using the alphabet of the theory establish the correspondence between:  Object type (T) and the set of properties (P).  Types of objects and the set of relations (R).  Object type and the set of possible object states (S). Additionally, the statements about properties [6] and relations between objects de- scribed by axioms are accepted as true without the need of a proof. Finally, the threshold or current values of object properties are defined with the help of a set of constants. The current state of a domain is identified by the set of atomic axioms of the for- mal system characterized by terms with a constant value, as well as bound variables fixed by the equality predicate, which in turn correspond to terms of a subset of the atomic axioms. Thus, the complete set of states is obtained from atomic axioms with constants as terms, as well as axioms describing cause-effect, structural and functional relations between the elements of the domain. In particular, axioms that reflect the structural relations of the domain elements, as well as axioms that describe inher- itance, can be used along with the mechanism of logical inference to expand the set of facts that identify the current state. 3 Formalization of a State For the formal definition of the classes of states we use the model-theoretic approach, dividing the whole state space into subsets of equivalent state classes, each of which is described by a closed (ground) logical formula. Such formula can include atomic statements with constant terms, as well as statements in which free variables are bound by a quantifier: in general, the expressive power of the theory alphabet deter- mines the degree of uniqueness of the statements about domain properties and state classes. For example, not only traditional quantifiers ( ∀,∃) can be used to describe a class of states, but also other ones such as 'there exists one and only one' ( ∃!) and 'there is no' (∄), as well as modal operators such as 'possible', 'necessary', 'allowed' and 'forbidden' in addition to operators reflecting temporal and spatial modalities [7], among others. Let us now present a relational system G = {Q, W}, where Q is the carrier or under- lying set of elements of the system, and W is the set of relations wj ⊆ Qmj (j ∈ J) be- tween the elements in Q, where a predicate Zj (q*) = True indicates that the element q* = (q 1 ,q 2 ,...,q mj ) ∈ wj with arity mj defined on the carrier set by a function ar: Υz → ℤ +, Υz = {Zj | j ∈ J}. Thus, if a complex dynamic system functioning under a formal system M happens to be in one of the states (sg ∈ S g ) of G, the behavior of the system can now be described by a set of relational systems, and in this case the values that bound the free variables will determine the properties of the system. We first distinguish the classes of equivalent relational systems [8] and the corre- sponding classes of equivalent states containing the elements of the domain, and then we associate to each class a closed logical formula defined in the signature of the system and taking the value 'true' or 'false'. Concretely, the formal system describing the current state of the domain corresponds to a model of one of the equivalence clas- ses, Γ, which in this case contains all relational systems of signature σ = (Υz , ar). The formal system, in turn, yields an interpretation under which the closed formula identi- fying the class takes the value 'true' [9], and the formula can then be considered as an assertion about the properties of the model. Accordingly, the entire set of formal sys- tems describing the behavior of a complex dynamic system is divided into classes of models, each of them associated to a closed logical formula describing the corre- sponding class of states (Ωv, Ωc or Ωo ). This approach can be applied in the engineering phase or in the everyday control and monitoring operations in a complex organization. The engineering of a compa- ny’s processes, for example, should include guarantees that the system will av oid or minimize its state transitions to conflicting states (Ωc) while guaranteeing that it will reach objective states (Ωo ) in a cost-effective way. Control and monitoring, or anti- crisis operations, are destined to take the system out of those undesirable states. On the other hand, strategic management deals with finding new ways to maximize the organization’s revenue in a competitive environment with other players that share the same goals. This is expected to be accomplished by finding the correct combination of actions that maximize the frequency of encountering organizational states inside the objective set (Ωo ). If these processes are considered as the same optimization task in different mani- festations, then the benefit of this approach can be notably perceived in the automa- tion of such processes. Concretely, a software tool can predict the transitions into conflictive or undesirable states (or objective states) by looking at all the possible ways that their logical formulas can be bound, taking into account the formalized relational systems, historical data and/or experts’ knowledge. Such predictions, which fall outside the scope of this work, can be obtained from a probabilistic model of state transitions based on historical data, for example, such that the corresponding anti- crisis (or strategic) actions can be taken even before an undesirable state is reached or within an optimal time threshold after its parameters have been identified. This auto- mation yields a reduction in financial, material, technical and informational costs. 4 Approaches to State-space Exploration Verification of a domain to determine the corresponding classes of states is preceded by the task of recognizing all the possible object instances in order to bound the free variables. Having outlined the verification task, in this work we focus on the recogni- tion task, and in particular on the analysis of efficient methods of environment obser- vation taking into account the 'curse of dimensionality' in state-space exploration under resource constraints. Concretely, we explore the conditions under which it is feasible or even recom- mended that very expensive sens ors with a perfect capacity of observation (noise immune [10] and perfect accuracy) be replaced in multi-agent recognition tasks with more affordable sensors with a limited capacity of detecting precise signals but with the ability to guarantee a minimal observation confidence. We begin by defining a set P of possible (perceivable) properties of objects in a domain, so that each possible combination of properties is uniquely associated to an object type in T. In the initial case of full observability, the agents conforming the recognition multi-agent system are equipped with |P| sensors, each of them capable of detecting the value of one of the properties with full accuracy and precision. By means of the threshold function th: P → V, we associate every property with a numerical range from the set V that contains all the possible values that the property can take. The set of values is built in the following way: V = {iv(a,b) ∀ a, b ∈ N, a ≤ b}, where N denotes the set of natural numbers. Every range is constructed by the function iv(a,b) = {x ∈ N | a ≤ x ≤ b}, a, b ∈ N, a ≤ b. The process of binding free variables to their corresponding values or quantifiers is achieved by the functions lT and l S , which correspond to the labelings of object types and object states respectively. Both lT(X): Pow(X) → 2 ψ and lS (X): th(p 1 ) × th(p 2 ) × ... × th(x|X|) → 2 ψ map to the set of propositions ψ in M, with Pow(X) = {X*|X* ⊂ X} referring to the powerset of the set X. Once physically or virtually within a location g of an environment grid, which con- tains an undetermined number of objects and connections and relationships between them, an agent can perform an observation represented by the vector obs(X, g) = (x 1 , x2 , ..., xi , ..., x|X|), xi ∈ Xi , Xi ∈ X. For simplicity, we assume that the closest approxima- tion to the real parameters of a domain presented as an infinite grid is obtained by observing the maximum possible number of locations independently of the explora- tion path, which implies homogeneity of the grid in terms of the hidden characteristics of the domain and the possibility to use parallel multi-agent exploration to keep the recognition time less than Δt. Once an observation has been made, a set of tuples summarizing the collected in- formation can be obtained by the function dv(x*): {}( X ∈ Pow ({x*})), wh ere X corresponds to the set of object properties, Y the associated observed values and the third term is the labeling of X. Before any concrete implementation in a subject area, our approach aims to enumerate all the possible combinations of object types and values by means of the powerset function. This leads to a function that keeps within the analysis the worst case scenario in terms of complexity. O(a*)  U ls (Y ) (1)  X ,Y ,l* dv ( a*) Equation 1 yields all the possible object instances present in a location g according to the sensor inputs. This set, however, corresponds only to the first input in the proces s of determining the totality of objects and relations in the location, considering that connections between object instances and relationships of higher o rder are to be dis- covered after the initial signal perception. Theoretically, an infinite number of connections can be established between two entities, making it impossible to conclude a verification task. In our model, we reduce the recognition time of a binary relation to a constant (t = 1), and make it possible to delegate the deterministic interpretation (labeling) task to either one or multiple intel- ligent agents according to the desired multi-agent exploration strategy. However, the interpreting agents need to explore not only the totality of the recognized elements in the domain, but also the totality of the possible recognition sequences (paths) between entities in order to reveal the complete set of connections and relationships. We project such interpretation task as a recursive process among different orders or levels, beginning with objects and relations and continuing with second and higher order relationships. Nonetheless, we assume that all objects and relations on a certain interpretation level must be recognized before entering a higher order level. Addition- ally, we presuppose that all entities on a certain interpretation level can be discovered without the need to extend the sequence into a higher order. Fig. 2. M ulti-level interpretation process on a grid location of a domain Under these conditions, the interpretation process turns into the enumeration of all possible paths from the elements yielded in the previous level to every potential desti- nation on that level (Fig. 2). As such, the first iteration takes as inputs the set of de- tected entities from the perceived signals, while the next iterations take as input the set of explored paths on the last level. n n(cc(n  1)  1 ;n  0 cc(n)   n Pk   (2) k 1  0 ; otherwise Representing input elements as vertices in a complete undirected graph [11], Equation 2 enumerates all the possible paths between the elements and all the potential destina- tions. The equation makes use of the recursive function cc( k)(n) = cc(cc( k-1)(n)) with cc(0)(n) = n, where the result of the actual level is fed as an input to the interpretation process (path enumeration) on the next level. This results in the total path quantity in  cc (| A |) . h h levels by means of the function Path( A, h)  k 1 ( x) 5 Partial and Mixed Observability Until this point we have studied the case when agents perform an observation obs(g) with perfect sensor accuracy, meaning that the agents are required to perform no more than one obs(g) in order to get the true value of the signals underlying the real state o f the system. The case of partial observability [12], on the other hand, ends with this assumption and introduces uncertainty (noise) on the signal measurements, which in turn can compromise the domain verification process. We compare the full observability case with agents whose sensor measurements are unbiased –maintain a maximal accuracy– but display measurement errors that are normally distributed. The assumption of normality on the noise of the signals with full accuracy in respect to the true average of the data allows the agent to repeatedly per- form a measurement in order to improve the sensor precision, represented by the in- verted variance τ = 1/σ2 . Another approach that does not involve repeated interaction with the environment corresponds to the extension of the single observation obs(g) into a range of hypoth e- ses of the possible real values according to the non-modified precision of the sensors. The agent then tests the hypotheses and determines whether they result in the same interpretations or in completely different ones, case in which the first approach (re- peated measurement) should be preferred. eps( g )  ( X1 , X 2 ,..., X |P| ), X i  {x  th( pi ), pi  P | xi    x  xi   } (3) where xi ∈ obs(g) and ξ ∈ N. Equation 3 takes the initial observation as the central value and modifies the tuple with the addition of a range of size ξ around obs(g). Such approach can be feasible for low interpretation orders, as the interpretation state -space grows exponentially bigger with the addition of new levels and can quickly become intractable even for a large multi-agent system working in parallel. Thus, the newly generated ranges yield the new observation space ss(X*) = {(x1 , x2 , ..., xi , ..., x|X*|)}∀ xi ∈ Xi, Xi ∈ X*, waiting to be tested by the agent. However, because of the 'curse of dimensionality' it could be impossible to test every hypothesis, and the only feasible alternative would be choosing a subset of the observation space to perform sampling over the sp(X*) = |X1 |*|X2 |*...*|X|X*|| different states to reach conclusions with a lower level of confidence. If there is no difference in the cost of assuming multi-agent verification with noise immune sensors versus the error prone approach, then the immediate option to guar- antee an error free recognition of a domain environment would be the first one. How- ever, such superior sensing capabilities could mean in real life an elevated and even prohibitive cost which would force a change of method. a  Path(O(obs( P, g )), h)    b   obss ( eps ( g )) Path(O(ob), h) (4) Equation 4 refers to the necessary conditions for an equivalence in resource expenses between the full observability exploration (left) and the partial observability one. Constants βa and βb denote the unitary cost of performing an interpretation activity like searching for the next node in a path. On the other hand, μ represents the addi- tional cost that the optimal sensors represent with respect to the lower-quality ones, and it can be factored as μ = (βb – βa ) * γ, where   obss (eps ( g )) Path(O(ob), h)    Path(O(obs( P, g )), h) . Concretely, the left part of Equation 4 (full observability) yields a single path quan- tity whereas the right part (partial observability) yields a field of possible paths (an uncertainty range) in the form of a summation of the function Path. This path space can be exponentially bigger than the full observability space set, depending on the size of the set P of properties and its associated set of values, as well as the number h of higher order levels to take into account. The balancing factor μ is specific to each set of optimal sensors, and consequently to the set of properties being analyzed. In particular, this function serves the purpose of yielding an optimal mixed observability set of sensors; however, this is the case only when the quality of at least one of the optimal sensors is greater than the quality obtained when performing multiple obser- vations with normal sensors at the same cost of the optimal sensor. A middle point between both approaches corresponds to a significant improvement over the computa- tionally heavy method and the costly optimal one: this is the case of mixed observa- bility [13], the replacement of only a subset of the sensors for high -precision ones, which promises a balance between resource expense and efficiency. Even though the partial observability of the domain states will not be eliminated after the implementa- tion of such approach, the search space for path exploration is guaranteed to get sub- stantially trimmed. Concretely, the search space gets reduced by a factor of 1 / |P| for each sensor introduced. 6 Conclusion Recognition and verification of the state space associated to a determined system is key to solving management problems. Of particular interest is the task of recognizing the current state of object instances in a domain and assigning it to one of the possible state classes, which in turn can be associated to a certain control or regulation solu- tion. However, the variety of objects, properties, connections an d relationships in com- plex structured dynamic domains provoke the problem known as 'curse of dimension- ality', which directly affects finding optimal methods for the verification task. We consider necessary to integrate different types of agents and to apply parallelization in order for the recognition time to be less than Δt, the lapse in which the system and its elements remain in the same state before making a transition to another state. In order to achieve this goal, however, different approaches to performing recogni- tion tasks can be implemented. In particular, the signals associated to properties in an environment can be read by means of different kinds of sensors, which in turn can significantly influence the state exploration time in the recognition task. Assuming the utilization of one kind of sensor per object property and one sensor per agent, we compare the possibility of using exact, noise immune sensors versus the alternative of utilizing unbiased sensors with a certain degree of imprecision that yields normally distributed errors. By representing the interpretation of a domain as a path exploration task, we obtained Equation 4 with a coefficient γ as the basis to compare the cost- effectiveness of both sensing methods, and finally arrived at a mixed approach capa- ble of significantly trimming the search space while remaining cost -effective. This approach is meant to be applied in the engineering phase or in control and monitoring operations in a complex organization in order to reduce financial, ma teri- al, technical and informational costs. This recognition method is assumed to be per- formed by automated agents without inner states, or whose inner states, defined by the agent's own properties, do not affect the states of the object or relation to be recog- nized: as such, agents' internal structure is left out of the model due to its irrelevanc e in relation to the recognition task. Additionally, computational solutions can be de- veloped in order to predict the system’s transitions into conflictive or undesirable states (or objective states) by looking at all the possible ways that their logical formu- las can be bound. These automation methods can be complemented with the addition of probabilistic forecasts based on historical data. References 1. Novikova, G.M ., Azofeifa, E.J.: Semantics of big data in corporate management systems. RUDN Journal of M athematics, Information Sciences and Physics 26(4), 382-391 (2018). 2. Singh, J., Singh, H.: Continuous improvement approach: state-of-art review and future im- plications. International Journal of Lean Six Sigma 3(2), 88-111 (2012). 3. Repin, V.V., Eliferov, V.G.: Process approach to management. Business process modeling. Standards and quality (2008). 4. Krasnogor, N., Smith, J.: Emergence of profitable search strategies based on a simple in- heritance mechanism. In: Proceedings of the 3rd Annual Conference on Genetic and Evo- lutionary Computation. pp. 432-439. GECCO'01, M organ Kaufmann Publishers Inc., San Francisco, CA, USA (2001). 5. Novikova, G.M ., Azofeifa, E.J.: Domain Theory Verification Using M ulti-agent Systems, pp. 120-125. Procedia Computer Science (2016). 6. Kaneiwa, K., M izoguchi, R., Nguyen, P.H.P.: A logical and ontological framework for compositional concepts of objects and properties. New Generation Computing 33(2), 149- 172 (2015). 7. Blackburn, P., Rijke, M .d., Venema, Y.: M odal logic. Cambridge University Press (2001). 8. Chajda, I., Langer, H.: Quotients and Homomorphisms of Relational Systems, Acta Univ. Palacki. Olomuc., Fac. rer. nat., M athematica 49, 37–47 (2010). 9. Novikova, G.M .: A conceptual approach to the design of intelligent control systems. In: Proceedings of the III International Symposium "Intellectual Systems" INTELS 98, pp. 49- 51 (1998). 10. Fomina, M ., Eremeev, A., Vagin, V.: Noise models in Inductive Concept Formation, pp. 413-419. Lecture Notes in Business Information Processing (2013). 11. Addario-Berry, L., Broutin, N., Goldschmidt, C., M iermont, G.: The scaling limit of the minimum spanning tree of the complete graph. The Annals of Probability, 45(5), 3075- 3144 (2017). 12. Cai, K., Zhang, R., Wonham, W.M .: Relative observability and coobservability of timed discrete-event systems. IEEE Transactions on Automatic Control 61(11), 3382-3395 (2016). 13. Ong, S.C.W., Wei Png, S., Hsu, D., Lee, W.S.: Planning under uncertainty for robotic tasks with mixed observability. The International Journal of Robotics Research 29(8), 1053-1068 (2010).