Context-dependent Lexical and Syntactic Disambiguation in Ontology Population ? Natalia Garanina and Elena Sidorova A.P. Ershov Institute of Informatics Systems, Lavrent’ev av., 6, Novosibirsk 630090, Russia {garanina,lsidorova}@iis.nsk.su Abstract. We suggest an approach to resolution of context-dependent lexical and syntactic ambiguity in a framework of ontology population from natural language texts. We show that a set of maximally deter- mined ontology instances can be represented as a Scott information sys- tem with an entailment relation as a collection of information connec- tions. Moreover, consistent primary lexical instances form FCA-concepts. These representations are used to justify correctness of lexical disambi- guation and to define syntactic ambiguity and its resolution. This infor- mation system generates a multi-agent system in which agents resolve the ambiguity of both types. 1 Introduction Ontological databases are currently widely used for storing information obtained from a great number of sources. To complete such ontologies, formalisms and methods that allow one to automate the process are developed. Features of automatic information retrieval cause ontology population ambiguities. In lingui- stics several kinds of ambiguities are considered: lexical, syntactic, semantic, and pragmatic [2]. In a process of ontology population from natural language texts we use our algorithms [5] in which the following ambiguity types appear: (1) several ontology instances or data attributes correspond to the same text fragment, (2) some value is incorrectly assigned to some attribute of some instance, (3) some value is incorrectly assigned to attributes of several instances, (4) some value is incorrectly assigned to several attributes of some instance, (5) several values are assigned to one-valued attribute of some instance. The first type corresponds to lexical ambiguity, and other types are syntactic ambiguity. An algorithm for lexical disambiguation was represented in [6]. In this work we suggest the modified algorithm for resolving lexical ambiguity, a new algorithm for syntactic disambiguation, and we justify the correctness of both of them. In [6] we demonstrated that the process of retrieval of information in a form of a set of ontology instances can be presented as a Scott information system [13]. ? The research has been supported by Russian Foundation for Basic Research (grant 15-07-04144) and Siberian Branch of Russian Academy of Science (Integration Grant n.15/10 “Mathematical and Methodological Aspects of Intellectual Information Sys- tems”). This process produces maximally determined instances for ontology population. In this paper we prove that consistent sets of instances and lexical objects which values assign attributes of these instances form FCA concepts [3]. This fact grantees that information states of ambiguous conflicting agents do not intersect. This implies correctness of lexical disambiguation. Besides, now we use a representation of ontologies which does not consider on- tology relations as special structures. Only classes are allowed in these ontologies, and relations are represented as special attributes of classes. Well-known ontolo- gy representation language OWL uses the notation of this kind. This representa- tion is a good solution for specification of polyadic relations. Our algorithms for ontology population are simpler with this representation because class and rela- tion instances are packed in the same item. Automatic techniques of disambiguation usually do not use an input data context in full. This can lead to incomplete and incorrect ambiguity resolution [1, 9, 8, 7]. Our approach tries to ease these drawbacks. For disambiguation we use a distributed approach. Every retrieved instance is related to agent. These agents detect and resolve ambiguities with help of a special master agent. This approach takes polynomial time for disambiguation. The rest of the paper is organized as follows. In Section 2, an approach to ontology population in the framework of information systems is discussed. Section 3 describes lexical and syntactic disambiguation in terms of the system defined in the previous section. The next Section 4, gives definitions for a multi- agent system of context-dependent ambiguity resolution. Section 5 informally describes agents of our systems, their action protocols, and the main conflict resolution algorithm. In the concluding Section 6, directions of future researches are discussed. 2 Scott Information Systems in Ontology Population Let we be given an ontology of a subject domain, the ontology population rules, semantic and syntactic model for a sublanguage of the subject domain and a data format, and input data as a finite natural language text with information for population of the ontology. We consider ontology O of a subject domain which includes (1) finite nonempty set CO of classes for concepts of the subject domain, (2) a finite set of attributes with names in DatO ∪ RelO , each of which has values in some data domain (data attributes in DatO ) or is some instance of the ontology (relation attributes in RelO , which model relations), and (3) finite set DO of data types. Every class c ∈ CO is defined by a tuple of typed attributes: c = (Datc , Relc ), where every data attribute α ∈ Datc ⊆ DatO has type dα ∈ DO with values in Vdα and every relation attribute ρ ∈ Relc ⊆ RelO is of class cρ ∈ CO . Let a set of all values of all attribute be VO = ∪dα ∈DO Vdα . Information content ICO of ontology O is a set of class instances, where every instance a ∈ ICO is of form (ca , Data , Rela ), where ca is a class of the instance, every data attribute in Data has name α ∈ Datca with value(s) in Vdα and every relation attribute in Rela has name ρ ∈ Relca with a value as an instance of class cρ . Ontology population problem is to compute an information content for a given ontology from given input data. Input data for the ontology population process are natural language texts. These data are finite and our algorithms of ontology-oriented text analysis can generate a finite set of ontology instances [5]. Finiteness of the set is guaranteed by prohibition for the rules from generating infinite information items by one position. We suggest to consider this process of forming ontology instances as work with Scott information systems. A Scott information system T is a triple (T, Con, `), where – T is a set of tokens and F in(T ) is a set of finite subsets; – Con is a consistency predicate such that Con ⊆ F in(T ), and 1. Y ∈ Con and X ⊆ Y ⇒ X ∈ Con, 2. a ∈ T ⇒ {a} ∈ Con; – ` is an entailment relation such that `⊆ Con \ {∅} × T and 3. X ` a ⇒ X ∪ {a} ∈ Con, 4. X ∈ Con and a ∈ X ⇒ X ` a, 5. ∀b ∈ Y : X ` b and Y ` c ⇒ X ` c. The information retrieval system based on an ontology, finite input data, and rules of the ontology population and the data processing is defined as a triple R = (A, Con, `). Set of tokens A consists of a set of all (underdetermined) ontology p-instances formed by the rules in the determination process of initial p-instances which are retrieved from an input text by the special preprocess. Every p-instances a ∈ A has form (ca , Data , Rela , Pa ), where – class ca ∈ CO , and – every data p-attribute αa ∈ Data is of form (α, IVα ), where – name α ∈ Datca , where – its information values v̄ ∈ IVα has form (vv̄ , gv̄ , sv̄ ) with – data value vv̄ ∈ dα , a set of all values of α is V alαa = {vv̄ | v̄ ∈ IVα }, – gv̄ is grammar information (morphological and syntactic features), and – sv̄ is structural information (position in input data); – every relation p-attribute ρa ∈ Rela is of form (ρ, Oρa ), where – a name ρ ∈ Relca , and – every ō ∈ Oρa has form (o, po ), where – o is an instance of class cρa , and – po ∈ Po is its position, – a set of all relation objects of a is O(Rela ) = {a} ∪ρa ∈Rela {o|ō ∈ Oρa }; – Pa is structural information (a set of positions in input data). We consider a special set of tokens: a set of lexical objects LO corresponding to values of data attributes retrieved from input data. Every lexical object is a p-instance which has only a single data attribute with a single information value. P-instances correspond to ontology instances in a natural way. Let a = (ca , Data , Rela , pa ) be p-instance, then its corresponding ontology instance is a0 = (ca , Data0 , Rela0 ), where every α ∈ Data0 has value(s) in V alαa and every ρ ∈ Rela0 has value o with (o, po ) ∈ Oρa . Further we omit prefix “p-” if there is no ambiguity. An information order relation ≺ is defined on ontology instances. Let a, a0 ∈ A: a ≺ a0 , if a = a0 everywhere except for at least one attribute, with the number of values of this attribute in a being strictly less than that in a0 . For x, x0 ∈ A: if x ≺ x0 , then x0 is information extension of x. Rules of ontology population and data processing Rules = {rule1 , . . . , rulen } map finite sets of instances of ontology classes to an instance which is an informa- tional extension of some instance of the domain set or a new instance. This sets must be linguistically and ontologically compatible: specified sets of their attributes and some instances have to satisfy conditions on values, grammatical and structural information [10]: rulei : Domi 7→ A, Domi ⊆ 2A , such that ∀X ∈ Domi : LingConsi (∪x0 ∈X Datx0 ∪ Relx0 ) = true ∧ ∀x0 ∈ X : cx0 ∈ Classi , where predicate LingConsi and set of classes Classi ⊆ CO detect linguistic and ontological compatibility of the instance set, correspondingly. Let for X ∈ Domi , x ∈ A: rulei (X) = x iff ((∃y ∈ X : y ≺ x∧cx = cy )∨(∀y ∈ X : y ⊀ x∧cx = geni (X)))∧ (Datx = ∅ ∨ Datx = ∪α (α, ∪{(fi (V¯α ), gi (V¯α ), si (V¯α )) | ∃Yα ⊆ X : datα ⊆ ∩y∈Yα Daty ∧ V¯α = ∪β∈datα {v̄|v̄ ∈ IVβ }})) ∧ (Relx = ∅ ∨ ∀o ∈ O(Relx ) : o ∈ X ∪y∈X O(Rely )), where geni (X) generates a new class for a new instance, fi (V̄ ) produces a value based on values in (V̄ ) for an attribute of instance x, and gi (V̄ ) and si (V̄ ) inherit grammatical and structural information from set of information values V̄ . Consistency predicate Con and entailment relation ` correspond to the rules of ontology population and data processing. Let x, x0 ∈ A and X ⊆ A. The entailment relation connects informationally associated tokens: – X ` x, iff x ∈ X, or x ∈ /X∧ ((∃X 0 ⊆ X, X 00 ⊆ A, rulei ∈ Rules : rulei (X 0 ∪ X 00 ) = x) ∨ (∃x0 ∈ A, X 00 ⊆ A, rulei ∈ Rules : X ` x0 ∧ rulei ({x0 } ∪ X 00 ) = x)), i.e. instance x is entailed from X, if it is in this set, or information from tokens of this set is used for evaluating attributes of x. The consistency predicate defines informationally consistent sets of tokens: – X ∈ Con, iff for some rulei ∈ Rules holds ∀X 0 ⊆ X(∪x0 ∈X 0 cx ⊆ Classi ) ⇒ (∃x ∈ A, X 00 ⊆ A : rulei (X 0 ∪ X 00 ) = x), i.e. if there exists some rule which can find in a set of tokens some instances satisfying its class compatibility then these instances should be consistent with some other set of tokens with respect to the rule. Class compatible, but linguistically incompatible sets cannot be processed by rules, hence we do not consider them consistent. Let us prove the following theorem for the system R: Theorem 1. Triple R = (A, Con, `) is a Scott information system. Proof. Let us show that the consistency predicate Con and the entailment relation ` satisfy properties 1–5 of information systems. 1. Y ∈ Con and X ⊆ Y ⇒ X ∈ Con. This fact follows from the definition of the consistency predicate directly because the condition of definition should hold for every subset of a consistent set. 2. a ∈ A ⇒ {a} ∈ Con. By the definition for every rulei ∈ Rules the class of a is not included in Classi or single {a} can be complemented by some set of tokens in such a way that the rulei produces a new token. 3. X ∈ Con and X ` a ⇒ X ∪{a} ∈ Con. X ∪{a} is consistent because a ∈ X or lexical information of a is inherited from X by definitions of the entailment relation and process rules. 4. X ∈ Con and a ∈ X ⇒ X ` a. By the def. of the entailment relation. 5. ∀b ∈ Y : X ` b and Y ` c ⇒ X ` c. Let Yb = Y \ {b}. X ` c iff (∃b ∈ Y, Yb ⊆ A, rule ∈ Rules : X ` b ∧ rule({b} ∪ Yb ) = c)) by the def. of the entailment relation. The proposition below directly follows from monotonicity of the entailment relation and finiteness of input data. Proposition 1. Information retrieval process of ontology population terminates. For token x ∈ A: x↑ = {x} ∪ {x0 | x ≺ x0 } and x↓ = {x} ∪ {x0 | x0 ≺ x} are upper and down cones of x. Let a set of maximally determined instances (maximal instances or tokens), which is the result of the analysis of input data, be A↑ = {x ∈ A | x↑ = {x}}. These instances may populate an ontology. Obviously, Proposition 2. Triple I = (A↑ , Con, `) is a Scott information system. An information descendants of token a ∈ A↑ are all maximal tokens (all information) that can be obtained from this token by the entailment relation: Ds(a) = {x ∈ A↑ |{a} ` x}. An information ancestors of token a ∈ A↑ are all maximal tokens from which a can be obtained: An(a) = {x ∈ A↑ |{x} ` a}. In our framework for lexical objects the following equality holds: An(a) = {a} because ontology instances are based on retrieved lexical objects. Information descendants are a particular case of Scott information states [14]. Like in the cited paper, we show that tokens from LO and their information descendants form a concept lattice. Proposition 3. Consistent sets of lexical objects form FCA concepts. Every consistent set of instances is a base for FCA concepts. Proof. Let every set x of information descendants of LO be an object, and every l ∈ LO be an attribute. Lexical object l is an attribute of x iff l ∈ x. The extension of a set of attributes L ⊆ LO is the set L0 = {x|L ⊆ x} and the intension of L0 is the set {l|∀x ∈ L0 , l ∈ x}. L is a concept iff the condition on the intension of the extension of L holds: L = {l|∀x ∈ L0 , l ∈ x} iff L is an information state of information system I iff L is a consistent set. The intension of a set of infostates X is the set X 0 = {l|∀x ∈ X, l ∈ x} and the extension of X 0 is the set {x|X 0 ⊆ x}. X is a concept iff the condition on the extension of the intension of X holds: X = {x|X 0 ⊆ x} iff a set of all instances in set of infostates X forms an infostate too: X i = {a | a ∈ x ∈ X}, hence X i is a consistent set. Hence every consistent set of instances is a base for FCA concepts. 3 Ambiguity and Resolution (1) Lexical ambiguity. Let l, l0 ∈ LO be in a conflict l ! l0 iff s(l) ∩ s(l0 ) 6= ∅. Let set AmbLO be a set of conflict lexical objects and Lex be a set of their descendants. We consider that rules in Rules cannot generate instances which include inconsistent information. I.e. for every rulei ∈ Rules holds ∀X ∈ Domi , a, a0 ∈ X, l, l0 ∈ An(a) ∩ An(a0 ) ∩ LO : ¬(l ! l0 ). Hence for lexical objects l and l0 in conflict: Ds(l) ∩ Ds(l0 ) = ∅. For the lexical disambiguation of two conflicting lexical objects we prefer a lexical object which is more incorporated in an input text than its competitor. For l, l0 ∈ LO if |Des(l)| > |Des(l0 )| we take l for evaluating attributes of ontology instances and ignore l0 . (2) Syntactic ambiguity. Detection of syntactic ambiguity frequently requires analysis of homogeneous groups. Syntactic ambiguity is defined for ontology instances, not for lexical ob- jects. Our types of syntactic ambiguity can depend on an ontology specification, hence here we consider syntactic-semantic ambiguity really. We omit “-semantic” for the brevity. Syntactic ambiguity usually can be expressed by corresponding single lexical object to several ontology items in various ways. For disambigua- tion it is necessary to find in an input text an evidence of correctness of the correspondence. This could be performed using the following inequalities. For every instance a and its data attribute α ∈ Data a set of information values equal to v ∈ dα is EQ(a, α, v) = {v̄ ∈ IVα | vv̄ = v}. For every instance a and its relation attribute ρ ∈ Rela a set of relation objects with an instance equal to e ∈ cρa is EQ(a, ρ, e) = {(o, po ) ∈ Oρ | o = e}. A power of these sets is an evidence power. A triple (a, α, v̄) denotes information value v̄ ∈ IVα of data attribute α ∈ Data of instance a. A couple (a, ρ) denotes a value of relation attribute ρ ∈ Rela of instance a. A set of information values which effects on (a, α, v̄) is V (a, α, v̄) = {(c, γ, w̄) | ∃rulei ∈ Rules, X ⊆ A↑ : rulei (X) = a ∧ c ∈ X ∧ γ ∈ datα ∩ Datc ∧ w̄ ∈ IVγ ∧ v̄ = (f (V¯α ), g(V¯α ), s(V¯α ))}. A set of instances which effects on (a, ρ) is I(a, ρ) = {e ∈ A↑ | ∃rule ∈ Rules, X ⊆ A↑ : rule(X) = a ∧ e ∈ X ∧ Oρ ∩ O(Rele ) 6= ∅}. Now we define a method of syntactic disambiguation. (1) Some value is incorrectly assigned to some attribute of an instance (Synt11). An example: “The old men and women sat on the bench.” The women may or may not be old. Hence, attribute “age” of instance “women” may not has value “old”. Let a set of instances with ambiguity of this type be denoted as Synt11. Let in instance a information value (c, γ, w̄) effect on (a, α, v̄): (c, γ, w̄) ∈ V (a, α, v̄). Then in a case of the ambiguity, (c, γ, w̄) is declared as effecting on (a, α, v̄) iff |EQ(a, α, vv̄ )| > 1. Let in instance a instance e effect on (a, ρ): e ∈ I(a, ρ). Then in a case of the ambiguity instance e is declared as effecting on (a, ρ) iff |EQ(a, ρ, e)| > 1. (2) Some value is incorrectly assigned to attributes of several instances (Synt12). An example: “Someone shot the maid of the actress who was on the balcony.” Either the actress or the maid was on the balcony. Hence, either attribute “place” of instance “actress” or attribute “place” of instance “maid” may has value “balcony”. Let a set of instances with ambiguity of this type be denoted as Synt12. Let in instances a and b information value (c, γ, w̄) effect on (a, α, v̄) and (b, β, ū): (c, γ, w̄) ∈ V (a, α, v̄) ∩ V (b, β, ū). Then in a case of the ambiguity (c, γ, w̄) is declared as effecting on (a, α, v̄) and not on (b, β, ū) iff |EQ(a, α, vv̄ )| > |EQ(b, β, ū)|. Let in instances a and b instance e effect on (a, ρ) and (b, o): e ∈ I(a, ρ)∩I(b, o). Then in a case of the ambiguity instance e is declared as effecting on (a, ρ) and not on (b, o) iff |EQ(a, ρ, e)| > |EQ(b, o, e)|. (3) A value is incorrectly assigned to several attributes of an instance (Synt112). An example: “Cuban jazz band.” A group of Cuban musicians performing jazz music or a group of musicians performing Cuban jazz. Hence, attribute “country” or attribute “style” of instance “band” may has value “Cuban”. Let a set of instances with ambiguity of this type be denoted as Synt112. Let in instance a information value (c, γ, w̄) effect on (a, α, v̄) and (a, β, ū): (c, γ, w̄) ∈ V (a, α, v̄) ∩ V (a, β, ū). Then in a case of the ambiguity (c, γ, w̄) is declared as effecting on (a, α, v̄) and not on (a, β, ū) iff |EQ(a, α, vv̄ )| > |EQ(a, β, vū )|. Let in instance a instance e effect on (a, ρ) and (a, o): e ∈ I(a, ρ) ∩ I(a, o). Then in a case of the ambiguity instance e is declared as effecting on (a, ρ) and not on (a, o) iff |EQ(a, ρ, e)| > |EQ(a, o, e)|. (4) Several values are assigned to one-valued attribute of an instance (Synt211). An example: “Shakespeare is an author of the piece.” A gender of Shakespeare may be either male or female. Hence, one-valued attribute “gender” of instance “person” may has value either “male” or “female”. Let a set of instances with ambiguity of this type be denoted as Synt211. Let in instance a information val- ues (b, β, ū) and (c, γ, w̄) effect on (a, α, v̄) and (a, α, v¯0 ), respectively: (b, β, ū) ∈ V (a, α, v̄) and (c, γ, w̄) ∈ V (a, α, v¯0 ). Then in a case of the ambiguity (b, β, ū) is declared as effecting on (a, α, v̄), and (c, γ, w̄) is declared as not effecting on α iff |EQ(a, α, vv̄ )| > |EQ(a, α, vv¯0 )|. Let in instance a instance e and e0 effect on (a, ρ): e, e0 ∈ I(a, ρ). Then in a case of the ambiguity instance e is declared as effecting on (a, ρ), and e0 is declared as not effecting on (a, ρ) iff |EQ(a, ρ, e)| > |EQ(a, ρ, e0 )|. In a case of equalities of evidence powers the conflict is not resolved. We consider systems in which all these ambiguities are independent, i.e. pairwise intersections of sets Lex, Synt11, Synt12, Synt112 and Synt211 are empty. Informal description of action protocols for instance agents presents resolution of independent lexical and syntactic ambiguities. These protocols work correctly if resolution of references and detection of syntactic ambiguities are correct. 4 Multi-agent Ambiguity Resolution Let a set of lexical objects which effect on some information value of data at- tribute α of instance a be L(a, α) = {l ∈ LO | ∃rulei ∈ Rules, X ⊆ A↑ : rulei (X) = a ∧ (∃x ∈ X : x ∈ Ds(l) ∧ (∃β ∈ datα ∩ Datx : l ∈ L(x, β)))}. For every x ∈ / A↑ , the corresponding maximally determined instance is x̃ such ↑ that x̃ ∈ A ∧ x ≺ x̃. Entailment relation ` generates information connections between maximally determined instances. Let X ` x and y ∈ X ∧ y ∈ / x↓ . Then – information connections between ỹ and x̃ are ω̃ – ỹ −→ x̃ iff (∃α ∈ Datx , β ∈ Daty : ω ∈ L(x, α) ∩ L(y, β)) ∨ (ω ∈ O(Relx ) ∩ O(Rely ) ω̃ u – of updating type ỹ −→ x̃ iff ∃x0 ∈ X : x0 ≺ x, ω̃ g – of generating type ỹ −→ x̃ iff @x0 ∈ X : x0 ≺ x. An information system of information retrieval R generates a multi-agent system with typed connections. Agents of the system resolve the ambiguities by computing and comparing the context cardinalities and evidence powers. Information system (A, Con, `) generates Multi-agent System of Ambiguity Re- solution (MASAR) as a tuple S = (A, C, I, T ), where – A = {ax | x ∈ A↑ } is a finite set of agents corresponded to maximally determined instances; ω̃ – C = {ω̃ | ∃x, y ∈ A : x̃ −→ ỹ} is a finite set of connections; A×A – mapping I : C −→ 2 is an interpretation function of ordered c connections between agents: I(c) = (ax , ay ) iff x̃ −→ ỹ; – mapping T : C ×A×A −→ {gen, upd} is types of connections: T (c, ax , ay ) = cg gen iff I(c) = (ax , ay ) → (x̃ −→ ỹ), and T (c, ax , ay ) = upd iff I(c) = (ax , ay ) → cu (x̃ −→ ỹ). Let (conflict) lexical agents correspond to (conflict) lexical objects. Not every instance from A↑ is used for ontology population. There is a set of utility instances U tl. They do not resolve ambiguities or populate an ontology. They just transfer information to its descendants. Hence A↑ = LO ∪ Ont ∪ U tl, where only instances from Ont may populate an ontology. For every agent a ∈ A we define the following sets of agents and connec- tions. We omit symmetric definitions of ancestors Anc∗ (for Des∗) and utility predecessors U tP ∗ (for U tS∗) for the brevity: – Ca = {c ∈ C|∃a0 ∈ A : (a, a0 ) ∈ IC (c) (a0 , a) ∈ IC (c)} is connections of a; W – Scgac = {a0 ∈ A | (a, a0 ) ∈ IC (c) ∧ T (c, a, a0 ) = gen} is a set of generated successors by c connection; – Scuca = {a0 ∈ A | (a, a0 ) ∈ IC (c) ∧ T (c, a, a0 ) = upd} is a set of updated successors by c connection; – Scca = Scgac ∪ Scuca is a set of all successors by c connection; – P rac = {a0 ∈ A | (a0 , a) ∈ IC (c)} is a set of predecessors by c connection; – U tSac = {a0 ∈ UStl | (a, a0 ) ∈ IC (c)} is a set of utility successors by c; – Desca = Scca ∪ a0 ∈Scca Desca0 is descendants by c connection; MASAR is a multiagent system of information dependencies. In these systems agents can use information from predecessors and can pass the (processed) in- formation to successors. Hence Desca ∩ Ancca = ∅, i.e. every connection has no cycle because of information transfer. A weight of an agent corresponds to the number and the quality (in a case of generation) of itsP non-utility ancestors and descendants. For every a ∈ A 0 – wtaP r (c) = 1 + a0 ∈P rac wtaP r (c) is the weight of connection ancestors, 0 – wtaSc (c) = 1+ a0 ∈Scgac wt(a0 )+ a0 ∈Scuca wtaSc (c) is the weight of connection P P descendants, 0 – wtaU t(P/S) (c) = 1 + a0 ∈U t(P/S)c wtaU t(P/S) (c) is the weight of connection P a utility ancestors/descendants, – wt(a) = 1 + c∈Ca (wtaP r (c) + wtaSc (c) − (wtaU tP (c) + wtaU tS (c))) is the weight P of information agents. P Weight of system S is wt(S) = a∈Ont wt(a). Problem of conflict resolution in MASAR is to get a conflict-free MASAR of the maximal weight. A multiagent algorithm below produces such system. 5 Conflict Resolution in MASAR In this paper we consider independent ambiguities only. In this case an order of their resolution is irrelevant. But it is naturally to resolve lexical ambiguity first, because this disambiguation effects on existence of ontology instances. Syntactic disambiguation refines distribution of information among instances. Action protocols for conflict resolution used by MASAR agents form a multi- agent system of conflict resolution MACR. The system MACR includes a set of MASAR agents and an agent-master. Note, that a fully distributed version of our algorithm could be developed but it should be very ineffective. The result of agents’ interactions by protocols described below is the conflict-free MASAR. All agents execute their protocols in parallel until the master detects termination. The system is dynamic because MASAR agents can be deleted from the system. The agents are connected by synchronous duplex channels. The master agent is connected with all agents, MASAR agents are connected with their successors and predecessors, and conflict lexical agents are connected too. Messages are transmitted via a reliable medium and stored in channels until being read. For correct lexical disambiguation it is necessary to find groups of lexical agents which effect on weights of each other in a case of removing. Let us denote these groups of relatives as Relatives. Agents of groups from Relatives have common descendants: ∀Rlt ∈ Relatives(∀a ∈ Rlt(∃b ∈ Rlt : Ds(a) ∩ Ds(b) 6= ∅∧∀c ∈ AmbLO \Rlt : Ds(a)∩Ds(c) = ∅)). Due to the mutual effect of relatives on their weights it is necessary to resolve conflicts between groups of relatives. Let GR1 ⊆ Rlt1 and GR2 ⊆ Rlt2 , where Rlt1 , Rlt2 ∈ Relatives. Relative groups GR1 and GR2 are in a conflict GR1 ! GR2 iff (∀a ∈ GR1 ∃b ∈ GR2 : a ! b) ∧ (∀b ∈ GR2 ∃a ∈ GR1 : b ! a). Let sets G1 = ∪ni=1 GR1i = ∪m i i=1 Rlt1 and n i m i i i G2 = ∪i=1 GR2 = ∪i=1 Rlt2 , where Rlt1 , Rlt2 ∈ Relatives for every i ∈ [1..m] be groups of friends. These groups of friends are in a conflict iff (∀i ∈ [1..n] : GR1i ! GR2i ). Note that due to proposition 3 set AmbLO can be disjoined to nonintersecting subsets of relatives. A Pconflict is resolved P for a benefit of the group with the greater weight, i.e. if a∈G1 wt(a) > b∈G2 wt(b), then agents of group G2 are removed from the system, and their descendants delete their inherited values of attributes or the descendant is removed itself if the lexical value from a lexical agent in G2 is generating for this descendant. Hence, for resolving all conflicts in the system it is necessary to perform the following steps: (1) to compute weights of agents, (2) to detect relative groups, (3) to compute independent conflict groups of friends, (4) to resolve lexical conflicts between the groups, (5) to make the corresponding change in the system, and (6) to resolve all kinds of syntactic ambiguity. An agent-master coordinates MASAR agents. It computes conflict groups and detects agents to be removed. All other activities are performed by MASAR agents asynchronously. Due to parallel execution all computations take polynomial time. (1) An interface protocol for system agents This protocol specifies agent’s reactions for incoming messages. These messages include information which actions should be performed by the agent: (1) Start: to start; (2) CompWeight: to compute its weight; (3) FindRlt: to find relatives; (4) Remove: to remove connections or itself; (5) ResSynt*: to resolve some syntactic ambiguity. Until an input message causes an agent to react the agent stays in a wait mode. Messages for an agent are stored in its input channel. (2) The main algorithm for conflict resolution Let us give an informal description of protocol Master. First, the agent-master computes set of lexical agents LO, then it finds set of conflict lexical agents AmbLO. After that it sends Start to all agents and launches parallel computing agents’ weights and finding relatives for conflict lexical agents. After all agents finish their job, the master computes conflict groups of relatives, then detects conflict groups of friends. By comparing weights of conflict groups of friends, it forms a list of agents to be removed. After finishing of this resolution of group conflicts, the master launches the corresponding system changes. After termination of the changing, it initiates all kinds of syntactic disambiguation for instance agents in parallel. Below we give informal descriptions of several protocols of the system agents. (3) Computing agents’ weight Following the definitions of the weights agent a computes in parallel weights of (utility) descendants and (utility) ancestors by every connection c ∈ Ca , launching the corresponding subprocesses for each c ∈ Ca . These non-utility subprocesses send the weights of their descendants (ancestors) increased by 1 to predecessors (successors) respectively. Utility subprocesses do not increase the weights. If connection c is of type gen then the corresponding descendants’ sub- processes send the weight of a to the predecessors. When these parallel computa- tions are finished, the agent computes its own weight. The protocol of weights computing belongs to the class of wave echo algorithms [12]. (4) Computing agents’ relatives Let agents from AbmLO be numerated. Computing relatives consists of two stages. Agents act asynchronously. (1) Pairwise search. Elder agent a using id of every younger agent b sends couple of ids (a.id, b.id) to its descendants via its successors. If some descendant of a finds both numbers among its connections then it returns to a the id of b. After receiving agent a adds b to set of its relatives. Termination of this computation can be detected by AB-algorithm from [4]. (2) Merging. Elder agent a sends a request to every younger agent b for a set of its relatives b.Rlt. If a.Rlt ∩ b.Rlt 6= ∅, then agent a merges both sets and agent b removes its set of relatives and stops its computation. After termination of the computation there are several agents with nonempty sets of relative groups. (5) Removing LO-agents from the system If agent a has to be removed from the system, then (1) all its predecessors remove all connections with it and delete a from sets of successors; (2) its descendants remove a) all connections with it, b) the corresponding predecessors c) the cor- responding attribute value; and d) if the removing connection is of generating type then the descendant has to be removed from the system. Resolution of syntactic ambiguities Synt11 and Synt12 consists of two steps. (6) Synt11 resolution. (1) Ambiguity detection. Every agent, using sets of its successors, checks if some attribute value effects on values of several instances. If yes and these instances form a homogeneous group, and satisfy a predefined grammar condition, then it sends a message with the type of the conflict and the conflict value to every agent in the group excluding the first agent in the group. (2) Agents in the group resolve the ambiguities following the resolving formulas for Synt11. For this they compute an evidence power of the ambiguous value. (7) Synt12 resolution. (1) Ambiguity detection. Every agent, using sets of its successors, checks if some attribute value effects on values of several instances. If yes and these instances do not form a homogeneous group, and satisfy a predefined grammar condition, then it sends a message with the type of the conflict, the conflict value, and ids of the competitors to every agent in the group. (2) Agents in the group send their evidence power to the competitors. Then they resolve the ambiguities following the resolving formulas for Synt12. (8) Synt112 resolution. If an agent finds attributes ω1 and ω2 with value c then it compares evidence powers EQ(a, ω1 , c) and EQ(a, ω2 , c). The attribute value is removed from values of an attribute with the less power. (9) Synt211 resolution. If an agent finds attribute ω with values c1 and c2 then it compares evidence powers EQ(a, ω, c1 ) and EQ(a, ω1 , c2 ). The attribute value with the less power is removed from values of the attribute. 6 Conclusion In this paper, we show that maximal instances of the ontology classes that take part in the process of population form, together with the rules of data processing and ontology population, a Scott information system. This result justifies reso- lution of context-dependent lexical ambiguity by calculating context cardinali- ties. The Scott information system is also a basis for our approach to syntactic context-dependent ambiguity resolution. This system generates a multi-agent system in which agents resolve the ambiguities by computing the cardinality of their contexts and evidence powers. The suggested algorithm of lexical ambi- guity resolution chooses the most powerful group of agents and removes their competitors. The choice is based on agents’ weights and their effect on the sys- tem. We considered independent lexical and syntactic ambiguities only. In the near future we plan to study disambiguation of combination of various types syntactic and lexical ambiguities. In this work it is useful to introduce a membership probability of attribute ambiguity values and a degree of their effect on other instances. We would like to try the developed technique for resolving references also. References 1. Alfawareh H.M., Jusoh S. Resolving Ambiguous Entity through Context Knowl- edge and Fuzzy Approach // International Journal on Computer Science and En- gineering (IJCSE). ISSN: 0975-3397, Vol. 3, No. 1, 2011. pp. 410–422 2. Berry, D.M., Kamsties, E., Krieger, M.M. From contract draft- ing to software specification: Linguistic sources of ambiguity (2003), http://se.uwaterloo.ca/d̃berry/handbook/ambiguityHandbook.pdf (31.01.2016) 3. Ganter B., Wille R. Formal Concept Analysis. Mathematical Foundations. Springer Verlag, 1996. 4. N. O. Garanina, E. V. Bodin. Distributed Termination Detection by Counting Agent // Proc. of the 23nd International Workshop on Concurrency, Specification and Programming (CS&P 2014), Chemnitz, Germany, 29. September - 01. Oktober 2014. Humboldt-Universitat zu Berlin, 2014, pp. 69–79. 5. Garanina N., Sidorova E., Bodin E. A Multi-agent Approach to Unstruc- tured Data Analysis Based on Domain-specific Onthology // Proc. of the 22nd In- ternational Workshop on Concurrency, Specification and Programming, Warsaw, Poland, Sept. 25-27, 2013. CEUR Workshop Proceedings, Vol. 1032, pp. 122–132 6. Garanina N., Sidorova E. An Approach to Ambiguity Resolution for Ontol- ogy Population // Proc. of the 24th International Workshop on CS&P. Rzeszow, Poland, Sep. 28-30, 2015. – University of Rzeszow, 2015, Vol. 1, pp. 134–145. 7. Gleich B., Creighton O., Kof L. Ambiguity Detection: Towards a Tool Ex- plaining Ambiguity Sources // Proc. of 16th International Working Conference Requirements Engineering: Foundation for Software Quality, Essen, Germany, June 30–July 2, 2010, LNCS Vol. 6182, pp. 218-232. 8. Kim D.S., Barker K., Porter B.W. Improving the Quality of Text Understand- ing by Delaying Ambiguity Resolution // Proc. of the 23rd International Conference on Computational Linguistics, Beijing, 2010. pp. 581–589 9. Navigli R. Word sense disambiguation: a survey. ACM Computing Surveys, 41(2):1–69, 2009. 10. Sidorova E., Kononenko I., Zagorulko Yu. Knowledge-based approach to doc- ument analysis // International Journal ”Information Technologies and Knowl- edge”. Vol.2, No. 1, 2008. pp. 17–22. 11. Spasic I., Zhao B., Jones C., Button K. KneeTex: an ontology-driven system for information extraction from MRI reports.// J. Biomedical Semantics, 6, 2015, p. 34. 12. Tel G. Introduction to Distributed Algorithms. Cambridge University Press, 2000. 13. Winskel G. The Formal Semantics of Programming Languages: An Introduction. MIT Press, 1993. 14. Zhang G.-Q. Chu Spaces, Concept Lattices, and Domains // Electronic Notes in Theoretical Computer Science (ENTCS). Vol. 83, Jan 2013, pp. 287–302