Context-dependent Lexical and Syntactic
        Disambiguation in Ontology Population ?

                       Natalia Garanina and Elena Sidorova

                     A.P. Ershov Institute of Informatics Systems,
                     Lavrent’ev av., 6, Novosibirsk 630090, Russia
                         {garanina,lsidorova}@iis.nsk.su


        Abstract. We suggest an approach to resolution of context-dependent
        lexical and syntactic ambiguity in a framework of ontology population
        from natural language texts. We show that a set of maximally deter-
        mined ontology instances can be represented as a Scott information sys-
        tem with an entailment relation as a collection of information connec-
        tions. Moreover, consistent primary lexical instances form FCA-concepts.
        These representations are used to justify correctness of lexical disambi-
        guation and to define syntactic ambiguity and its resolution. This infor-
        mation system generates a multi-agent system in which agents resolve
        the ambiguity of both types.


1     Introduction
Ontological databases are currently widely used for storing information obtained
from a great number of sources. To complete such ontologies, formalisms and
methods that allow one to automate the process are developed. Features of
automatic information retrieval cause ontology population ambiguities. In lingui-
stics several kinds of ambiguities are considered: lexical, syntactic, semantic, and
pragmatic [2]. In a process of ontology population from natural language texts we
use our algorithms [5] in which the following ambiguity types appear: (1) several
ontology instances or data attributes correspond to the same text fragment, (2)
some value is incorrectly assigned to some attribute of some instance, (3) some
value is incorrectly assigned to attributes of several instances, (4) some value is
incorrectly assigned to several attributes of some instance, (5) several values are
assigned to one-valued attribute of some instance. The first type corresponds
to lexical ambiguity, and other types are syntactic ambiguity. An algorithm
for lexical disambiguation was represented in [6]. In this work we suggest the
modified algorithm for resolving lexical ambiguity, a new algorithm for syntactic
disambiguation, and we justify the correctness of both of them.
    In [6] we demonstrated that the process of retrieval of information in a form
of a set of ontology instances can be presented as a Scott information system [13].
?
    The research has been supported by Russian Foundation for Basic Research (grant
    15-07-04144) and Siberian Branch of Russian Academy of Science (Integration Grant
    n.15/10 “Mathematical and Methodological Aspects of Intellectual Information Sys-
    tems”).
This process produces maximally determined instances for ontology population.
In this paper we prove that consistent sets of instances and lexical objects which
values assign attributes of these instances form FCA concepts [3]. This fact
grantees that information states of ambiguous conflicting agents do not intersect.
This implies correctness of lexical disambiguation.
     Besides, now we use a representation of ontologies which does not consider on-
tology relations as special structures. Only classes are allowed in these ontologies,
and relations are represented as special attributes of classes. Well-known ontolo-
gy representation language OWL uses the notation of this kind. This representa-
tion is a good solution for specification of polyadic relations. Our algorithms for
ontology population are simpler with this representation because class and rela-
tion instances are packed in the same item.
     Automatic techniques of disambiguation usually do not use an input data
context in full. This can lead to incomplete and incorrect ambiguity resolution
[1, 9, 8, 7]. Our approach tries to ease these drawbacks. For disambiguation we
use a distributed approach. Every retrieved instance is related to agent. These
agents detect and resolve ambiguities with help of a special master agent. This
approach takes polynomial time for disambiguation.
     The rest of the paper is organized as follows. In Section 2, an approach
to ontology population in the framework of information systems is discussed.
Section 3 describes lexical and syntactic disambiguation in terms of the system
defined in the previous section. The next Section 4, gives definitions for a multi-
agent system of context-dependent ambiguity resolution. Section 5 informally
describes agents of our systems, their action protocols, and the main conflict
resolution algorithm. In the concluding Section 6, directions of future researches
are discussed.


2   Scott Information Systems in Ontology Population

Let we be given an ontology of a subject domain, the ontology population rules,
semantic and syntactic model for a sublanguage of the subject domain and a
data format, and input data as a finite natural language text with information
for population of the ontology. We consider ontology O of a subject domain
which includes (1) finite nonempty set CO of classes for concepts of the subject
domain, (2) a finite set of attributes with names in DatO ∪ RelO , each of which
has values in some data domain (data attributes in DatO ) or is some instance
of the ontology (relation attributes in RelO , which model relations), and (3)
finite set DO of data types. Every class c ∈ CO is defined by a tuple of typed
attributes: c = (Datc , Relc ), where every data attribute α ∈ Datc ⊆ DatO has
type dα ∈ DO with values in Vdα and every relation attribute ρ ∈ Relc ⊆ RelO
is of class cρ ∈ CO . Let a set of all values of all attribute be VO = ∪dα ∈DO Vdα .
Information content ICO of ontology O is a set of class instances, where every
instance a ∈ ICO is of form (ca , Data , Rela ), where ca is a class of the instance,
every data attribute in Data has name α ∈ Datca with value(s) in Vdα and
every relation attribute in Rela has name ρ ∈ Relca with a value as an instance
of class cρ . Ontology population problem is to compute an information content for
a given ontology from given input data. Input data for the ontology population
process are natural language texts. These data are finite and our algorithms of
ontology-oriented text analysis can generate a finite set of ontology instances [5].
Finiteness of the set is guaranteed by prohibition for the rules from generating
infinite information items by one position. We suggest to consider this process
of forming ontology instances as work with Scott information systems. A Scott
information system T is a triple (T, Con, `), where
    – T is a set of tokens and F in(T ) is a set of finite subsets;
    – Con is a consistency predicate such that Con ⊆ F in(T ), and
         1. Y ∈ Con and X ⊆ Y ⇒ X ∈ Con,
         2. a ∈ T ⇒ {a} ∈ Con;
    – ` is an entailment relation such that `⊆ Con \ {∅} × T and
         3. X ` a ⇒ X ∪ {a} ∈ Con,
         4. X ∈ Con and a ∈ X ⇒ X ` a,
         5. ∀b ∈ Y : X ` b and Y ` c ⇒ X ` c.

     The information retrieval system based on an ontology, finite input data,
and rules of the ontology population and the data processing is defined as a
triple R = (A, Con, `). Set of tokens A consists of a set of all (underdetermined)
ontology p-instances formed by the rules in the determination process of initial
p-instances which are retrieved from an input text by the special preprocess.
Every p-instances a ∈ A has form (ca , Data , Rela , Pa ), where
    – class ca ∈ CO , and
    – every data p-attribute αa ∈ Data is of form (α, IVα ), where
       – name α ∈ Datca , where
       – its information values v̄ ∈ IVα has form (vv̄ , gv̄ , sv̄ ) with
          – data value vv̄ ∈ dα , a set of all values of α is V alαa = {vv̄ | v̄ ∈ IVα },
         – gv̄ is grammar information (morphological and syntactic features), and
          – sv̄ is structural information (position in input data);
    – every relation p-attribute ρa ∈ Rela is of form (ρ, Oρa ), where
       – a name ρ ∈ Relca , and
       – every ō ∈ Oρa has form (o, po ), where
          – o is an instance of class cρa , and
          – po ∈ Po is its position,
       – a set of all relation objects of a is O(Rela ) = {a} ∪ρa ∈Rela {o|ō ∈ Oρa };
    – Pa is structural information (a set of positions in input data).
We consider a special set of tokens: a set of lexical objects LO corresponding
to values of data attributes retrieved from input data. Every lexical object is
a p-instance which has only a single data attribute with a single information
value. P-instances correspond to ontology instances in a natural way. Let a =
(ca , Data , Rela , pa ) be p-instance, then its corresponding ontology instance is
a0 = (ca , Data0 , Rela0 ), where every α ∈ Data0 has value(s) in V alαa and every
ρ ∈ Rela0 has value o with (o, po ) ∈ Oρa . Further we omit prefix “p-” if there is
no ambiguity. An information order relation ≺ is defined on ontology instances.
Let a, a0 ∈ A: a ≺ a0 , if a = a0 everywhere except for at least one attribute, with
the number of values of this attribute in a being strictly less than that in a0 . For
x, x0 ∈ A: if x ≺ x0 , then x0 is information extension of x.
     Rules of ontology population and data processing Rules = {rule1 , . . . , rulen }
map finite sets of instances of ontology classes to an instance which is an informa-
tional extension of some instance of the domain set or a new instance. This
sets must be linguistically and ontologically compatible: specified sets of their
attributes and some instances have to satisfy conditions on values, grammatical
and structural information [10]:
rulei : Domi 7→ A, Domi ⊆ 2A , such that
∀X ∈ Domi : LingConsi (∪x0 ∈X Datx0 ∪ Relx0 ) = true ∧ ∀x0 ∈ X : cx0 ∈ Classi ,
where predicate LingConsi and set of classes Classi ⊆ CO detect linguistic
and ontological compatibility of the instance set, correspondingly. Let for X ∈
Domi , x ∈ A:
rulei (X) = x iff ((∃y ∈ X : y ≺ x∧cx = cy )∨(∀y ∈ X : y ⊀ x∧cx = geni (X)))∧
(Datx = ∅ ∨ Datx = ∪α (α, ∪{(fi (V¯α ), gi (V¯α ), si (V¯α )) |
          ∃Yα ⊆ X : datα ⊆ ∩y∈Yα Daty ∧ V¯α = ∪β∈datα {v̄|v̄ ∈ IVβ }})) ∧
                                 (Relx = ∅ ∨ ∀o ∈ O(Relx ) : o ∈ X ∪y∈X O(Rely )),
where geni (X) generates a new class for a new instance, fi (V̄ ) produces a value
based on values in (V̄ ) for an attribute of instance x, and gi (V̄ ) and si (V̄ ) inherit
grammatical and structural information from set of information values V̄ .
     Consistency predicate Con and entailment relation ` correspond to the rules
of ontology population and data processing. Let x, x0 ∈ A and X ⊆ A. The
entailment relation connects informationally associated tokens:
    – X ` x, iff x ∈ X, or x ∈ /X∧
                       ((∃X 0 ⊆ X, X 00 ⊆ A, rulei ∈ Rules : rulei (X 0 ∪ X 00 ) = x) ∨
              (∃x0 ∈ A, X 00 ⊆ A, rulei ∈ Rules : X ` x0 ∧ rulei ({x0 } ∪ X 00 ) = x)),
i.e. instance x is entailed from X, if it is in this set, or information from tokens
of this set is used for evaluating attributes of x.
     The consistency predicate defines informationally consistent sets of tokens:
    – X ∈ Con, iff for some rulei ∈ Rules holds ∀X 0 ⊆ X(∪x0 ∈X 0 cx ⊆ Classi ) ⇒
(∃x ∈ A, X 00 ⊆ A : rulei (X 0 ∪ X 00 ) = x), i.e. if there exists some rule which can
find in a set of tokens some instances satisfying its class compatibility then these
instances should be consistent with some other set of tokens with respect to the
rule. Class compatible, but linguistically incompatible sets cannot be processed
by rules, hence we do not consider them consistent.
     Let us prove the following theorem for the system R:
Theorem 1. Triple R = (A, Con, `) is a Scott information system.
Proof. Let us show that the consistency predicate Con and the entailment
relation ` satisfy properties 1–5 of information systems.
    1. Y ∈ Con and X ⊆ Y ⇒ X ∈ Con. This fact follows from the definition
of the consistency predicate directly because the condition of definition should
hold for every subset of a consistent set.
    2. a ∈ A ⇒ {a} ∈ Con. By the definition for every rulei ∈ Rules the class
of a is not included in Classi or single {a} can be complemented by some set of
tokens in such a way that the rulei produces a new token.
    3. X ∈ Con and X ` a ⇒ X ∪{a} ∈ Con. X ∪{a} is consistent because a ∈ X
or lexical information of a is inherited from X by definitions of the entailment
relation and process rules.
    4. X ∈ Con and a ∈ X ⇒ X ` a. By the def. of the entailment relation.
    5. ∀b ∈ Y : X ` b and Y ` c ⇒ X ` c. Let Yb = Y \ {b}. X ` c iff
(∃b ∈ Y, Yb ⊆ A, rule ∈ Rules : X ` b ∧ rule({b} ∪ Yb ) = c)) by the def. of the
entailment relation.
    The proposition below directly follows from monotonicity of the entailment
relation and finiteness of input data.
Proposition 1. Information retrieval process of ontology population terminates.
   For token x ∈ A: x↑ = {x} ∪ {x0 | x ≺ x0 } and x↓ = {x} ∪ {x0 | x0 ≺ x}
are upper and down cones of x. Let a set of maximally determined instances
(maximal instances or tokens), which is the result of the analysis of input data,
be A↑ = {x ∈ A | x↑ = {x}}. These instances may populate an ontology.
Obviously,
Proposition 2. Triple I = (A↑ , Con, `) is a Scott information system.
    An information descendants of token a ∈ A↑ are all maximal tokens (all
information) that can be obtained from this token by the entailment relation:
Ds(a) = {x ∈ A↑ |{a} ` x}. An information ancestors of token a ∈ A↑ are all
maximal tokens from which a can be obtained: An(a) = {x ∈ A↑ |{x} ` a}.
In our framework for lexical objects the following equality holds: An(a) = {a}
because ontology instances are based on retrieved lexical objects. Information
descendants are a particular case of Scott information states [14]. Like in the
cited paper, we show that tokens from LO and their information descendants
form a concept lattice.
Proposition 3. Consistent sets of lexical objects form FCA concepts. Every
consistent set of instances is a base for FCA concepts.
Proof. Let every set x of information descendants of LO be an object, and
every l ∈ LO be an attribute. Lexical object l is an attribute of x iff l ∈ x.
The extension of a set of attributes L ⊆ LO is the set L0 = {x|L ⊆ x} and
the intension of L0 is the set {l|∀x ∈ L0 , l ∈ x}. L is a concept iff the condition
on the intension of the extension of L holds: L = {l|∀x ∈ L0 , l ∈ x} iff L is an
information state of information system I iff L is a consistent set. The intension
of a set of infostates X is the set X 0 = {l|∀x ∈ X, l ∈ x} and the extension of X 0
is the set {x|X 0 ⊆ x}. X is a concept iff the condition on the extension of the
intension of X holds: X = {x|X 0 ⊆ x} iff a set of all instances in set of infostates
X forms an infostate too: X i = {a | a ∈ x ∈ X}, hence X i is a consistent set.
Hence every consistent set of instances is a base for FCA concepts.


3   Ambiguity and Resolution
(1) Lexical ambiguity.
Let l, l0 ∈ LO be in a conflict l ! l0 iff s(l) ∩ s(l0 ) 6= ∅. Let set AmbLO
be a set of conflict lexical objects and Lex be a set of their descendants. We
consider that rules in Rules cannot generate instances which include inconsistent
information. I.e. for every rulei ∈ Rules holds ∀X ∈ Domi , a, a0 ∈ X, l, l0 ∈
An(a) ∩ An(a0 ) ∩ LO : ¬(l ! l0 ). Hence for lexical objects l and l0 in conflict:
Ds(l) ∩ Ds(l0 ) = ∅.
    For the lexical disambiguation of two conflicting lexical objects we prefer a
lexical object which is more incorporated in an input text than its competitor.
For l, l0 ∈ LO if |Des(l)| > |Des(l0 )| we take l for evaluating attributes of
ontology instances and ignore l0 .
(2) Syntactic ambiguity.
Detection of syntactic ambiguity frequently requires analysis of homogeneous
groups. Syntactic ambiguity is defined for ontology instances, not for lexical ob-
jects. Our types of syntactic ambiguity can depend on an ontology specification,
hence here we consider syntactic-semantic ambiguity really. We omit “-semantic”
for the brevity. Syntactic ambiguity usually can be expressed by corresponding
single lexical object to several ontology items in various ways. For disambigua-
tion it is necessary to find in an input text an evidence of correctness of the
correspondence. This could be performed using the following inequalities.
    For every instance a and its data attribute α ∈ Data a set of information
values equal to v ∈ dα is EQ(a, α, v) = {v̄ ∈ IVα | vv̄ = v}. For every instance
a and its relation attribute ρ ∈ Rela a set of relation objects with an instance
equal to e ∈ cρa is EQ(a, ρ, e) = {(o, po ) ∈ Oρ | o = e}. A power of these
sets is an evidence power. A triple (a, α, v̄) denotes information value v̄ ∈ IVα
of data attribute α ∈ Data of instance a. A couple (a, ρ) denotes a value of
relation attribute ρ ∈ Rela of instance a. A set of information values which effects
on (a, α, v̄) is V (a, α, v̄) = {(c, γ, w̄) | ∃rulei ∈ Rules, X ⊆ A↑ : rulei (X) =
a ∧ c ∈ X ∧ γ ∈ datα ∩ Datc ∧ w̄ ∈ IVγ ∧ v̄ = (f (V¯α ), g(V¯α ), s(V¯α ))}. A set of
instances which effects on (a, ρ) is I(a, ρ) = {e ∈ A↑ | ∃rule ∈ Rules, X ⊆ A↑ :
rule(X) = a ∧ e ∈ X ∧ Oρ ∩ O(Rele ) 6= ∅}. Now we define a method of syntactic
disambiguation.
(1) Some value is incorrectly assigned to some attribute of an instance (Synt11).
An example: “The old men and women sat on the bench.” The women may
or may not be old. Hence, attribute “age” of instance “women” may not has
value “old”. Let a set of instances with ambiguity of this type be denoted as
Synt11. Let in instance a information value (c, γ, w̄) effect on (a, α, v̄): (c, γ, w̄) ∈
V (a, α, v̄). Then in a case of the ambiguity, (c, γ, w̄) is declared as effecting
on (a, α, v̄) iff |EQ(a, α, vv̄ )| > 1. Let in instance a instance e effect on (a, ρ):
e ∈ I(a, ρ). Then in a case of the ambiguity instance e is declared as effecting
on (a, ρ) iff |EQ(a, ρ, e)| > 1.
(2) Some value is incorrectly assigned to attributes of several instances (Synt12).
An example: “Someone shot the maid of the actress who was on the balcony.”
Either the actress or the maid was on the balcony. Hence, either attribute “place”
of instance “actress” or attribute “place” of instance “maid” may has value
“balcony”. Let a set of instances with ambiguity of this type be denoted as
Synt12. Let in instances a and b information value (c, γ, w̄) effect on (a, α, v̄)
and (b, β, ū): (c, γ, w̄) ∈ V (a, α, v̄) ∩ V (b, β, ū). Then in a case of the ambiguity
(c, γ, w̄) is declared as effecting on (a, α, v̄) and not on (b, β, ū) iff |EQ(a, α, vv̄ )| >
|EQ(b, β, ū)|. Let in instances a and b instance e effect on (a, ρ) and (b, o): e ∈
I(a, ρ)∩I(b, o). Then in a case of the ambiguity instance e is declared as effecting
on (a, ρ) and not on (b, o) iff |EQ(a, ρ, e)| > |EQ(b, o, e)|.
(3) A value is incorrectly assigned to several attributes of an instance (Synt112).
An example: “Cuban jazz band.” A group of Cuban musicians performing jazz
music or a group of musicians performing Cuban jazz. Hence, attribute “country”
or attribute “style” of instance “band” may has value “Cuban”. Let a set of
instances with ambiguity of this type be denoted as Synt112. Let in instance a
information value (c, γ, w̄) effect on (a, α, v̄) and (a, β, ū): (c, γ, w̄) ∈ V (a, α, v̄) ∩
V (a, β, ū). Then in a case of the ambiguity (c, γ, w̄) is declared as effecting on
(a, α, v̄) and not on (a, β, ū) iff |EQ(a, α, vv̄ )| > |EQ(a, β, vū )|. Let in instance
a instance e effect on (a, ρ) and (a, o): e ∈ I(a, ρ) ∩ I(a, o). Then in a case of
the ambiguity instance e is declared as effecting on (a, ρ) and not on (a, o) iff
|EQ(a, ρ, e)| > |EQ(a, o, e)|.
(4) Several values are assigned to one-valued attribute of an instance (Synt211).
An example: “Shakespeare is an author of the piece.” A gender of Shakespeare
may be either male or female. Hence, one-valued attribute “gender” of instance
“person” may has value either “male” or “female”. Let a set of instances with
ambiguity of this type be denoted as Synt211. Let in instance a information val-
ues (b, β, ū) and (c, γ, w̄) effect on (a, α, v̄) and (a, α, v¯0 ), respectively: (b, β, ū) ∈
V (a, α, v̄) and (c, γ, w̄) ∈ V (a, α, v¯0 ). Then in a case of the ambiguity (b, β, ū)
is declared as effecting on (a, α, v̄), and (c, γ, w̄) is declared as not effecting
on α iff |EQ(a, α, vv̄ )| > |EQ(a, α, vv¯0 )|. Let in instance a instance e and e0
effect on (a, ρ): e, e0 ∈ I(a, ρ). Then in a case of the ambiguity instance e is
declared as effecting on (a, ρ), and e0 is declared as not effecting on (a, ρ) iff
|EQ(a, ρ, e)| > |EQ(a, ρ, e0 )|.
    In a case of equalities of evidence powers the conflict is not resolved. We
consider systems in which all these ambiguities are independent, i.e. pairwise
intersections of sets Lex, Synt11, Synt12, Synt112 and Synt211 are empty.
Informal description of action protocols for instance agents presents resolution
of independent lexical and syntactic ambiguities. These protocols work correctly
if resolution of references and detection of syntactic ambiguities are correct.


4    Multi-agent Ambiguity Resolution

Let a set of lexical objects which effect on some information value of data at-
tribute α of instance a be L(a, α) = {l ∈ LO | ∃rulei ∈ Rules, X ⊆ A↑ :
rulei (X) = a ∧ (∃x ∈ X : x ∈ Ds(l) ∧ (∃β ∈ datα ∩ Datx : l ∈ L(x, β)))}.
For every x ∈   / A↑ , the corresponding maximally determined instance is x̃ such
             ↑
that x̃ ∈ A ∧ x ≺ x̃. Entailment relation ` generates information connections
between maximally determined instances. Let X ` x and y ∈ X ∧ y ∈    / x↓ . Then
   – information connections between ỹ and x̃ are
             ω̃
       – ỹ −→ x̃ iff (∃α ∈ Datx , β ∈ Daty : ω ∈ L(x, α) ∩ L(y, β)) ∨ (ω ∈
O(Relx ) ∩ O(Rely )
                              ω̃ u
      – of updating type ỹ −→ x̃ iff ∃x0 ∈ X : x0 ≺ x,
                               ω̃ g
      – of generating type ỹ −→ x̃ iff @x0 ∈ X : x0 ≺ x.
    An information system of information retrieval R generates a multi-agent
system with typed connections. Agents of the system resolve the ambiguities
by computing and comparing the context cardinalities and evidence powers.
Information system (A, Con, `) generates Multi-agent System of Ambiguity Re-
solution (MASAR) as a tuple S = (A, C, I, T ), where
   – A = {ax | x ∈ A↑ } is a finite set of agents corresponded to maximally
determined instances;
                                ω̃
   – C = {ω̃ | ∃x, y ∈ A : x̃ −→ ỹ} is a finite set of connections;
                           A×A
   – mapping I : C −→ 2            is an interpretation function of ordered
                                                            c
     connections between agents: I(c) = (ax , ay ) iff x̃ −→ ỹ;
   – mapping T : C ×A×A −→ {gen, upd} is types of connections: T (c, ax , ay ) =
                                     cg
gen iff I(c) = (ax , ay ) → (x̃ −→ ỹ), and T (c, ax , ay ) = upd iff I(c) = (ax , ay ) →
    cu
(x̃ −→ ỹ). Let (conflict) lexical agents correspond to (conflict) lexical objects.
     Not every instance from A↑ is used for ontology population. There is a set of
utility instances U tl. They do not resolve ambiguities or populate an ontology.
They just transfer information to its descendants. Hence A↑ = LO ∪ Ont ∪ U tl,
where only instances from Ont may populate an ontology.
     For every agent a ∈ A we define the following sets of agents and connec-
tions. We omit symmetric definitions of ancestors Anc∗ (for Des∗) and utility
predecessors U tP ∗ (for U tS∗) for the brevity:
    – Ca = {c ∈ C|∃a0 ∈ A : (a, a0 ) ∈ IC (c) (a0 , a) ∈ IC (c)} is connections of a;
                                                  W
    – Scgac = {a0 ∈ A | (a, a0 ) ∈ IC (c) ∧ T (c, a, a0 ) = gen} is a set of generated
successors by c connection;
    – Scuca = {a0 ∈ A | (a, a0 ) ∈ IC (c) ∧ T (c, a, a0 ) = upd} is a set of updated
successors by c connection;
    – Scca = Scgac ∪ Scuca is a set of all successors by c connection;
    – P rac = {a0 ∈ A | (a0 , a) ∈ IC (c)} is a set of predecessors by c connection;
    – U tSac = {a0 ∈ UStl | (a, a0 ) ∈ IC (c)} is a set of utility successors by c;
    – Desca = Scca ∪ a0 ∈Scca Desca0 is descendants by c connection;
MASAR is a multiagent system of information dependencies. In these systems
agents can use information from predecessors and can pass the (processed) in-
formation to successors. Hence Desca ∩ Ancca = ∅, i.e. every connection has no
cycle because of information transfer.
     A weight of an agent corresponds to the number and the quality (in a case
of generation) of itsP non-utility ancestors and descendants. For every a ∈ A
                                      0
    – wtaP r (c) = 1 + a0 ∈P rac wtaP r (c) is the weight of connection ancestors,
                                                         0
    – wtaSc (c) = 1+ a0 ∈Scgac wt(a0 )+ a0 ∈Scuca wtaSc (c) is the weight of connection
                      P                   P
descendants,
                                                  0
    – wtaU t(P/S) (c) = 1 + a0 ∈U t(P/S)c wtaU t(P/S) (c) is the weight of connection
                              P
                                             a
utility ancestors/descendants,
    – wt(a) = 1 + c∈Ca (wtaP r (c) + wtaSc (c) − (wtaU tP (c) + wtaU tS (c))) is the weight
                    P
of information agents.           P
Weight of system S is wt(S) = a∈Ont wt(a).
    Problem of conflict resolution in MASAR is to get a conflict-free MASAR of
the maximal weight. A multiagent algorithm below produces such system.


5    Conflict Resolution in MASAR

In this paper we consider independent ambiguities only. In this case an order of
their resolution is irrelevant. But it is naturally to resolve lexical ambiguity first,
because this disambiguation effects on existence of ontology instances. Syntactic
disambiguation refines distribution of information among instances.
    Action protocols for conflict resolution used by MASAR agents form a multi-
agent system of conflict resolution MACR. The system MACR includes a set of
MASAR agents and an agent-master. Note, that a fully distributed version of
our algorithm could be developed but it should be very ineffective. The result of
agents’ interactions by protocols described below is the conflict-free MASAR. All
agents execute their protocols in parallel until the master detects termination.
The system is dynamic because MASAR agents can be deleted from the system.
The agents are connected by synchronous duplex channels. The master agent is
connected with all agents, MASAR agents are connected with their successors
and predecessors, and conflict lexical agents are connected too. Messages are
transmitted via a reliable medium and stored in channels until being read.
    For correct lexical disambiguation it is necessary to find groups of lexical
agents which effect on weights of each other in a case of removing. Let us denote
these groups of relatives as Relatives. Agents of groups from Relatives have
common descendants: ∀Rlt ∈ Relatives(∀a ∈ Rlt(∃b ∈ Rlt : Ds(a) ∩ Ds(b) 6=
∅∧∀c ∈ AmbLO \Rlt : Ds(a)∩Ds(c) = ∅)). Due to the mutual effect of relatives
on their weights it is necessary to resolve conflicts between groups of relatives.
Let GR1 ⊆ Rlt1 and GR2 ⊆ Rlt2 , where Rlt1 , Rlt2 ∈ Relatives. Relative groups
GR1 and GR2 are in a conflict GR1 ! GR2 iff (∀a ∈ GR1 ∃b ∈ GR2 : a !
b) ∧ (∀b ∈ GR2 ∃a ∈ GR1 : b ! a). Let sets G1 = ∪ni=1 GR1i = ∪m                 i
                                                                         i=1 Rlt1 and
         n      i      m     i              i    i
G2 = ∪i=1 GR2 = ∪i=1 Rlt2 , where Rlt1 , Rlt2 ∈ Relatives for every i ∈ [1..m]
be groups of friends. These groups of friends are in a conflict iff (∀i ∈ [1..n] :
GR1i ! GR2i ). Note that due to proposition 3 set AmbLO can be disjoined to
nonintersecting subsets of relatives. A   Pconflict is resolved
                                                           P      for a benefit of the
group with the greater weight, i.e. if a∈G1 wt(a) > b∈G2 wt(b), then agents
of group G2 are removed from the system, and their descendants delete their
inherited values of attributes or the descendant is removed itself if the lexical
value from a lexical agent in G2 is generating for this descendant.
    Hence, for resolving all conflicts in the system it is necessary to perform
the following steps: (1) to compute weights of agents, (2) to detect relative
groups, (3) to compute independent conflict groups of friends, (4) to resolve
lexical conflicts between the groups, (5) to make the corresponding change in
the system, and (6) to resolve all kinds of syntactic ambiguity. An agent-master
coordinates MASAR agents. It computes conflict groups and detects agents to be
removed. All other activities are performed by MASAR agents asynchronously.
Due to parallel execution all computations take polynomial time.
(1) An interface protocol for system agents
This protocol specifies agent’s reactions for incoming messages. These messages
include information which actions should be performed by the agent: (1) Start:
to start; (2) CompWeight: to compute its weight; (3) FindRlt: to find relatives; (4)
Remove: to remove connections or itself; (5) ResSynt*: to resolve some syntactic
ambiguity. Until an input message causes an agent to react the agent stays in a
wait mode. Messages for an agent are stored in its input channel.
(2) The main algorithm for conflict resolution
Let us give an informal description of protocol Master. First, the agent-master
computes set of lexical agents LO, then it finds set of conflict lexical agents
AmbLO. After that it sends Start to all agents and launches parallel computing
agents’ weights and finding relatives for conflict lexical agents. After all agents
finish their job, the master computes conflict groups of relatives, then detects
conflict groups of friends. By comparing weights of conflict groups of friends,
it forms a list of agents to be removed. After finishing of this resolution of
group conflicts, the master launches the corresponding system changes. After
termination of the changing, it initiates all kinds of syntactic disambiguation for
instance agents in parallel.
    Below we give informal descriptions of several protocols of the system agents.
(3) Computing agents’ weight
Following the definitions of the weights agent a computes in parallel weights
of (utility) descendants and (utility) ancestors by every connection c ∈ Ca ,
launching the corresponding subprocesses for each c ∈ Ca . These non-utility
subprocesses send the weights of their descendants (ancestors) increased by 1 to
predecessors (successors) respectively. Utility subprocesses do not increase the
weights. If connection c is of type gen then the corresponding descendants’ sub-
processes send the weight of a to the predecessors. When these parallel computa-
tions are finished, the agent computes its own weight. The protocol of weights
computing belongs to the class of wave echo algorithms [12].
(4) Computing agents’ relatives
Let agents from AbmLO be numerated. Computing relatives consists of two
stages. Agents act asynchronously. (1) Pairwise search. Elder agent a using id of
every younger agent b sends couple of ids (a.id, b.id) to its descendants via its
successors. If some descendant of a finds both numbers among its connections
then it returns to a the id of b. After receiving agent a adds b to set of its relatives.
Termination of this computation can be detected by AB-algorithm from [4]. (2)
Merging. Elder agent a sends a request to every younger agent b for a set of its
relatives b.Rlt. If a.Rlt ∩ b.Rlt 6= ∅, then agent a merges both sets and agent b
removes its set of relatives and stops its computation. After termination of the
computation there are several agents with nonempty sets of relative groups.
(5) Removing LO-agents from the system
If agent a has to be removed from the system, then (1) all its predecessors remove
all connections with it and delete a from sets of successors; (2) its descendants
remove a) all connections with it, b) the corresponding predecessors c) the cor-
responding attribute value; and d) if the removing connection is of generating
type then the descendant has to be removed from the system.
    Resolution of syntactic ambiguities Synt11 and Synt12 consists of two steps.
(6) Synt11 resolution.
(1) Ambiguity detection. Every agent, using sets of its successors, checks if some
attribute value effects on values of several instances. If yes and these instances
form a homogeneous group, and satisfy a predefined grammar condition, then
it sends a message with the type of the conflict and the conflict value to every
agent in the group excluding the first agent in the group. (2) Agents in the group
resolve the ambiguities following the resolving formulas for Synt11. For this they
compute an evidence power of the ambiguous value.
(7) Synt12 resolution.
(1) Ambiguity detection. Every agent, using sets of its successors, checks if some
attribute value effects on values of several instances. If yes and these instances
do not form a homogeneous group, and satisfy a predefined grammar condition,
then it sends a message with the type of the conflict, the conflict value, and ids of
the competitors to every agent in the group. (2) Agents in the group send their
evidence power to the competitors. Then they resolve the ambiguities following
the resolving formulas for Synt12.
(8) Synt112 resolution.
If an agent finds attributes ω1 and ω2 with value c then it compares evidence
powers EQ(a, ω1 , c) and EQ(a, ω2 , c). The attribute value is removed from values
of an attribute with the less power.
(9) Synt211 resolution.
If an agent finds attribute ω with values c1 and c2 then it compares evidence
powers EQ(a, ω, c1 ) and EQ(a, ω1 , c2 ). The attribute value with the less power
is removed from values of the attribute.


6   Conclusion

In this paper, we show that maximal instances of the ontology classes that take
part in the process of population form, together with the rules of data processing
and ontology population, a Scott information system. This result justifies reso-
lution of context-dependent lexical ambiguity by calculating context cardinali-
ties. The Scott information system is also a basis for our approach to syntactic
context-dependent ambiguity resolution. This system generates a multi-agent
system in which agents resolve the ambiguities by computing the cardinality of
their contexts and evidence powers. The suggested algorithm of lexical ambi-
guity resolution chooses the most powerful group of agents and removes their
competitors. The choice is based on agents’ weights and their effect on the sys-
tem.
    We considered independent lexical and syntactic ambiguities only. In the near
future we plan to study disambiguation of combination of various types syntactic
and lexical ambiguities. In this work it is useful to introduce a membership
probability of attribute ambiguity values and a degree of their effect on other
instances. We would like to try the developed technique for resolving references
also.


References
 1. Alfawareh H.M., Jusoh S. Resolving Ambiguous Entity through Context Knowl-
    edge and Fuzzy Approach // International Journal on Computer Science and En-
    gineering (IJCSE). ISSN: 0975-3397, Vol. 3, No. 1, 2011. pp. 410–422
 2. Berry, D.M., Kamsties, E., Krieger, M.M. From contract draft-
    ing to software specification: Linguistic sources of ambiguity (2003),
    http://se.uwaterloo.ca/d̃berry/handbook/ambiguityHandbook.pdf (31.01.2016)
 3. Ganter B., Wille R. Formal Concept Analysis. Mathematical Foundations.
    Springer Verlag, 1996.
 4. N. O. Garanina, E. V. Bodin. Distributed Termination Detection by Counting
    Agent // Proc. of the 23nd International Workshop on Concurrency, Specification
    and Programming (CS&P 2014), Chemnitz, Germany, 29. September - 01. Oktober
    2014. Humboldt-Universitat zu Berlin, 2014, pp. 69–79.
 5. Garanina N., Sidorova E., Bodin E. A Multi-agent Approach to Unstruc-
    tured Data Analysis Based on Domain-specific Onthology // Proc. of the 22nd In-
    ternational Workshop on Concurrency, Specification and Programming, Warsaw,
    Poland, Sept. 25-27, 2013. CEUR Workshop Proceedings, Vol. 1032, pp. 122–132
 6. Garanina N., Sidorova E. An Approach to Ambiguity Resolution for Ontol-
    ogy Population // Proc. of the 24th International Workshop on CS&P. Rzeszow,
    Poland, Sep. 28-30, 2015. – University of Rzeszow, 2015, Vol. 1, pp. 134–145.
 7. Gleich B., Creighton O., Kof L. Ambiguity Detection: Towards a Tool Ex-
    plaining Ambiguity Sources // Proc. of 16th International Working Conference
    Requirements Engineering: Foundation for Software Quality, Essen, Germany, June
    30–July 2, 2010, LNCS Vol. 6182, pp. 218-232.
 8. Kim D.S., Barker K., Porter B.W. Improving the Quality of Text Understand-
    ing by Delaying Ambiguity Resolution // Proc. of the 23rd International Conference
    on Computational Linguistics, Beijing, 2010. pp. 581–589
 9. Navigli R. Word sense disambiguation: a survey. ACM Computing Surveys,
    41(2):1–69, 2009.
10. Sidorova E., Kononenko I., Zagorulko Yu. Knowledge-based approach to doc-
    ument analysis // International Journal ”Information Technologies and Knowl-
    edge”. Vol.2, No. 1, 2008. pp. 17–22.
11. Spasic I., Zhao B., Jones C., Button K. KneeTex: an ontology-driven system
    for information extraction from MRI reports.// J. Biomedical Semantics, 6, 2015,
    p. 34.
12. Tel G. Introduction to Distributed Algorithms. Cambridge University Press, 2000.
13. Winskel G. The Formal Semantics of Programming Languages: An Introduction.
    MIT Press, 1993.
14. Zhang G.-Q. Chu Spaces, Concept Lattices, and Domains // Electronic Notes in
    Theoretical Computer Science (ENTCS). Vol. 83, Jan 2013, pp. 287–302