Vocabulary Alignment for Agents with Flexible Protocols Paula CHOCRON, Marco SCHORLEMMER a IIIA-CSIC, Bellaterra, Catalonia, Spain Abstract. In recent work we presented an interaction-based alternative to the problem of vocabulary alignment in multi-agent systems. Previous approaches required exter- nal knowledge or a previously shared meta-language to fid a translation between the vocabularies of heterogeneous agents. Instead, we assume that agents share the knowledge of how to perform the tasks for which they need to collaborate, and we show how they can learn alignments from repeated interaction. The result is an online, on-demand alignment method, that uses only information agents need to interact. Until now, we have always required agents to share the complete procedu- ral knowledge of the task. In this paper we present an extension that allows us to consider, in a meaningful way, differences between the agents’ specifications. To this aim, we propose a new kind of protocols with constraints that have weights, to represent a punishment that is received when they are violated. We adapt the learn- ing techniques from previous work to these new protocols, and present preliminary experimentation for two possible uses. 1. Introduction The problem of aligning vocabularies has been identified as particularly relevant in the context of dynamic and open environments such as multi-agent systems [5]. Partici- pants in this kind of systems have different backgrounds and may therefore not share the same vocabulary, making alignments necessary to interact meaningfully with each other. This problem has been considered in the past decades from two perspectives. Some ap- proaches [9,4] assume the existence of external elements, such as physical objects, that all agents perceive in common, and explore how those can be used to explain the mean- ing of words. A second group of techniques [7,8] provide explicit ways of learning or agreeing on an alignment, such as argumentation techniques, explanations, or definitions. These techniques always require agents to share a meta-language. The question of how to communicate with heterogeneous interlocutors when neither a physical context nor a meta-language are available remains practically unexplored. In recent work [1,2,3] we proposed a novel approach to the problem of multi-agent vocabulary heterogeneity. In our techniques, that we call interaction-based, the align- ment is performed considering only the context given by the interactions in which agents engage. In this work we assume the interacting agents share the knowledge of how to perform a task, or, more concretely, the specification of an interaction. As an example, consider an ordering drinks interaction between an English speaking customer and an Italian waiter. Since the agents share the dynamics of the conversation they will, for ex- ample, both know that the waiter will ask for the colour if wine is ordered. However, the words used by each agent can be different (vino and colore instead of wine and colour). In previous work, we developed techniques that agents can use to learn progressively, from the experience of interacting, which mappings lead to successful interactions. After several dialogues, agents learn an alignment that they can use to order and deliver drinks with that particular interlocutor. An important restriction that we (until now) imposed on agents is that they have to share the entire structure of the interactions they perform. This means that they need to agree on exactly how each interaction should be performed, modulo an alignment. This is a reasonable decision considering that our goal was to study how agents can learn alignments from shared interaction specifications. However, it is in practice a strong restriction, and it always raises the question of what agents can learn if the protocol is not completely shared. Our answer was that if the differences are small, they will be ignored by the learning methods, which work statistically. If, instead, there are significant differences, agents have nothing to learn, since there is no alignment that is useful to perform the tasks together. However, we have never performed a systematic analysis of how these differences affect the learning process. In this paper we present an extension to the problem we studied in previous work, that considers the case when agents do not share the complete alignment. To this aim, we present a new kind of interaction specifications, based on the LTL constraint protocols that we used in [3]. These new protocols include a weight for each interaction rule, introducing a meaningful way to include differences between specifications. We provide an adaptation of our learning techniques to these protocols, and a way of measuring the quality of an alignment. These ideas can be seen from two points of views. One is to consider agents that use unrelated protocols, and to analyse how the techniques can find the alignment that minimize the punishment. Alternatively, we can suppose agents use protocols that have a shared portion and then diverge in some local rules, and use the method to find the alignment for the common part. In section 4 we present preliminary experiments for these two applications. 2. Vocabulary Alignment in (Rigidly) Openly Specified Interactions In this section we summarize the approach that we proposed in [3] to let agents infer alignments from the experience of interacting. We consider two agents a1 and a2 that have different vocabularies V1 and V2 respectively. Agents interact repeatedly using, for each interaction, the specification defined in a protocol. In this work we used open spec- ifications consisting of rules that constrain what can be said without specifying com- pletely the flow of the interaction. We chose to use a reduced version of the ConDec protocols [6], which define the constraints in linear temporal logic (LTL). To define the protocols formally, let a vocabulary V be a set of words and consider a pair of agent IDs A = {a1 , a2 }. The set of messages MV,A is the set of all tuples hai , vi where v ∈ V and i ∈ {1, 2}. An interaction protocol over V and A is a tuple hMV,A ,Ci, where C is a set of constraints on how messages can be uttered. Constraints are LTL sen- tences renamed conveniently, that are shown in Table 21 . In this table, n ∈ N, a, b ∈ MV,A , and ♦, , d are the LTL operators that mean eventually, globally and next respectively. These constraints are generally divided into two classes. Existential constraints (absence) predicate over how many times a message can be uttered, while relational constraints describe binary relations between two utterances. In a protocol hMV,A ,Ci, the constraints C needs to be a subset of the possible constraints over MV,A . Table 1. LTL definitions of constraints Constraint LTL meaning absence(a) ¬♦a !correlation(a, b) ♦a → ¬♦b !response(a, b) (a → ¬♦b) !be f ore(a, b) (♦b → ¬a) premise(a, b) ( cb → a) !premise(a, b) ( cb → ¬a) imm a f ter(a, b) (a → cb) !imm a f ter(a, b) (a → c¬b) During an interaction, a1 and a2 have protocols P1 = hMA,V1 ,C1 i and P2 = hMA,V2 ,C2 i respectively, and each agents send messages to its interlocutor respecting the constraints in its protocol. As an example, consider again the ordering drinks scenario where the waiter w speaks Italian and the customer c speaks English. In this situation, agents could be using the following the following two protocols Pw and Pc , where w is the waiter and c is the customer. Pw = {premise(hw, da berei, hc, birrai), premise(hw, da berei, hc, vinoi), !response(hc, birrai, hw, colorei), imm a f ter(hc, vinoi, hw, colorei)} Pc = {premise(hw,to drinki, hc, beeri), premise(hw,to drinki, hc, winei), !response(hc, beeri, hw, colori), imm a f ter(hc, winei, hw, colori)} A sequence of messages is called an interaction. We say an interaction is possible for a protocol if it does not violate any of its constraints, or, more formally, if it is an LTL-model of each of its constraints. As we mentioned, until now we required agents to share the complete structure of the protocols they use for an interaction. To formalize this restriction, we defined the notion of compatibility between protocols. Intuitively, two protocols are compatible if the interactions that are possible for each protocol are equivalent under an alignment. Given two vocabularies V1 and V2 , an alignment from V2 to V1 them is a partial function 1 The number of constraints in the original paper was longer, but we only include here the ones that we actually used in the learning method (the non-monotonic ones). α : V2 → V1 . An interaction in vocabulary V2 can be translated to an interaction in V1 using α if all its messages are in the domain of the alignment, by translating each word. Definition 1 Consider two interaction protocols P1 = hMA,V1 ,C1 i and P2 = hMA,V2 ,C2 i, and let Int(Pi ) be the set of possible interactions in Vi . P1 and P2 are compatible if there exists a partial function α : V2 → V1 such that Int(P1 ) = α(Int(P2 )) If we know the alignment α for which the condition holds, we can say they are compatible under α. Since agents send messages composed of their own words, when they receive a for- eign message they need to interpret it in their local vocabulary to continue the interaction. The objective of the learning techniques we propose in [3] is to infer the alignment α under which the protocols are compatible, something that would allow agents to interpret correctly the messages they receive. The approach we present for learning alignments from interactions is simple. From now on we will use the point of view of a1 , but everything is analogous for its inter- locutor. The agent maintains a confidence distribution ω : V20 × V1 → [0, 1] that assigns a value to each mapping between a known foreign word (V20 ⊆ V2 ) and a word in their vocabulary. These values are updated according to what agents observe in interactions. We present here only the simplest one of the methods we proposed in [3]. Briefly, when an agent receives a word, it punishes all interpretations that are not possible because they violate some constraint. Suppose a1 receives v2 after observing interaction i, and let r be a punishment parameter and i . m the operation of appending a message m to i. The update is described in Eq. (1). ( ω(v2 , v1 ) · (1 − r) if i . ha2 , v1 i is not possible for P1 ω(v2 , v1 ) := (1) ω(v2 , v1 ) otherwise For example, if the customer receives colore right after saying wine, it infers that it cannot mean to drink. By interacting repeatedly with different protocols, agents gradually learn an alignment between their vocabularies. 3. Flexible Protocols We now propose an approach that considers more carefully the question of to what extent agents can align their vocabularies when their protocols are different. To this aim, we introduce flexible protocols, in which each constraint has a weight that represents a pun- ishment received when that constraint is violated. This punishment can be interpreted, for example, as a way of expressing preferences (heavier constraints are those that agents prefer not to violate), or degrees of confidence on a constraint, when there is uncertainty about the interaction context. Definition 2 A flexible protocol P f over a vocabulary V and a set of agents A is a tuple hMV,A ,C f i. The set C f is composed of pairs hc, ρi, where c is one of the constraints in Table 2 over MV,A , and ρ ∈ [0, 1]. As an example, consider again the ordering drinks scenario. Assume they have the same constraints as before with high weight, but now the waiter also believes that the customer should not order two different alcoholic beverages in one interaction. This constraint, however, is less strict than the others, since the waiter is willing to accept that behaviour some times. The protocols would look as follows. Pw = {hpremise(hw, da berei, hc, birrai), 1i, hpremise(hw, da berei, hc, vinoi), 1i, h!response(hc, birrai, hw, colorei), 1i, himm a f ter(hc, vinoi, hw, colorei), 1i, h!correlation(hc, vinoi, hc, birrai), 0.5i} Pc = {hpremise(hw,to drinki, hc, beeri), 1i, hpremise(hw,to drinki, hc, winei), 1i, h!response(hc, beeri, hw, colori), 1i, himm a f ter(hc, winei, hw, colori), 1i} 3.1. Updating Technique The core contribution of our previous work consists in the learning techniques that we proposed to update the agent’s mapping confidence distribution. In rigid protocols the objective of the learning is to find a correct alignment α under which the protocols were compatible. In [3] we show how, to that aim, it is enough to consider whether a word is a possible interpretation for a received message. The notion of compatibility cannot be applied to a flexible protocol, since the question of whether an interaction is possible is not meaningful anymore. When interacting with a flexible protocol, any sequence of messages can be accepted, but at some cost. This implies that there is no reference align- ment α that agents can use to understand each other completely. Instead, each alignment will have an expected punishment that will make it better or worse. The goal now is to find an alignment that optimizes the punishment that is received while interacting. To do so, agents need to take into account the weight of the rules that would be violated for each interpretation. We propose only one simple approach, that combines the punishment that would be received in a particular moment as well as the information that the agent already had. As in the rigid cae, agents that use flexible protocols have a confidence distribution ω that assigns a value to each possible mapping. Consider again agents a1 , a2 with flexible protocols Pfj = hMVi ,A ,C jf i for j ∈ {1, 2}. Suppose a1 receives a word v2 from a2 after interaction i. For a word v ∈ V1 , let B(v, Pf1 ) be the set of all constraints c included in a tuple in C1f such that the interaction obtained by appending ha1 , vi to i violates c, but i does not. Let ρ(Pf , c) be the weight of constraint c in Pf . Then, for each v ∈ V1 , and for each br ∈ B(v, Pf1 ), agent a1 iteratively updates: ω(v2 , v) := ω(v2 , v) ∗ (1 − ρ(Pf 1 , br)) After all updates are done, the values are normalized to make ∑v∈V1 (ω(v2 , v)) = 1. 3.2. A Distance Measure An important question to address before adapting our techniques to the flexible case is the one of when an alignment is good for these new protocols. In the rigid case we measured the quality of an alignment by computing its distance to the reference alignment α, using the standard precision and recall measures. In flexible protocols, where the objective is to minimize the received punishment, a quality measure should reflect how much agents would pay when using them. We propose a method that evaluates a pair of alignments for a given pair of protocols. Alignments are evaluated in pairs, and not by themselves, because the punishment that an agent receives depends on how its interlocutor interprets the words it utters. The objective is, therefore, to measure how well they would interact together if using the obtained alignments. Our objective will be to measure the quality of the alignments that agents learn during with our techniques. Since agents have only confidence distributions, we first need to explain how to obtain an alignemnt from them. For simplicity, in this explanation we will assume that these alignments are, for agent a1 , a partial function α1 : V2 → V1 . This alignment maps each foreign word known by the agent with the local word for which the mapping value ω is maximal. In practice, agents frequently do not have enough information to distinguish between possible mappings, so there are many of them with maximal value. Therefore the alignment is actually a function α1 : V2 → P V1 , where P S is the set of all subsets of a set S. To adapt the measure we present now to this kind of alignments it is only necessary to consider α1 and its analogue for a2 , α2 , as a set of possible alignments, and compute the measure for each combination. The straightforward way to compute the expected punishment for a pair of protocols and a pair of alignments consists in obtaining all interactions, filtering the ones that could actually occur in an interaction, and computing the average of the received punishment for them. Even if the interactions have bounded length, this method is computationally very expensive, since the amount of interaction grows exponentially with the length. For- tunately, for our particular set of constraints, it is enough to consider only the punishment received by interactions of length 2 to have a measure of the alignment’s quality. This is because the constraints are unary or binary, so any violation that could happen would also occur in an interaction of length 2. In future work we plan to investigate in more depth the relation between the measure computed in this way and the actual expected punishment for longer interactions. We consider only interactions that could be uttered, and discard all sequences of messages that would never occur in a real exchange. These impossible interactions are the ones in which an agent sends an utterance that causes an unnecessary punishment. The notion of utterable interactions is defined over abstract interactions. These are se- quences that are actually said before any interpretation, where the messages uttered by a1 have words in V1 and the ones by a2 have words in V2 . Definition 3 A two-message abstract interaction i = [ha j , wi, ha0j , w0 i], for j ∈ {1, 2} is utterable for alignments α1 , α2 and protocols P1 and P2 if: • [ha j , wi] does not violate any constraint in P j and • α 0j (i) does not violate more constraints for P0j than α 0j ([ha j , wi]) does. We will call UP1 ,P2 ,α1 ,α2 to the set of all utterable sentences for those protocols and alignments. Next we need to define the punishment for an interaction with a given alignment and protocol. Definition 4 The punishment for an interaction i and a protocol P (noted ρ(P, i)) is the added weight of all constraints that are violated by i in P. The quality of alignments α1 , α2 for P1 , P2 is: ∑i∈UP1 ,P2 ,α1 ,α2 ρ(P1 , α1 (i)) + ρ(P2 , α2 (i)) |UP1 ,P2 ,α1 ,α2 | This quality measure is bilateral and measures the amount paid by two agents. It can also be made unilateral by only considering the punishment for one protocol, but it will still depend on both alignments. Note also that we defined the measure for a case in which all words have the same likelihood of being uttered, but it can be easily adapted for other cases. If we know the probability of uttering a given word, to take it into account in the measure it is enough to multiply the punishment for an interaction i ∈ UP1 ,P2 ,α1 ,α2 by the likelihood of i. The extension is not so simple when agents have more complicated distributions to choose an utterance, such as one in which the likelihood of sending a message depends on the previous interaction. 4. Experimental Evaluation Flexible protocols, along with the updating technique that we proposed, can be analysed from two points of view. First, we can consider agents that have structurally different protocols for the same task, without requiring any a priori relation between them. In this case, the technique can be used as a tool to find an alignment that minimizes the punishment received by agents for that specific protocol. This can also be used for small sets of protocols. A second point of view considers flexible protocols as local variations of a common compatible part. In this case we study how the technique lets agents find the correct alignment (the one that is useful for the common par) in spite of the differences. In the following sections we explain the experiments that we performed to study these two cases. 4.1. Learning an Alignment that Optimizes Punishment In this set of experiments we analysed how our method can be used by agents that have different protocols, without any similarity assumption. The objective of the interacting agents is to find the alignment that would make them pay as little as possible. For each experiment we created a pair of flexible protocols, gave one to each agent, and let them interact for some time, observing the alignment that they obtained in the end. To decide the quality of the alignments, we used two different measures: 1. The real punishment received by agents when they used the alignment 2. The quality of the alignments with respect to the protocols measured as explained in Section 3.2 Since we proposed the alignment quality measure as an expected value of the pun- ishment received by agents, our hypothesis is that the curves obtained with it are similar to the ones that show the average received punishment. However, the distance measure only considers the expected punishment for interactions of length 2. Since agents use longer interactions, the value of the real punishment will be larger. An experiment consists of two agents, each of them with a vocabulary and a protocol over it. In this preliminary experimentation the size of the vocabularies is of 4 words. We performed experiments with different protocol sizes and total weight (obtained by adding up the weight for each constraint). Concretely, we used protocols with 6 and 8 rules, and total weight 3.0 and 5.0. In each experiment, both protocols have the same number of constraints and the same added weight. In an experiment, we let agents go through a learning phase in which they interacted n times, performing the experiment for n = [0, 2, 5, 10, 15, 20, 30]. During these alignments they updated their confidence distributions on mappings. All interactions were of length 6. After this training phase, we ran a test phase of 20 interactions, during which agents did not learn new information. Instead, they interpreted foreign words using the alignments obtained in the previous phase. For the second measure, we added the total punishment that agents received during the test phase. We repeated the experimentation 10 times, for 10 different protocols. Figure 1. Results for a vocabulary of 4 words Figure 1 shows the results obtained for each experiment. We can see that with more training, agents find alignments that are better according to the measure, and that let them interact receiving lower punishment. The distance measure seems to change in the same way as the actual received punishment, which indicates it is a good estimate. We would need a larger set of experiments to answer some interesting questions that are not reflected in this data, such as the relative importance of the amount of constraints and the total weight. 4.2. Learning a Correct Alignment in Protocols with Variations The objective of the second type of experiments is to analuze the learning techniques when applied by agents with protocols that diverge in some degree from a shared com- patible part. To build these kind of protocols, we used four parameters: - rs : the number of shared constraints - rd : the number of different constraints - ws : the total weight for shared constraints - wd : the total weight for different constraints We started from two vocabularies V1 and V2 and an alignment α between them, and created two protocols with rs constraints that were compatible under α. This was the shared part. We distributed the weight ws over those constraints (to simplify the experimentation, we did this uniformly). Then, we added rd different constraints to each protocol, and distributed the weight wd between them (not necessarily uniformly). In each experiment, we let agents interact 100 times with different pairs of protocols, and measured when (and if) they converged to α. It is important to make one comment here. In our previous work, it was enough to consider that agents had found α when their learned alignments were equivalent to it. This is because in rigid protocols, using α would always lead to correct interactions. This is not the case any more, since agents could receive punishments even using α, that could made them change the alignment. For this reason, we considered an agent had converged when after they had spent 15 interactions without changing the alignment (but we recorded the number where they first got to α, before those 15 interactions). Figure 2. Results for a vocabulary of 4 words and protocols with 8 common constraints In our experiments, we considered a large set of shared constraints (8 constraints, with weight 0.8 each), and studied the convergence to α for different values of rd and wd The results can be seen in Figure 2. We performed the experiment 10 times, with different protocols. The figure shows the number of experiments after each amount of interactions. We used rd = {0, 1, 2} and wd = {0.0, 0.5, 1.0, 1.5, 2.0}. Some combinations were not performed, because the weight was impossible to distribute in the amount of rules. We see how more and more heavy different constraints imply more difficulty to find α. We also observe an unexpected effect. Even with the same weight, more constraints slow down the convergence. We think this is because of how agents choose which word to utter. Now, they only utter words that do not violate any rule, so they are more restricted, which could make interactions finish in failure earlier, or could make all interactions very similar between them. Another thing to comment is that the difference between small weights cannot be seen correctly in these experiments (for example, see the case od rd = 2, wd = 0.0 vs rd = 2, wd = 0.5). We plan to study this effect more exhaustively in the future, with a larger set of experiments. To have a deeper comprehension of how these techniques work, they should be com- pared to the performance of the original learning methods for rigid protocols, when used for protocols that have some diverging constraints. In this way, we would be able to detect if taking into account explicitly the weight of different constraints is better than considering them as noise. 5. Conclusions We introduced a new kind of protocols that allow us to investigate the problem of vocab- ulary alignment for agents that do not share exactly the same procedural knowledge of the tasks they perform together. This can mean either that they have unrelated specifica- tions, or that they share a common part that has a significant number of constraints and another portion differs. The technique that we propose to learn alignments in this case is a simple adaptation of the one we had introduced in previous work, and is useful for both situations. This work is still in a preliminary state, but we see it as a basis to many extensions. First of all, the experimental part needs to be improved, considering larger protocols and making a more accurate analysis. The current experiments should only be seen as pre- liminary tests that give an idea of how the method works and provide input for future experimentation. For example, in the first experiment it is possible to consider the opti- mality of the solution the agents find, something that would be useful to understand how the technique works. In addition to this, there are different questions to consider regard- ing the technique. We think it would be particularly interesting to consider interactions between agents with different kinds of preferences. For example, what kind of alignment would a pair of agents reach if one of them has very high preference values and the other one very low? References [1] Manuel Atencia and Marco Schorlemmer. An interaction-based approach to semantic alignment. Journal of Web Semantics, 12-13:131–147, 2012. [2] Paula Chocron and Marco Schorlemmer. Attuning ontology alignments to semantically heterogeneous multi-agent interactions. In ECAI 2016 - 22nd European Conference on Artificial Intelligence, 29 August- 2 September 2016, The Hague, The Netherlands - Including Prestigious Applications of Artificial Intelli- gence (PAIS 2016), pages 871–879, 2016. [3] Paula Chocron and Marco Schorlemmer. Vocabulary alignment in openly specified interactions. In Pro- ceedings of the 16th International Conference on Autonomous Agents and Multiagent Systems (AAMAS 2017) (To appear), 2017. [4] Jurriaan Van Diggelen, RobbertJan Beun, Frank Dignum, Rogier M. Van Eijk, and John Jules Meyer. On- tology negotiation: Goals, requirements and implementation. Int. J. Agent-Oriented Softw. Eng., 1(1):63– 90, April 2007. [5] Jérôme Euzenat and Pavel Shvaiko. Ontology Matching. Springer-Verlag New York, Inc., Secaucus, NJ, USA, 2007. [6] M. Pesic and W. M. P. van der Aalst. A declarative approach for flexible business processes manage- ment. In Proceedings of the 2006 International Conference on Business Process Management Workshops, BPM’06, pages 169–180, Berlin, Heidelberg, 2006. Springer-Verlag. [7] Gabrielle Santos, Valentina Tamma, Terry R. Payne, and Floriana Grasso. A dialogue protocol to sup- port meaning negotiation.: (extended abstract). In Proceedings of the 2016 International Conference on Autonomous Agents and Multiagent Systems, AAMAS ’16, pages 1367–1368, Richland, SC, 2016. International Foundation for Autonomous Agents and Multiagent Systems. [8] Nuno Silva, Gecad Isep Ipp, and Paulo Maio. An approach to ontology mapping negotiation. In Proceed- ings of the Workshop on Integrating Ontologies, pages 54–60, 2005. [9] Luc Steels. The Origins of Ontologies and Communication Conventions in Multi-Agent Systems. Au- tonomous Agents and MultiAgent Systems, 1:169–194, 1998.