<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Eighth Workshop on Natural Language for Artificial Intelligence, November</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>AI Multi-Agent Interoperability Extension for Managing Multiparty Conversations</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Diego Gosmar</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Deborah A. Dahl</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Emmett Coin</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff4">4</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>David Attwater</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Conversational Technologies</institution>
          ,
          <addr-line>Plymouth Meeting, Pennsylvania</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Linux Foundation AI &amp; Data</institution>
          ,
          <addr-line>Open Voice Interoperability Initiative</addr-line>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Talkmap</institution>
          ,
          <addr-line>Southport, Merseyside</addr-line>
          ,
          <country country="UK">United Kingdom</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>XCALLY</institution>
          ,
          <addr-line>Torino, TO 10100</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff4">
          <label>4</label>
          <institution>ejTalk</institution>
          ,
          <addr-line>Bellville, Michigan</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2024</year>
      </pub-date>
      <volume>2</volume>
      <fpage>6</fpage>
      <lpage>27</lpage>
      <abstract>
        <p>This paper presents a novel extension to the existing Multi-Agent Interoperability specifications of the Open Voice Interoperability Initiative (originally also known as OVON, from the Open Voice Network), which already enable AI agents developed with different technologies to communicate seamlessly using a universal, natural language-based API or NLP-based standard APIs. Focusing on the management of multiparty AI conversations, this work introduces new concepts such as the Floor Manager, Convener Agent, Multi-Conversant Support, and mechanisms for handling Interruptions and Uninvited Agents. These advancements are crucial for ensuring smooth, efficient, and secure interactions in scenarios where multiple AI agents need to collaborate, debate, or contribute to a discussion. The paper elaborates on these concepts and provides practical examples, illustrating their implementation within the conversation envelope structure.</p>
      </abstract>
      <kwd-group>
        <kwd>Artificial Intelligence</kwd>
        <kwd>Multi-Agents</kwd>
        <kwd>Agentic</kwd>
        <kwd>Conversational AI</kwd>
        <kwd>AI Specifications</kwd>
        <kwd>NLP and AI Applications</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        1. Modality Components Collaboration: Early efforts, such as the W3C Multimodal
Architecture [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] and the Galaxy Communicator Software Infrastructure [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], focused on enabling
collaboration between independent modality components (e.g., speech recognition, natural
language understanding). These systems allowed components to work together using standard APIs.
However, these approaches were limited to tightly integrated systems and did not address the
broader need for interoperability among truly independent conversational agents.
      </p>
      <p>
        2. Agentic AI with Hard-Wired Assistants: Another approach, exemplified by systems like
AutoGen [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and OpenDevin [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], involved hard-wiring assistants together at development time,
allowing them to collaborate more flexibly than monolithic designs. This method enabled the
addition of new functionalities by incorporating new assistants. However, it required that all
collaborating assistants be predefined, limiting the system’s ability to scale and adapt to new or
unforeseen tasks and environments.
      </p>
      <p>
        3. VoiceXML and Simple Collaboration: The VoiceXML [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] framework provided a mechanism for
basic collaboration among voice dialog systems through the &lt;transfer&gt; element, allowing the
transfer of user interactions between systems. However, this approach was limited to voice-based
systems and required the receiving agent to adhere to specific protocols, making it unsuitable for
broader interoperability across diverse AI agents.
      </p>
      <p>
        4. Inter-Agent Communication Languages (ICL): Systems like the Open Agent Architecture
[
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] used Inter-Agent Communication Languages to facilitate collaboration among independent
agents. While this reduced dependencies on specific internal architectures, it required agents
to interpret highly structured semantic representations, which constrained the flexibility and
scalability of the system.
      </p>
      <p>
        In contrast to these approaches, the OVON (Open Voice Network) framework introduced in our
previous work[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] sought to overcome these limitations by establishing a highly scalable and flexible
method for AI agent interoperability<sup>1</sup>. Our framework supports a wide range of independent assistants,
regardless of their underlying technologies, enabling them to collaborate through minimal
communication standards. This loose coupling dramatically reduces the complexity of integrating new assistants
into the ecosystem, thereby enhancing scalability. However, this initial work covers only conversations
between one user and one assistant at a time. That is, if the user wants to get information from more
than one assistant, they have to access multiple assistants in sequence. This most likely will have two
less-than-optimal consequences. In the first place, any information from the conversation with the first
assistant that is required by the second assistant will have to be explicitly transferred to the second
assistant when the second assistant is invited to the conversation. The second and more significant
drawback is that any higher-level conclusions resulting from the various conversations will have to
be determined by the user. That is, since the assistants don’t know about the other tasks, they won’t
be able to make suggestions that combine information gathered from other assistants with their own
information.
      </p>
      <p>
        Let’s look at an example. Suppose a user is planning a trip that involves booking a flight, a rental car,
and a hotel, and also involves looking for interesting things to do in the destination city. This planning
could involve conversations with four or more assistants. The travel dates, which all of these assistants
need, have to be passed to each assistant in turn to avoid making the user repeat them. In addition,
if the assistants are talking together, the tourist information assistant could point out that there is a
music festival that the user would enjoy, but attending it would require extending the trip by one day.
If the tourist assistant is involved in the flight booking conversation, it could tell the user about the
festival even before the user books their flight. This could save the user a lot of time. A similar use case
is described in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], where several agents are jointly assigned the task of allocating beds to hospital
patients. Each agent has its own knowledge which it brings to the discussion of how to allocate a bed
to a specific patient, arguing why or why not a particular bed is suitable for that patient. It would be
very cumbersome if the user had to consult each agent in sequence to perform this task. Many other AI
healthcare-specific applications could benefit from having conversational AI multi-agents coordinate
with each other to enhance awareness of patient situations, including, for example, this risk detection
model [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] for assisting vulnerable people. For these reasons, we propose to extend the earlier two-party
conversational specifications [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] to handle requirements for conversations involving multiple assistants.
      </p>
      <p>
        <sup>1</sup> For the remainder of this document, the term "agent" will be used to refer to an entity with the capacity to act, while "agency"
or "agentic" will denote the exercise or manifestation of this capacity, in accordance with the definition provided by Markus
Schlosser [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ].
      </p>
      <p>
        Multi-party dialog systems have been discussed in the literature, for example [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ][
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] among others.
[
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] describes a multi-agent system with user-initiative, where several agents can be present but the
agents don’t collaborate: they simply respond individually to user questions. [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] describes a system for
collaborative problem-solving among agents, but it is restricted to one domain in that all of the agents
are experts in different aspects of a larger problem. Our goal is to be able to support mixed-initiative
applications with multiple agents that collaborate across domains. These are the requirements that we
propose for support of multi-party conversations:
      </p>
      <p>1. It must be possible to hold a conversation among more than two conversants.</p>
      <p>2. Conversants must be able to come and go during a conversation.</p>
      <p>3. It should be possible for a subset of conversants to hold private conversations among
themselves.</p>
      <p>4. There should be no fixed limit on the number of conversants.</p>
      <p>5. There should be a way to control possible unruly conversants through techniques like muting or
ejecting.</p>
      <p>Requirement 1 is the key requirement for support of multi-party conversations. The other requirements
support it. This paper extends the initial specifications by introducing key concepts that address the
specific requirements of managing multiparty conversations within the context of AI-driven multiparty
conferences. The new concepts introduced in this work, such as the Floor Manager (Figure 1), the
related Multi-Conversant Support and Convener Agent, and mechanisms for handling Interruptions and
Uninvited Agents, are designed to ensure that AI agents can collaborate effectively in dynamic,
multi-agent environments. These extensions not only enhance the framework’s ability to handle complex,
multi-party interactions but also ensure that the system can scale to accommodate a growing number
of agents and tasks.</p>
      <p>
        For instance, in scenarios where a human interacts with multiple AI assistants for various tasks, such
as coordinating events, managing appointments, or retrieving information, the framework ensures
seamless communication and task delegation among the agents. This is achieved independently of
each agent’s underlying technologies or models, showcasing the system’s ability to scale across
different applications and user needs. Previous work [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ] laid the foundation for AI agent interoperability,
establishing the basic framework for seamless communication between independent conversational
agents. However, the extensions presented in this paper are essential for overcoming the challenges
associated with scalability and effective management in multiparty conversational settings. These
enhancements introduce a versatile and adaptable platform that ensures AI-driven multiparty conferences
can be conducted smoothly, with agents collaborating efficiently and effectively, regardless of their
technological diversity. This approach not only addresses the current needs of evolving AI ecosystems
but also provides a robust and future-proof solution capable of integrating new agents and capabilities
as they emerge.
      </p>
    </sec>
    <sec id="sec-2">
      <title>2. Extensions to the Conversation Envelope</title>
      <p>
        The concept of Multi-Agent Interoperability revolves around creating a shared protocol, based on
standard universal APIs using NLP, that allows heterogeneous AI agents to communicate effectively.
This is achieved through a standardized conversation envelope API, as detailed in previous research [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ],
which defines the message structure and communication protocols. In multiparty scenarios, such as
AI-driven conferences or collaborative tasks, the existing framework needs to be enhanced to manage
the flow of conversation among multiple agents, handle interruptions, and secure the conversation from
uninvited agents. These scenarios require additional layers of management that were not addressed
in the initial framework. To address these complex interactions, this paper introduces several key
extensions to the conversation envelope framework, specifically designed to enhance the coordination
and management of multiparty conversations. The next sections will dive into these extensions,
including the introduction of a Floor Manager to regulate conversation flow, Multi-Conversant Support
to enable seamless collaboration among multiple agents, and mechanisms to handle interruptions and
manage unwanted agents, ensuring that all interactions remain orderly and productive. Note that these
extensions are backward-compatible with the basic conversation envelope messages[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and it is not
necessary for systems to support them if the application doesn’t require multiple agents.
      </p>
      <sec id="sec-2-1">
        <title>2.1. The Floor Manager</title>
        <p>The Floor Manager is a conceptual hub within the multi-agent system that coordinates the flow of
conversation. It ensures orderly communication by regulating which agent has the conversational floor
at any given time. The Floor Manager processes requests from agents to take the floor and grants or
revokes these requests based on predefined rules and the current state of the conversation. The Floor
Manager also determines which agent will speak when multiple agents request to speak.</p>
        <p>2.1.1. Benefits</p>
        <p>• Orderly Management: By managing which agent has the floor, the system prevents multiple
agents from speaking simultaneously, ensuring a coherent conversation flow.</p>
        <p>• Fair Distribution: The Floor Manager ensures that all agents have the opportunity to contribute
according to their roles and the context of the discussion.</p>
        <p>• Automated Coordination: As a hub, the Floor Manager can prioritize floor requests based on
the conversation’s needs and predefined rules.</p>
        <p>2.1.2. Examples of possible messages</p>
        <p>• Floor Request: An agent submits a request to take the floor, specifying the reason and urgency
of their contribution.</p>
        <p>• Floor Grant: The Floor Manager grants the floor to an agent, defining the extent and context for
their contribution.</p>
        <p>• Floor Revoke: The Floor Manager revokes an agent’s floor privileges if the conversation’s rules
or the situation demands it.</p>
        <p>• Add/remove from Queue: The Floor Manager decides the order in which the agents are
scheduled to speak (not shown).</p>
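        <p>The arbitration described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not part of the specification: the class name "FloorManager", the "urgency" argument, and the event names "floor_grant" and "floor_queued" are hypothetical, and only the "floor_revoke" event with its "reason" parameter is taken from the examples in this paper. The sketch assumes a single speaker at a time, with pending requests served by urgency and then by arrival order.</p>
        <preformat>
```python
import heapq
import itertools
from typing import Optional


class FloorManager:
    """Toy floor arbiter: one speaker at a time, other requesters wait in a queue."""

    def __init__(self) -> None:
        self.current_speaker: Optional[str] = None
        # Heap entries are (negated urgency, arrival order, agent URL),
        # so higher urgency is served first and ties go to the earlier request.
        self._queue: list = []
        self._arrival = itertools.count()

    def request_floor(self, agent: str, urgency: int = 0) -> dict:
        """Grant the floor at once if it is free; otherwise queue the request."""
        if self.current_speaker is None:
            self.current_speaker = agent
            return {"to": agent, "eventType": "floor_grant"}  # hypothetical event name
        heapq.heappush(self._queue, (-urgency, next(self._arrival), agent))
        return {"to": agent, "eventType": "floor_queued"}  # hypothetical event name

    def revoke_floor(self, reason: str) -> list:
        """Revoke the floor from the current speaker, then grant it to the next in line."""
        events = []
        if self.current_speaker is not None:
            events.append({
                "to": self.current_speaker,
                "eventType": "floor_revoke",
                "parameters": {"reason": reason},
            })
            self.current_speaker = None
        if self._queue:
            _, _, next_agent = heapq.heappop(self._queue)
            self.current_speaker = next_agent
            events.append({"to": next_agent, "eventType": "floor_grant"})
        return events
```
        </preformat>
        <p>A priority queue is only one possible policy. A real Floor Manager could equally apply role-based or context-based rules, as long as it emits the same kinds of grant and revoke events.</p>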
        <p>2.1.3. Floor Request Example</p>
        <p>[Listing: JSON conversation envelope with a floor request event]</p>
        <p>2.1.4. Floor Grant Example</p>
        <p>[Listing: JSON conversation envelope with a floor grant event]</p>
        <p>2.1.5. Floor Revoke Example</p>
        <p>...
"to": "https://agentFloorRevoked.com",
"eventType": "floor_revoke",
"parameters": {
"reason": "exceeded_time_limit"
...</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Multi-Conversant Support</title>
        <p>This extension enables multiple agents to participate in a conversation, supporting complex discussions
where various perspectives need to be considered. The conversation envelope is designed to manage
contributions from multiple agents simultaneously.</p>
        <p>2.2.1. Benefits</p>
        <p>• Enhanced Collaboration: Facilitates complex interactions where multiple agents need to
contribute simultaneously.</p>
        <p>• Scalability: Efficiently manages conversations with a large number of participants.</p>
        <p>• Context Management: Ensures that the conversation stays on track, with each agent’s
contributions appropriately contextualized.</p>
        <p>2.2.2. Multi-Conversant Message Example</p>
        <p>{
"to": [
"https://agentMultiConversant2.com",
"https://agentMultiConversant3.com"
...</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Convener Agent and Invitation Mechanism</title>
        <p>
          In the context of multi-agent conversations, a "convener" agent is introduced. The convener is
responsible for initiating and managing the participation of other agents in the conversation. The convener
sends individual "invite" messages to each participating agent. This approach ensures clarity and retains
compatibility with the existing OVON "invite" message structures[
          <xref ref-type="bibr" rid="ref9">9</xref>
          ]. By avoiding a broadcast invitation,
we reduce the number of events that must be handled intelligently, and maintain compatibility with the
existing protocol.
        </p>
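        <p>The per-agent invitation can be illustrated with a short Python sketch. It is an assumption-laden illustration rather than the normative envelope: the function name "build_invites", the field layout ("conversation", "sender", "events") and the identifiers passed in are hypothetical simplifications, and only the use of one "invite" event per recipient follows the text above.</p>
        <preformat>
```python
from typing import Iterable


def build_invites(convener: str, conversation_id: str, invitees: Iterable[str]) -> list:
    """Return one invite message per invitee instead of a single broadcast."""
    return [
        {
            # Simplified, hypothetical envelope layout for illustration only.
            "conversation": {"id": conversation_id},
            "sender": {"from": convener},
            "events": [{"to": invitee, "eventType": "invite"}],
        }
        for invitee in invitees
    ]
```
        </preformat>
        <p>Because each message carries exactly one recipient, a receiving agent can handle it like an ordinary two-party invitation.</p>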
      </sec>
      <sec id="sec-2-4">
        <title>2.4. Interruptions and Uninvited Agents</title>
        <p>Managing interruptions and uninvited agents is crucial in dynamic multi-agent environments. The
conversation envelope supports controlled interruptions and prevents unauthorized agents from disrupting
the conversation.</p>
        <p>2.4.1. Benefits</p>
        <p>• Controlled Interruptions: Enables essential interjections without disrupting the conversation.</p>
        <p>• Security: Protects the conversation from uninvited or unauthorized agents.</p>
        <p>• Focus Maintenance: Helps maintain the integrity and focus of the discussion.</p>
        <p>2.4.2. Uninvited/Unhelpful Conversant Example</p>
        <p>{
"to": "https://currentSpeakerAgent.com",
"eventType": "utterance",
...</p>
        <p>2.4.3. Uninvited Agent Rejection Example</p>
        <p>The convener can also prevent an agent from contributing directly to the conversation by using a mute
message event (not shown). Any agent can send an utterance. The convener determines whether it is allowed
to be "spoken" or not. A muted agent will continue to receive utterances and other events that are
intended for it. The mute message informs the agent that any utterances it sends will not be
delivered. All other events, such as whispers and requests to take the floor, will still be delivered. Even
if the agent has been muted, the convener can still see the messages it sends and decide to "unmute" it:
this puts the onus on the convener and keeps the standard simple.</p>
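        <p>The muting rule lends itself to a small Python sketch. The class name "Convener", its methods, and the "whisper" event type string are hypothetical choices for illustration; the "utterance" event type comes from the example above. The sketch assumes the convener inspects every outgoing event and withholds only utterances from muted senders.</p>
        <preformat>
```python
class Convener:
    """Toy convener that withholds utterances sent by muted agents."""

    def __init__(self) -> None:
        self.muted: set = set()
        # Utterances that were seen by the convener but not delivered.
        self.withheld: list = []

    def mute(self, agent: str) -> None:
        self.muted.add(agent)

    def unmute(self, agent: str) -> None:
        self.muted.discard(agent)

    def should_deliver(self, sender: str, event: dict) -> bool:
        """Return True if the event should be forwarded to its recipients."""
        if event.get("eventType") == "utterance" and sender in self.muted:
            # The convener still sees the utterance and may decide to unmute later.
            self.withheld.append((sender, event))
            return False
        # Whispers, floor requests and all other events pass through.
        return True
```
        </preformat>
        <p>Keeping the decision in one place mirrors the design choice stated above: the convener carries the policy, and the message standard itself stays simple.</p>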
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Messaging in Multi-Agent Conversations</title>
        <p>To streamline communication, we propose a special Unified Messaging behaviour for the case where no
recipient is specified: every utterance in a multi-party conversation that names no recipient is
disseminated to all participating agents. This method also allows the convener to present issues to all of the
conversants. In addition, it allows all agents to identify the others in the conversation.</p>
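        <p>The default-recipient rule can be sketched as a small Python helper. The function name "resolve_recipients" is hypothetical, and excluding the sender from the broadcast is an assumption of this sketch, not something the text prescribes. The "to" field is accepted either as a single URL or as a list of URLs, matching the two forms used in the examples of this paper.</p>
        <preformat>
```python
from typing import Iterable, Optional


def resolve_recipients(
    event: dict, participants: Iterable[str], sender: Optional[str] = None
) -> list:
    """Return the explicit recipients, or every participant when "to" is absent."""
    to = event.get("to")
    if to is None:
        # No recipient specified: disseminate to all participants.
        # Skipping the sender is an assumption made for this sketch.
        return [p for p in participants if p != sender]
    # "to" may be one URL or a list of URLs.
    return [to] if isinstance(to, str) else list(to)
```
        </preformat>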
      </sec>
      <sec id="sec-2-6">
        <title>2.6. Private conversations</title>
        <p>The standard should allow for sub-conversations among agents without requiring any additional events:
agents within the general conversation can initiate private dialogues with other agents, regardless of
whether those agents are part of the general conversation. Private conversations among agents are not
perceptible to the user. These interactions remain opaque to the convener, preserving confidentiality
and promoting autonomous communication.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation and Results</title>
      <p>In implementing the proposed extensions, the JSON message envelopes provided in this paper, such as
those used for the Floor Manager, the Convener, Multi-Conversant Support, and new event categories,
serve as draft illustrations<sup>2</sup>.</p>
      <p>
        Let’s refer to the use case already described in the previous paper[
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. In the first scenario, Emmett, a
human, seeks assistance from Cassandra, his general AI assistant, to manage and streamline his
errands efficiently. The AI assistants at various service points (Pat at Blooming Town Florist, Andrew
at the Post Office, Charles at the hardware store, and Sukanya, the host at Thai Palace) facilitate the
transactions. Emmett has the following goals:
      </p>
      <p>• Order some flowers for his wife’s birthday.</p>
      <p>• Check on the repair of the chainsaw he left at the hardware store.</p>
      <p>• Order some carryout Thai food for lunch.</p>
      <p>• Find the cost of mailing a 2-pound package to California.</p>
      <p>Characters</p>
      <p>• Emmett: The Human</p>
      <p>• Cassandra: Emmett’s general AI assistant</p>
      <p>• Pat: AI Assistant for his local florist</p>
      <p>• Andrew: AI Assistant at the post office</p>
      <p>• Charles: AI Assistant at Emmett’s local hardware store</p>
      <p>• Sukanya: AI Assistant at the local restaurant, "Thai Palace"</p>
      <p>AI LLM Technologies</p>
      <p>• Cassandra: based on GPT-3.5 Turbo</p>
      <p>• Pat: llama2</p>
      <p>• Andrew: rule-based application (no LLM involved)</p>
      <p>• Charles: Claude.ai 3.5 Sonnet</p>
      <p>• Sukanya: GPT-4o</p>
      <p><sup>2</sup> These examples are intended to demonstrate the conceptual implementation of the proposed extensions within the
Multi-Agent Interoperability framework. However, these drafts should not be considered as final or official specifications. Further
analysis, discussion, and refinement are required to develop these into robust, standardized specifications that can be
universally adopted. This work is an ongoing process, involving input from the broader AI and interoperability communities
to ensure the specifications meet the necessary technical, operational, and security requirements.</p>
      <p>To illustrate an AI Multiparty Conversation, let us consider a new scenario where the Floor Manager
helps Cassandra manage more sophisticated multi-agent situations in the conversations. In this example
the assistantBrowser is the convener agent. For example, the Florist (Pat) adds a new agent specialized in
Credit Card Transactions (Hermes) into the conversation. Hermes requests a two-factor authentication
(2FA) confirmation from Emmett before proceeding with the credit card charge (via OTP, One Time
Password).</p>
      <p>
        Once Emmett provides the confirmation, Hermes receives it, and Pat can confirm the order to Emmett.
The dialogue that follows is the example described in the initial arXiv paper [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], modified to
handle the multiparty AI conversation described above.
      </p>
    </sec>
    <sec id="sec-4">
      <title>Conversations</title>
      <p>Emmett: Hi Cassandra.</p>
      <p>Cassandra: Hi Emmett! How can I assist you today?</p>
      <p>Emmett: I need to order some flowers for my wife’s birthday.</p>
      <p>Cassandra: Sure thing, Emmett! I’ll connect you with the local florist shop.</p>
      <p>[invites Blooming Town Florist to this conversation]</p>
      <p>Pat: Hi Emmett! I’m Pat, your friendly florist. How can I help you with your flower selection today?
Are you looking for something specific or need suggestions?</p>
      <p>Emmett: Do you have any red Proteas?</p>
      <p>Pat: Hi Emmett! Yes, we do have red Proteas. They’re stunning and make a bold statement.
Would you like to include them in your arrangement?</p>
      <p>Emmett: Yes and add some eucalyptus in a clear vase, please.</p>
      <p>Pat: Great choice, Emmett! Shall I use the credit card on file for this order?</p>
      <p>Emmett: Yes please, use the card on file.</p>
      <p>Pat: OK, let me please include Hermes, the AI assistant who is going to help us
in performing your secure card transaction.</p>
      <p>[Pat acts as convener to invite Hermes]</p>
      <p>Hermes: Hi Emmett, this is Hermes. I will help you to perform your credit card
transaction safely. I have sent a six-digit code in a text message to the phone
linked to your credit card. It will be valid for 3 minutes. When you are ready, tell me the number please.</p>
      <p>Emmett: Okay the number is 782391.</p>
      <p>[transaction proceeding and confirmed]</p>
      <p>Hermes: Thanks. Goodbye.</p>
      <p>[conversation is sent back to Pat by Hermes]</p>
      <p>Pat: Thanks for your payment. Great choice, Emmett! Your red Proteas with eucalyptus
in a clear vase will be sent to your home. Thanks for your order! Have a blooming day!</p>
      <p>[Pat returns a "bye" event and the conversation turns back to Cassandra]</p>
      <p>Cassandra: Hi Emmett! How can I assist you today?</p>
      <p>...</p>
      <p>[Conversations continue back and forth with the other AI agents involved]</p>
      <p>...</p>
      <p>Emmett: That’s all I needed. Have a good day.</p>
      <p>Cassandra: Thank you, Emmett! Have a wonderful day!</p>
    </sec>
    <sec id="sec-5">
      <title>OVON sequence diagram</title>
      <p>Figure 2 depicts part of the sequence diagram available on the open sandbox playground, with a focus on
the convener invitation procedure. Figure 3 in Appendix A (Sequence Diagrams) shows the full sequence
diagram. The multiparty extension to the AI conversation framework introduces significant scalability
by enabling multiple specialized AI agents to collaborate through natural language interactions. In the
example scenario, agents like Pat (the florist) and Hermes (the payment assistant) seamlessly interact
using simple, human-readable communication, with the Floor Manager ensuring orderly conversation
flow. This allows for a more intuitive and accessible interaction environment for users while the agents
handle complex tasks behind the scenes. One of the most valuable benefits of this architecture is that
each AI agent can be based on completely different AI technologies (i.e., different LLMs and serving
logic). Furthermore, each AI agent can focus on its specific area of expertise while remaining aware of
the broader conversational context. For instance, Pat manages the floral arrangement, while Hermes
handles secure payment, both through natural language.</p>
      <p>Because agents can follow the ongoing tasks of other agents through these natural language
exchanges, they can make better-informed suggestions or perform additional complex actions that
combine information from various sources. Using a natural language-based API not only simplifies user
interactions but also streamlines communication between AI agents.</p>
    </sec>
    <sec id="sec-6">
      <title>4. Future Directions and Potential Improvements</title>
      <p>While the extensions introduced in this paper significantly enhance the Multi-Agent Interoperability
framework, there are several areas where further improvements can be made to advance the capabilities
and scalability of AI-driven multiparty conversations.</p>
      <sec id="sec-6-1">
        <title>4.1. Enhanced Context Management</title>
        <p>As the number of agents and the complexity of conversations increase, maintaining a coherent context
across multiple agents becomes increasingly challenging. Future work could focus on developing more
sophisticated mechanisms for context management, enabling agents to better understand and track
the nuances of ongoing discussions, especially in long-running or highly dynamic conversations. This
could involve integrating advanced context-awareness specifications that allow agents to retain and
reference past interactions more effectively.</p>
      </sec>
      <sec id="sec-6-2">
        <title>4.2. Improved Security and Privacy Protocols</title>
        <p>As AI-driven conversations become more prevalent, ensuring the security and privacy of the interactions
becomes increasingly important. Future work could enhance the specifications so that the
framework’s security protocols better protect against unauthorized access and ensure that sensitive
information is handled appropriately. This could include implementing advanced encryption methods,
robust authentication processes, and more sophisticated mechanisms for managing uninvited agents.</p>
      </sec>
      <sec id="sec-6-3">
        <title>4.3. Observability</title>
        <p>
          Another crucial area for future improvement is enhancing the observability of multi-agent interactions.
As AI-driven conversations grow in complexity, the ability to perform comprehensive log retrievals,
generate summaries, provide detailed reports, and debug issues becomes increasingly important. Future
enhancements to the specifications could include robust observability features that allow for real-time
monitoring and control of multi-agent conversations. This would enable developers and operators to
gain deeper insights into the behavior of the agents, troubleshoot issues more effectively, and ensure that
the system operates within expected parameters. Enhancing observability is also vital for addressing
the explainability and transparency of Conversational AI models, which are growing in number
and making it increasingly difficult to distinguish between human and artificial agents, as discussed in [
          <xref ref-type="bibr" rid="ref14">14</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>5. Conclusion</title>
      <p>This paper introduces novel critical extensions to the Multi-Agent Interoperability framework,
addressing the challenges posed by multiparty conversations. This collaborative framework, powered
by natural language via standard NLP-based APIs, allows agents to work together eficiently without
requiring specialized protocols or technical interfaces. Ultimately, this extension significantly improves
scalability and eficiency, ensuring faster decision-making and task execution. The ability for AI agents
to communicate through natural language makes the system more flexible and accessible, allowing for
advanced, dynamic collaboration that can meet increasingly sophisticated user needs and interactions.
The Floor Manager, functioning as a coordinating hub, alongside Multi-Conversant Support and
mechanisms for managing Interruptions and Uninvited Agents, significantly enhances the framework’s ability
to manage complex, dynamic environments such as AI conferences. The introduction of a convener
agent, individual invitation mechanisms, inclusive messaging protocols, and new event categories
provides a structured yet flexible approach to multi-agent interactions. These extensions ensure that
AI agents can collaborate more effectively, maintaining order and focus in multiparty interactions.
While these advancements provide substantial improvements to the current framework, there remains
significant potential for further development. To enhance multiparty interactions, future work
should concentrate on advancing context management and improving security and privacy protocols.
Enhancing these areas will ensure better handling of complex conversations and safeguard sensitive
information, respectively. Additionally, refining observability will be essential for monitoring and
controlling the increasing complexity of these systems. By addressing these areas, future developments
can continue to push the boundaries of AI-driven communication, ensuring that the Multi-Agent
Interoperability framework remains at the forefront of AI technology, capable of scaling and adapting
to the evolving needs of AI ecosystems.</p>
    </sec>
    <sec id="sec-8">
      <title>Acknowledgments</title>
      <p>
        We express our sincere appreciation to the Open Voice Interoperability [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ] Team (Linux Foundation AI
&amp; Data Foundation) for their invaluable contributions and support in developing the Interoperable
Standards, particularly to Jon Stine, Jim Larson, Leah Barnes, and Allan Wylie. Their expertise, suggestions,
and resources have been pivotal in shaping a model that is both ethically grounded and practically
effective in real-world applications.
      </p>
      <p>
        Sequence diagrams can be generated by running the Sandbox environment available in this
repository [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>G.</given-names>
            <surname>Bonetta</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C. D.</given-names>
            <surname>Hromei</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Siciliani</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. A.</given-names>
            <surname>Stranisci</surname>
          </string-name>
          ,
          <article-title>Preface to the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI)</article-title>
          ,
          <source>in: Proceedings of the Eighth Workshop on Natural Language for Artificial Intelligence (NL4AI 2024) co-located with 23rd International Conference of the Italian Association for Artificial Intelligence (AI*IA 2024)</source>
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gosmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Dahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coin</surname>
          </string-name>
          ,
          <article-title>Conversational AI multi-agent interoperability, universal open APIs for agentic natural language multimodal communications</article-title>
          ,
          <year>2024</year>
          . URL: https://arxiv.org/abs/2407.19438. arXiv:2407.19438.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>M. B.</surname>
          </string-name>
          et al.,
          <article-title>Multimodal architecture and interfaces, W3C recommendation</article-title>
          , https://www.w3.org/TR/mmi-arch/,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          DARPA,
          <article-title>Galaxy communicator</article-title>
          , https://communicator.sourceforge.io/,
          <year>2003</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>A.</given-names>
            <surname>Authors</surname>
          </string-name>
          ,
          <article-title>Autogen: an open-source programming framework for agentic AI</article-title>
          , https://microsoft.github.io/autogen/,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6] OpenDevin,
          <article-title>OpenDevin, an autonomous AI software engineer</article-title>
          , https://docs.all-hands.dev/modules/usage/intro,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>M. O.</surname>
          </string-name>
          et al.,
          <article-title>Voice extensible markup language (VoiceXML), W3C recommendation</article-title>
          , https://www.w3.org/TR/voicexml21/,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>A.</given-names>
            <surname>Cheyer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Martin</surname>
          </string-name>
          ,
          <article-title>The open agent architecture</article-title>
          ,
          <source>Autonomous Agents and Multi-Agent Systems</source>
          <volume>4</volume>
          (
          <year>2001</year>
          )
          <fpage>143</fpage>
          -
          <lpage>148</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>Attwater</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wylie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gosmar</surname>
          </string-name>
          ,
          <article-title>Open voice interoperability specifications</article-title>
          , https://github.com/open-voice-interoperability/docs/tree/main/specifications,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>M.</given-names>
            <surname>Schlosser</surname>
          </string-name>
          ,
          <article-title>Agency definition, Stanford Encyclopedia of Philosophy Archive</article-title>
          , https://plato.stanford.edu/archives/fall2015/entries/agency/,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>D. C.</given-names>
            <surname>Engelmann</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. R.</given-names>
            <surname>Panisson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vieira</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J. F.</given-names>
            <surname>Hübner</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Mascardi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Bordini</surname>
          </string-name>
          ,
          <article-title>MAIDS - a framework for the development of multi-agent intentional dialogue systems</article-title>
          ,
          <source>in: Proceedings of the 2023 International Conference on Autonomous Agents and Multiagent Systems</source>
          , AAMAS '23, International Foundation for Autonomous Agents and Multiagent Systems, Richland, SC,
          <year>2023</year>
          , p.
          <fpage>1209</fpage>
          -
          <lpage>1217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gosmar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Peretto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Coleman</surname>
          </string-name>
          ,
          <article-title>Insight AI risk detection model - vulnerable people emotional situation support</article-title>
          ,
          <source>in: Proceedings of the 28th International Conference on Evaluation and Assessment in Software Engineering</source>
          , EASE '24, Association for Computing Machinery, New York, NY, USA,
          <year>2024</year>
          , p.
          <fpage>437</fpage>
          -
          <lpage>441</lpage>
          . URL: https://doi.org/10.1145/3661167.3661235. doi:10.1145/3661167.3661235.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>T.</given-names>
            <surname>Harris</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Banerjee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Rudnicky</surname>
          </string-name>
          ,
          <article-title>Heterogeneous multi-robot dialogues for search tasks</article-title>
          (
          <year>2005</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>D.</given-names>
            <surname>Gosmar</surname>
          </string-name>
          ,
          <article-title>Conversational hyperconvergence: an onlife evolution model for conversational AI agency</article-title>
          , https://doi.org/10.1007/s43681-024-00463-0
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          O. V. I. initiative,
          <article-title>Introducing the interoperability initiative of the open voice network</article-title>
          , https://openvoicenetwork.org/interoperability-initiative,
          <year>2023</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>D.</given-names>
            <surname>Attwater</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Coin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Dahl</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Barnes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Wylie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Gosmar</surname>
          </string-name>
          ,
          <article-title>Open voice sandbox repository</article-title>
          , https://github.com/open-voice-interoperability/open-voice-sandbox
          ,
          <year>2024</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>