<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>September</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Toward a knowledge management method for training customer support AI agents</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Edgars Dzenuska</string-name>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Peteris Rudzajs</string-name>
        </contrib>
        <aff>Visma Labs SIA, Latvia</aff>
        <aff>Pearl Latvija SIA, Latvia</aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <volume>1</volume>
      <fpage>7</fpage>
      <lpage>19</lpage>
      <abstract>
        <p>This paper summarizes preliminary findings on a knowledge management method to train generative AI agents for customer support in software companies. Despite advances allowing AI deployment with minimal technical skills, companies struggle with documenting and maintaining suitable knowledge bases. A survey of 20 software firms found that 75% face challenges in training AI with domain-specific knowledge. Through literature review and industry analysis, this research develops guidelines for creating and managing knowledge articles as training data. We report initial results from a ten-article pilot in a European software company, where our method improved answer quality, as measured by BERTScore F1, cosine similarity, and human-rated correctness of answers.</p>
      </abstract>
      <kwd-group>
        <kwd>knowledge management</kwd>
        <kwd>generative AI</kwd>
        <kwd>customer support</kwd>
        <kwd>retrieval augmented generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        As users of software, we expect the supplier (a software company) to provide fast and competent
customer support. Depending on the company, the customer support team offers troubleshooting,
answers queries, and provides guidance on using the software. The support we receive can thus have
a profound impact on our experience, business results, or personal wellbeing. Providing this essential
service creates substantial cost and operational challenges for companies. Generative AI models
present an opportunity to lower costs and increase efficiency. A wide choice of no-code and
low-code solutions is currently available on the market, enabling relatively easy implementation and use
of generative AI components. According to an estimate by Boston Consulting Group, “the technology,
once implemented at scale, could increase productivity by 30% to 50%” [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]. However, generative AI
models do not come with knowledge about the specific software or services and need to be trained to
understand the particular business domain. We surveyed customer support leaders in 20 software
companies that have either completed or are currently implementing AI agents in their support
organisations. To the question “What challenges did you face during implementation?”, 75% of leaders
responded “Training the generative AI agent with knowledge about our software and services”, and 50%
indicated that insufficient quality of the AI agent's responses delayed or complicated the
implementation. The most common issues were partial responses (90%) and misleading answers or
“hallucinations” (80%). Most respondents rewrote (90%) or restructured (80%) their knowledge articles
to fix the response quality issues.
      </p>
      <p>The current scientific literature lists specific methods, requirements and challenges related to
training generative AI agents. These range from building a custom GPT model or fine-tuning an
existing model (requiring specific training data and financial investment) to using retrieval
augmented generation (requiring the right content at the right time and of the right quality). However,
the current scientific literature does not provide enough detail about the knowledge management
necessary to create a knowledge corpus for training generative AI agents that enables them to provide
correct and useful answers to customer support queries. Nor does it address the additional challenges
of knowledge management in software companies, such as frequent knowledge changes due to the agile
software development life cycle. Based on these findings, this paper aims to design a new method
for managing knowledge. The proposed method offers a practical way to implement generative
AI agents in customer support with a higher probability of success, in particular regarding customer
satisfaction and organisational efficiency. The authors carried out an initial experimental test of the
method by comparing the quality of answers generated by a generative AI agent before and after
applying the method in a software company.</p>
      <p>Section 2 examines the main use cases and technologies of generative AI agents in customer
support and the methods of training these agents on domain knowledge. Section 3 outlines the work
related to knowledge management methods relevant for training generative AI agents. In Section 4,
a new method is proposed for capturing the knowledge needed to ensure high quality of the
generated responses. Section 5 describes the initial results of the experimental test, with the
conclusions and areas for further research outlined in Section 6. The Appendix contains the
requirements for writing knowledge articles.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Background</title>
      <p>
        AI agents can assist companies with customer service in several ways. In this paper, the focus is on
the tasks of 1st level support, the team at the forefront of receiving customer support requests. In
addition to registering, classifying and handling incidents, 1st level support also processes service
requests and keeps users informed about the status of incidents. Service requests are in most cases minor
(standard) changes (e.g. requests to change a password) or requests for information [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. Generative
AI agents can add significant value, especially when responding to repeat and low-complexity
requests for information, received as informal, freely structured textual input (prompts) that
instructs the model used by the AI agent to provide an answer [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ]
      </p>
      <p>
        Based on the conducted literature research, the three most often mentioned types of AI agents for
customer support are text chatbots, voice assistants, and recommenders of solutions [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]. These can
be implemented via no- or low-code platforms leveraging pre-trained large language models (LLMs)
for natural language processing (NLP). However, to respond effectively to customer queries, these
models require domain-specific training. Companies can customize models by building
agent-specific layers on pre-trained LLMs, such as by training on business data to generate relevant responses
[
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]. Without proper training, AI agents may misinterpret queries or hallucinate. Two main
methods to adapt LLMs to specific domains are fine-tuning [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and Retrieval Augmented Generation
(RAG) models [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ].
      </p>
      <p>
        RAG offers advantages over fine-tuning, including lower implementation effort, widespread use
in commercial solutions, and the ability to cite specific documents in responses. Challenges include
document retrieval accuracy and quality [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], but RAG can produce more factual and diverse
responses, reducing hallucinations [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ],[
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Therefore, RAG is recommended for integrating domain
knowledge via vector databases containing embeddings of company knowledge sources, which must
be accurate, current, and well-structured for effective response generation. Proper knowledge
management is essential for high-quality AI responses, as discussed in the following section.
      </p>
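      <p>The RAG retrieval step described above can be illustrated with a minimal sketch. This is not the
setup used in this paper: the toy embed function (hashed word counts) stands in for a real embedding
model, and the articles and query are invented examples.</p>

```python
import hashlib
import numpy as np

def embed(text: str, dim: int = 256) -> np.ndarray:
    """Toy embedding: hash word tokens into a fixed-size vector.
    A real RAG system would use a trained embedding model instead."""
    vec = np.zeros(dim)
    for token in text.lower().split():
        token = token.strip(".,?!")
        vec[int(hashlib.md5(token.encode()).hexdigest(), 16) % dim] += 1.0
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

# Hypothetical knowledge articles acting as the external knowledge source
articles = {
    "reset-password": "How to reset your password in the login screen.",
    "import-data": "How to import data from another product using CSV export.",
}
index = {name: embed(body) for name, body in articles.items()}  # the "vector database"

def retrieve(query: str) -> str:
    """Return the article whose embedding is most similar to the query
    (dot product of normalized vectors, i.e. cosine similarity)."""
    q = embed(query)
    return max(index, key=lambda name: float(index[name] @ q))

# The retrieved article is then placed into the model's prompt:
best = retrieve("How can I reset my password?")
prompt = f"Answer using only this article:\n{articles[best]}\n\nQuestion: ..."
```

      <p>A production setup would additionally chunk the articles, store the embeddings in a vector database,
and pass the retrieved text together with the customer question to the LLM.</p>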
    </sec>
    <sec id="sec-3">
      <title>3. Related works</title>
      <p>
        Lineberry 2019 [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] discusses the Knowledge Centered Service (KCS), which distinguishes two loops
in the organization’s knowledge base: Solve and Evolve. The article stipulates that usage is an indication of
“healthy” knowledge content; however, it does not provide detailed instructions on how to
create the knowledge content, nor on how to use it for training generative AI agents.
      </p>
      <p>
        Lou et al. 2021 [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] mention several methods of managing knowledge (referring specifically to chatbots),
indicating that corpus-based chatbots are best for applications that need large
knowledge bases; however, they do not address the specifics of modern generative AI agents.
      </p>
      <p>
        Referring to deployment of AI agents using a RAG-based system, O’Leary 2024 [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] suggests that
company experts (knowledge management resources) choose the information that should be used
to train the model. The author provides an example of the PwC knowledge base, without giving
instructions on how to capture knowledge so as to increase the probability of a successful application
of RAG-based systems.
      </p>
      <p>
        Ngai et al. 2021 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] discuss integrating chatbots with a knowledge base to allow them to search
it and use the data to generate a personalized response. The authors propose a knowledge base design
framework for a company’s customer knowledge management strategy and practices. They
note that the sources they reviewed do not investigate the design of the knowledge base
sufficiently and do not substantiate it with a theoretical basis. The challenge of continuously updating
the knowledge base is also not sufficiently addressed.
      </p>
      <p>
        According to Wilde 2011 [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ], three types of knowledge can be distinguished: 1) knowledge about
the customer, 2) knowledge from the customer, 3) knowledge for the customer. According to Ngai et
al. 2021 [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ], each of these types can be further divided into unverified reference information (such
as information from external sources) and confirmed knowledge (verified by an expert in the
company). The article does not delve into specifics about how the knowledge of these types should
be captured to make sure they are suitable for training generative AI agents.
      </p>
      <p>Dagkoulis et al. 2022 [15] discuss the implementation of a chatbot using Chatbot Development
Platforms (CDPs), whose architecture contains a search knowledge service that retrieves information
from documents, web content and other knowledge management tools. The article does not state
any requirements towards the structure or quality of the information that the chatbots would use.</p>
      <p>
        Guimaraes et al 2024 [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] recognize several limitations of current LLMs (such as difficulties
with mathematical reasoning and hallucinations) and call for new methods for integrating
commonsense inference into LLMs that go beyond merely increasing the number of
parameters and the amount of training data. Whilst the article emphasizes the importance of such new methods
for developing intelligent systems based on pre-trained language models, it does not provide more
specific suggestions.
      </p>
      <p>O’Leary 2023 [16] recognizes that LLMs have knowledge gaps which the LLMs themselves are not
aware of, and that LLMs cannot access enterprise knowledge management systems and internal
knowledge, but offers no solutions.</p>
      <p>Suppliers of generative AI solutions focus on making their solutions easy to implement,
user-friendly, and compatible with text-based sources in various formats and languages, and they
recommend what content to select for training; however, this comes with the prerequisite that a suitable
knowledge corpus with the domain knowledge is already available.</p>
      <p>To summarize, whilst the scientific literature points out several factors to be mindful of when
managing the knowledge and training the AI agent (such as the impact of knowledge source distribution
and knowledge transmission across the organization, AI agent integration with the knowledge base, and
others), the described methods do not provide enough specificity and instructions for companies to
implement a solid business process for this use case.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Proposed knowledge management method</title>
      <p>
        The proposed method consists of requirements and guidelines for creating a knowledge base that is
suitable for training RAG-based AI agents in customer support. The proposal is based on
insights gained from the scientific literature (such as [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] and [17]) as well as on best practices within
the software industry (such as instructions from Salesforce [18], Zendesk [19] or Intercom [20],
vendors of customer support platforms).
      </p>
      <sec id="sec-4-1">
        <title>Components of the method</title>
        <p>The two key components of the method are:</p>
        <p>1. Guidelines for identifying relevant knowledge in the company.</p>
        <p>2. Requirements for how the knowledge should be captured and stored.</p>
        <p>Figure 1 shows the key contribution of the method: creating or improving the external knowledge
base used by RAG-based AI agents. A detailed process for how a knowledge base is created in a given
software company is not outlined, since knowledge management systems, roles, and AI agents differ
significantly across companies, and detailing the knowledge management process is beyond the
scope of this article.</p>
        <sec id="sec-4-1-1">
          <title>4.1. Identifying relevant knowledge in the company</title>
          <p>The company increases the probability of a successful knowledge management process by ensuring the
following elements are in place:</p>
          <p>
            A centralized knowledge management role or team to coordinate the process. This role or
team defines the requirements and standards for the process, provides guidance within the
organization, and oversees that the process runs smoothly. It is common for the role or team
to be part of the customer support function, which keeps accountability where the most
value from the process is perceived. As per Wilde 2011 [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ], the role or team needs to have
executive sponsorship to mobilize the stakeholders who are essential for the process to work
but may not be part of the customer support function.
          </p>
          <p>Identify questions that customers ask frequently. A useful heuristic for this analysis
is to 1) identify frequent questions in the historical support data, whereby it is mostly sufficient
to look at the requests received within the last 12 months, which accounts for
seasonality of requests based on the business cycle as well as recent software releases; 2)
identify knowledge needs that customers mention in their feedback (such as Net Promoter Score
(NPS) and Customer Effort Score (CES) surveys within the software, logs of phone
conversations and emails, and customer satisfaction (CSAT) surveys). Examples of such
questions are “How do I import data into product x from product y?”, “How can I reset my
password?”, “Where can I see my payslip?”, “How can I create a new report?”, and so on.</p>
          <p>Identify information needed to continuously answer the frequently asked questions (FAQ).
The objective is to understand the triggers of the FAQs that arise from changes in the
business, in order to identify and capture knowledge that answers the potential questions
arising from these changes as soon as they occur: 1) software changes that impact the
customer workflow; 2) disruptions of IT services that drive incident requests from customers,
such as an unplanned interruption to an IT service or a reduction of its quality; 3) business
model changes, such as changes to the product offering, pricing, or terms and conditions; 4) changes in
professional services (consultancy, support, training, etc.) and their delivery; 5)
non-observance of specific software usage best practice guidelines, such as a need to follow
specific government regulations; 6) non-observance of customer specifics, such as when software
or services for a specific customer (type) require a different workflow.</p>
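          <p>The first heuristic above, mining frequent questions from the last 12 months of historical
support requests, can be sketched as follows; the ticket records and the normalisation step are
illustrative assumptions, not the paper's implementation.</p>

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical historical support requests (subject line + creation date)
tickets = [
    {"subject": "How can I reset my password?", "created": date(2025, 3, 1)},
    {"subject": "how can i reset my password",  "created": date(2025, 6, 12)},
    {"subject": "Where can I see my payslip?",  "created": date(2025, 5, 20)},
    {"subject": "How can I reset my password?", "created": date(2020, 1, 5)},  # older than 12 months
]

def frequent_questions(tickets, today=date(2025, 9, 1), top_n=5):
    """Count normalized question subjects from the last 12 months only,
    so the ranking reflects the current business cycle and releases."""
    cutoff = today - timedelta(days=365)
    counts = Counter(
        t["subject"].lower().strip(" ?")
        for t in tickets
        if t["created"] >= cutoff
    )
    return counts.most_common(top_n)

print(frequent_questions(tickets))
# → [('how can i reset my password', 2), ('where can i see my payslip', 1)]
```

      <p>In practice the normalisation would be fuzzier (e.g. clustering semantically similar subjects), but
the shape of the analysis stays the same.</p>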
          <p>Identify knowledge possessors. Establish who in the company (functions or individual roles)
possesses the knowledge identified as needed to answer FAQs. The possessor ideally is the
person(s) who will be the first to notice the changes.</p>
          <p>Identify who captures the knowledge. Persons with experience in customer support
related roles have a better understanding of how to write content so that the majority
of customers will be able to understand it, given the different levels of customer
technical competence and experience. Examples of such roles are software implementation
consultants or support agents who work with customers daily, provided they have the necessary
level of skill in technical writing.</p>
        </sec>
        <sec id="sec-4-1-2">
          <title>4.2. Capturing the knowledge</title>
          <p>
            The main output of the proposed knowledge management process is a knowledge article: a
document that contains a solution to a particular problem [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ] encountered by a user of any of the company's software products or services.
A key objective of the method is to ensure a knowledge article is
created and maintained as the “one source of truth” for answering a particular support query,
regardless of who uses it or through what channel. The same knowledge article should be used by
support agents as reference material, by customers in an online knowledge base, and for
retrieval by the AI agent. With this approach, the company minimizes the risk that customers receive
conflicting messages, as well as the time and resources spent on capturing the knowledge.
          </p>
          <p>The authors defined 30 requirements that provide specific instructions on how to create a
knowledge base, consisting of knowledge articles, that achieves the above objective. The
requirements are split into 4 distinct categories: 1) Authoring &amp; Content, 2) Metadata, 3) Structure
&amp; Formatting, 4) Storage &amp; Maintenance. For detailed requirements see A. Appendix. Following the
defined requirements also allows the company to engineer very effective prompts for the generative
AI agent. For example, the AI agent may be enabled to retrieve the most up-to-date information for
the right geography, actuality and product domain by referencing the metadata parameters
(language, date and domain keywords). A company should evaluate whether it needs to include any
additional parameters in the knowledge article according to its business domain (such as identifiers
of the customer-side roles that the article is meant for, of the country or region, and so forth).</p>
        </sec>
        <sec id="sec-4-1-3">
          <title>4.3. Evaluating quality and updating existing knowledge articles</title>
          <p>With some exceptions, companies already have domain knowledge documented in some shape or
form that can speed up the creation of the knowledge articles for the generative AI agent. LLM platforms
(such as OpenAI ChatGPT 4o or Anthropic Claude 3.5 Sonnet) can evaluate the conformity of the
existing knowledge articles with the requirements, based on a prompt that contains those requirements
from the A. Appendix that refer to Authoring &amp; Content, Metadata, and Structure &amp;
Formatting, and not to the article's environment or other specifics that the LLM cannot infer (such as
whether the article has been updated). LLM platforms are also able to generate a new knowledge article
on the foundation of the existing one and the requirements. However, as of the time of writing, such an
approach is risky, especially if the existing knowledge article contains pictures or videos, which the
LLM may not interpret accurately, or if the information it contains is incomplete. Considering the
risks, the article must undergo the scrutiny of a qualified human before being exposed to customers and/or
an AI agent.</p>
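          <p>Evaluating an existing article with an LLM amounts to prompting it with the requirement list
and the article text. The sketch below only builds such a prompt; the requirement wordings (other
than RQ3's description section, mentioned in Section 5) are invented placeholders, and the actual
call to an LLM platform is left to the chosen vendor's API.</p>

```python
# Illustrative subset of requirements the LLM can check from the text alone
# (Authoring & Content, Metadata, Structure & Formatting categories).
# The RQ wordings below are placeholders, not the paper's actual Appendix.
requirements = [
    "RQ1: The article answers exactly one support question.",
    "RQ3: The article starts with a short description section.",
    "RQ8: Metadata includes language, date and domain keywords.",
]

def conformity_prompt(article_text: str) -> str:
    """Build an evaluation prompt asking an LLM to grade the article
    against each requirement and explain any non-conformity."""
    req_lines = "\n".join(f"- {r}" for r in requirements)
    return (
        "You are a knowledge management reviewer.\n"
        "Check the knowledge article below against each requirement and "
        "answer 'pass' or 'fail' per requirement, with a one-line reason.\n\n"
        f"Requirements:\n{req_lines}\n\n"
        f"Article:\n{article_text}"
    )

prompt = conformity_prompt("## Resetting your password\n1. Open the login screen...")
print(prompt[:55])
```

      <p>The prompt deliberately omits requirements the LLM cannot verify from the text, such as whether
the article has been kept up to date, which remains a human check.</p>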
        </sec>
        <sec id="sec-4-1-4">
          <title>4.4. Information systems for storing the knowledge articles</title>
          <p>As stated by Hosseingholizadeh 2014 [21], this is the stage of storing, embodying and updating
acquired or created knowledge in the organization's memory.</p>
          <p>Creating a record of customer support requests in a customer service platform is a common
practice to keep track of service requests, product incidents and their related problems, as well as to
maintain and improve the quality of service delivery. Thus, it is likely that a company that wants to
implement a customer-facing AI chatbot or a recommender of solutions based on incoming customer
support requests already uses a customer service platform (such as Zendesk, Salesforce Service Cloud, and
others [22]). Therefore, to ensure that the format of the knowledge article is suitable for uploading
into customer service platforms (RQ15 in A. Appendix), the company should first explore the
viability of using its existing customer service platform for authoring, storing and maintaining
the knowledge articles. An added benefit of this approach is keeping the existing IT systems
landscape (and potentially not increasing operating costs). Despite this obvious benefit, the company
should evaluate whether the concrete customer service platform complies with the requirements
outlined in the A. Appendix before deciding.</p>
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Experimental test</title>
      <p>The objective of the initial experimental test is to evaluate the applicability of the proposed method.
This is achieved by verifying that the quality of the responses generated by a generative AI agent
improves after adjusting the articles according to the requirements in A. Appendix. Details on the
implemented agent are provided in Section 5.1. The quality is measured using BERTScore [23] and
cosine similarity [24] metrics, as well as human-evaluated correctness. The ongoing application of
the method in a company (how a company would continuously train AI agents and update the
knowledge articles) was not evaluated; this is an area for further research. The test includes these
steps:</p>
      <p>1. Select 10 knowledge articles in an actual software company that are currently used in
customer support, providing a manageable yet diverse sample that represents various customer
support topics and yields sufficient data to observe patterns and draw meaningful conclusions
while keeping the scope feasible for in-depth analysis.</p>
      <p>2. Adjust these 10 knowledge articles following the requirements in the categories “Authoring &amp;
Content” and “Metadata” in A. Appendix (carried out by the knowledge manager of the
company).</p>
      <p>3. Define 2 questions that customers may ask about the content of each of the current
knowledge articles (20 questions in total), and the expected, ideal answers to the 20 questions
(defined by the knowledge manager of the company). The expected answers provide a reference
point for evaluating the answers generated by the AI agent, using the evaluation methods
described in steps 5 and 6.</p>
      <p>4. Generate answers with a generative AI agent, obtaining 20 question-and-answer pairs with the
original knowledge articles and 20 with the adjusted knowledge articles.</p>
      <p>5. Calculate BERTScore. It is a well-established method that computes token-level similarity
using contextual embeddings, providing an evaluation of semantic similarity in terms of
precision (candidate tokens matched to the reference), recall (reference tokens matched to the
candidate) and F1 score (the combination of precision and recall). It provides a similarity score
from -1 to 1 for tokens in the reference sentence (expected answer) compared with tokens in the
candidate sentence (generated answer).</p>
      <p>6. Calculate cosine similarity. While BERTScore is particularly effective at detecting
paraphrases, in customer support paraphrasing can lead to undesired results and
misunderstandings. Therefore, to compare the reference text and the generated response more
directly, a similarity score for each text fragment is calculated using the cosine similarity
evaluation method. The employed method uses the embeddings model 'all-MiniLM-L6-v2', a
sentence transformer model designed to generate dense vector representations (embeddings)
of sentences or short paragraphs (in our case the expected answer and the generated answer)
in a high-dimensional space, and measures the cosine distance between them on a scale of -1
(perfect mismatch) to 1 (perfect match). Sudhi et al. 2024 [25] inspired the use of this method.</p>
      <p>7. Compare the responses generated using the current and adjusted articles to the expected
answers. For BERTScore, only F1 is considered, as it combines recall and precision and
provides sufficient information for the given purpose.</p>
      <p>8. Manually evaluate the generated responses, following the approach of Afzal et al. 2024 [26],
grading responses in four categories (Readability, Relevance, Truthfulness, Usability) on a
Likert scale [27].</p>
      <p>9. Summarise the findings to evaluate the proposed knowledge management method.</p>
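      <p>Steps 5 to 7 can be illustrated with a simplified sketch. It is a simplification of the actual setup:
BERTScore matches tokens via contextual embeddings, whereas here token precision and recall use
exact matches, and the cosine similarity demo uses toy vectors rather than all-MiniLM-L6-v2
embeddings; the answer texts are invented examples.</p>

```python
import numpy as np

def token_f1(reference: str, candidate: str) -> float:
    """BERTScore-style F1, simplified to exact token matches: precision is
    the share of candidate tokens found in the reference, recall the share
    of reference tokens found in the candidate."""
    ref, cand = reference.lower().split(), candidate.lower().split()
    precision = sum(t in ref for t in cand) / len(cand)
    recall = sum(t in cand for t in ref) / len(ref)
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

def cosine(u: np.ndarray, v: np.ndarray) -> float:
    """Cosine similarity on a -1 (perfect mismatch) to 1 (perfect match) scale."""
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

# Hypothetical expected (reference) answer and generated answers
expected = "open the settings page and choose reset password"
before = "you could maybe try the settings page somewhere"
after = "open the settings page and choose reset password"

print(token_f1(expected, before))  # → 0.375: partial token overlap
print(token_f1(expected, after))   # → 1.0: the adjusted answer matches exactly

# Whole-text cosine similarity; identical unit vectors give exactly 1.0
print(cosine(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # → 1.0
```

      <p>Comparing such before/after scores per question, as in step 7, shows whether the article
adjustments moved the generated answers closer to the reference answers.</p>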
      <p>Figure 2 shows an example of a simple knowledge article, responding to a specific user question,
such as “How can I import data into Visionplanner platform from Visma.net platform?”</p>
      <sec id="sec-5-1">
        <title>5.1. Technical setup of the generative AI agent</title>
        <p>To ensure identical conditions for evaluating the results obtained with the original and the
improved knowledge articles, a generative AI agent was deployed using OpenAI Playground
“assistants=v2” (a cloud-based, out-of-the-box solution for deploying simple AI agents), the “File
Search” feature (which allows defining the set of files that the AI agent uses for RAG) and the model
“gpt-4-turbo” [28]. After obtaining the generated answers from the AI agent, the answers were
compared with the expected answers using an evaluation model built in Google Colab [29]. All the
knowledge articles used by the AI agent, as well as the questions and answers, were in Dutch.
The BERTScore evaluation used the Python library “bert_score”, method “score”. The cosine similarity
evaluation used the Python library “sklearn.metrics.pairwise”, method “cosine_similarity”. Knowledge
articles were uploaded as HTML files and stored in the vector database provided as part of the
OpenAI Playground.</p>
        <p>The generative AI agent was given free-text instructions for response generation, describing the
role of the AI agent (customer support assistant) and the context and geography of the audience,
enforcing usage of the header structure in the document and quoting the content where possible,
dictating the expected layout of the response (step-by-step guides as lists), and limiting the length of
the answer. The exact same instructions and setup were used to generate the answers based on both
the original and the adjusted knowledge articles. It is recognized that the free-text instructions can
have a significant impact on how the responses are generated; however, comparison of the results is
possible due to the consistent test environment, even if the response generation could be altered or
improved with different instructions.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Test results</title>
        <p>The quality of answers obtained from the knowledge articles before and after adjustments is shown
in Figure 3. In chart a), the bars F1-before and F1-after indicate the BERTScore F1 similarity before and
after the knowledge article was adjusted. Respectively, chart b) indicates the cosine similarity of the
text fragments.</p>
        <p>The average BERTScore of the responses before adjustments is 0.7215, and the average cosine
similarity is 0.4232. Figure 4 summarizes the findings by showing the difference between the
BERTScore F1 and cosine similarity scores before and after the adjustments. A positive value
indicates a quality improvement, and a negative value indicates that the responses got worse after
the adjustment.</p>
        <p>The manual evaluation of the answers before and after adjustments yielded the results shown in
Figure 5, whereby any value below or above 0 indicates a deterioration or improvement of the answer
in the given category. As is visible, the quality improved overall: i) readability increased
by 0.4 points, ii) relevance increased by 0.65 points, iii) truthfulness increased by 0.9 points, and iv)
usability increased by 1.25 points. These results corroborate the findings obtained using the BERTScore
and cosine similarity metrics. For example, the answers to questions 1 and 3 emerge as worse after the
adjustments in all 3 evaluation methods, whereas the answers to questions 2 and 11 are significantly
better after the adjustments.</p>
        <p>As is visible from Figure 4, for 16 out of 20 questions the adjustments of the knowledge articles
resulted in a higher similarity of the answer to what the knowledge manager was expecting. The
average BERTScore of the responses after adjustments is 0.7892 (+0.0677, or a 9.38% improvement), and
the average cosine similarity is 0.6218 (+0.1986, or a 46.93% improvement). Put another way, the cosine
similarity of the generated answers indicated that quality had improved for the answers based on
8 out of 10 knowledge articles (80%) and for 18 out of 20 questions (90%).</p>
        <p>It was observed that in specific cases (e.g., question/expected-answer pair 20) the BERTScore F1 value
is significantly higher than the cosine similarity (0.7049 and 0.1215 respectively). After reviewing the
samples, the explanation is that BERTScore compares embeddings of chunks created from the texts it
compares, and the similarity of individual words causes the score to be quite high for both answers.
Cosine similarity, however, evaluates the embeddings of the whole text fragment and therefore more
accurately evaluates how similar the meaning of the generated answer is to the expected one.</p>
        <p>The deterioration of both BERTScore F1 and cosine similarity for question/expected-answer pairs
1 and 3 is notable. This is because the headers in the adjusted knowledge article did not contain the
information that would allow identifying the right instructions in the article, and the adjusted article
did not contain the information that was included in the expected (reference) answer. These edge
cases emphasise the importance of the description section (RQ3, see Appendix) and of including the
terms that describe the parts of the software solution or the company's business model relevant to the
solution.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Discussion and conclusion</title>
      <p>A new knowledge management method has been proposed in this paper that software companies
can use to create a knowledge base for customer support that serves both customers directly and
acts as the external non-parametric memory for RAG-based generative AI agents. Initial
testing of the proposed method was performed using a generative AI agent built with OpenAI
Playground and evaluating the correctness of 20 responses generated from 10 knowledge articles
before and after adjusting them to match the requirements of the method.</p>
      <p>Based on the evaluation, the average BERTScore of the generated answers across the 20 questions
improved by 9.38%, and the average Cosine similarity improved by 46.93%. Quality
improvements were observed for answers generated from 8 of the 10 knowledge articles (80%) and for
18 of the 20 questions (90%). The results indicate a strong positive impact of the proposed method on the
quality of the generated responses. The human evaluation confirms the positive effect in terms of
readability, relevance, truthfulness and usability of the answers.</p>
      <p>The proposed method has high practical potential, as it can help companies across the world to
implement generative AI agents in customer support with a higher probability of success, in
particular in terms of customer satisfaction and organisational efficiency. In addition, it provides a
theoretical background for further research and development of the method to address the specifics of
other industries, business models, and use cases (i.e., not only customer support).</p>
      <p>The authors acknowledge that the method relies on manual oversight for creating and updating
knowledge articles. However, the rapid development of generative AI solutions makes it possible to use
LLMs to rapidly evaluate existing articles against the requirements and to generate new, better
knowledge articles from the most recent and accurate information sources for a given topic. This is a
topic for further research.</p>
      <p>It is acknowledged that the results can vary significantly between companies implementing the
proposed method. The characteristics of the software and services, the sources of knowledge in the
company, the skills of the technical writers, and the configuration of the AI agent are but a few aspects
that can significantly influence the results of the proposed method in another company.</p>
      <p>The conducted research and evaluation revealed further areas for research. Evaluation with a wider
dataset would add more objectivity and surface additional important factors that determine the success
of an AI agent implementation. During the evaluation, it became obvious that the instructions sent
to the LLM within the prompt could significantly change the way responses were generated.
Researching, engineering and testing various prompts that leverage the knowledge article metadata
and parameters could yield new, creative methods of understanding the customer context and
maintaining a multilingual knowledge base with knowledge articles going back in time and
addressing multiple customer segments.</p>
      <p>Graph-based RAG systems emerge as a new way [30] of capturing and retrieving knowledge for AI
agents. Graphs can not only indicate sources of data, information and knowledge, but also interconnect
them in a way that allows an AI agent to retrieve information with a better “understanding” of context.
Exploration of this area and the state-of-the-art technology is recommended to further improve the
quality of AI agent responses.</p>
      <p>Considering that companies often work across borders and serve customers in multiple languages,
it may be necessary to ensure that the terminology in the customer query is handled correctly
when retrieving embeddings of knowledge articles from the vector database. Further research could
therefore cover the creation of a terminology dictionary that helps a RAG-based AI agent find the
right term given a specific language pair.</p>
      <p>Finally, the initial experimental test conducted by the authors of this paper did not evaluate the
ongoing application of the method in a company, specifically the continuous training of AI agents and
the updating of the knowledge articles. This is a potential area for further research, given that
knowledge articles, and thereby the external non-parametric memory of a RAG-based generative AI
agent, can get outdated quickly. For example, duplication and versioning, or full replacement of
outdated knowledge articles, are two approaches that could be researched for applicability and
feasibility in a company environment.</p>
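      <p>As a first step toward such a terminology dictionary, a minimal sketch could map domain terms between language pairs before the query is embedded. The terms, languages, and function below are hypothetical examples, not part of the evaluated method:</p>

```python
# Hypothetical terminology dictionary: (term, query language, knowledge-base
# language) mapped to the canonical term used in the knowledge articles.
TERMINOLOGY = {
    ("rechnung", "de", "en"): "invoice",
    ("rēķins", "lv", "en"): "invoice",
    ("kreditrechnung", "de", "en"): "credit note",
}

def normalise_query(query: str, query_lang: str, kb_lang: str) -> str:
    """Replace known domain terms so the query embeds closer to the KB articles."""
    normalised = []
    for word in query.lower().split():
        normalised.append(TERMINOLOGY.get((word, query_lang, kb_lang), word))
    return " ".join(normalised)

# The normalised query is what would then be embedded and sent to the
# vector database for retrieval.
print(normalise_query("Wie erstelle ich eine Rechnung", "de", "en"))
```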
    </sec>
    <sec id="sec-7">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-7-1">
        <title>References (continued)</title>
        <p>[15] I. Dagkoulis and L. Moussiades, “A comparative evaluation of chatbot development platforms,” in ACM International Conference Proceeding Series, 2022. doi: 10.1145/3575879.3576012.</p>
        <p>[16] D. E. O’Leary, “Enterprise large language models: Knowledge characteristics, risks, and organizational activities,” 2023. doi: 10.1002/isaf.1541.</p>
        <p>[17] S. Wood and R. J. Howlett, “A web-based customer support knowledge base system,” in Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics), 2008. doi: 10.1007/978-3-540-85563-7_47.</p>
        <p>[18] Salesforce: How you can write a good knowledge base article, 2025. URL: https://www.salesforce.com/service/knowledge-base/article/.</p>
        <p>[19] Zendesk: Getting started with self-service – Part 4: Writing your knowledge base articles, 2025. URL: https://support.zendesk.com/hc/en-us/articles/4408887322522-Getting-started-with-self-service.</p>
        <p>[20] Intercom: How to write great help articles, 2025. URL: https://www.intercom.com/help/en/articles/56645-how-to-write-great-help-articles.</p>
        <p>[21] R. Hosseingholizadeh, “Managing the knowledge lifecycle: An integrated knowledge management process model,” in Proceedings of the 4th International Conference on Computer and Knowledge Engineering, ICCKE 2014, 2014. doi: 10.1109/ICCKE.2014.6993467.</p>
        <p>[22] Gartner, “CRM customer engagement center (CEC) reviews and ratings,” 2024. Accessed: Nov. 23, 2024. [Online]. Available: https://www.gartner.com/reviews/market/crm-customer-engagement-center.</p>
        <p>[23] T. Zhang, V. Kishore, F. Wu, K. Q. Weinberger, and Y. Artzi, “BERTScore: Evaluating text generation with BERT,” in 8th International Conference on Learning Representations, ICLR 2020, 2020.</p>
        <p>[24] “Semantic similarity with sentence embeddings.” Accessed: Oct. 20, 2024. [Online]. Available: https://fastdatascience.com/natural-language-processing/semantic-similarity-with-sentence-embeddings/.</p>
        <p>[25] V. Sudhi, S. R. Bhat, M. Rudat, and R. Teucher, “RAG-Ex: A generic framework for explaining retrieval augmented generation,” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA: ACM, Jul. 2024, pp. 2776–2780. doi: 10.1145/3626772.3657660.</p>
        <p>[26] A. Afzal, A. Kowsik, R. Fani, and F. Matthes, “Towards optimizing and evaluating a retrieval augmented QA chatbot using LLMs with human-in-the-loop,” in Proceedings of the Fifth Workshop on Data Science with Human-in-the-Loop (DaSH 2024), Stroudsburg, PA, USA: Association for Computational Linguistics, 2024, pp. 4–16. doi: 10.18653/v1/2024.dash-1.2.</p>
        <p>[27] Encyclopedia Britannica, “Likert scale.” Accessed: Sep. 28, 2024. [Online]. Available: https://www.britannica.com/topic/Likert-Scale.</p>
        <p>[28] OpenAI Assistant Playground, 2024. Accessed: Nov. 23, 2024. [Online]. Available: https://platform.openai.com/playground.</p>
        <p>[29] Google Colab. Accessed: Oct. 20, 2024. [Online]. Available: https://colab.research.google.com/.</p>
        <p>[30] Z. Xu et al., “Retrieval-augmented generation with knowledge graphs for customer service question answering,” in Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, New York, NY, USA: ACM, Jul. 2024, pp. 2905–2909. doi: 10.1145/3626772.3661370.</p>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>A. Appendix - Requirements for writing knowledge articles</title>
      <p>RQ1 A single KA should describe a single problem, question, or use case, clearly reflected by the title.</p>
      <p>RQ2 For guidance related to software usage, use action words in the title, such as “How to...”, “Using ...”, “Setting Up ...”, etc. For company-related information, use terms like "Pricing of...", "Customer support opening hours...", etc. A title of a KA should be unique across the knowledge base.</p>
      <p>RQ3 The KA should include a description section, clarifying its value proposition in a concise manner. For example, "Problem: Brief description of the problem to be solved and the typical reasons why it occurs."</p>
      <p>RQ4 The KA should use direct language and avoid ambiguity (e.g., instead of suggesting "You may need to update your software," directly state "Update your software to the latest version for optimal performance.").</p>
      <p>RQ5 The content in the KA should be self-contained and complete, using full sentences and making each paragraph complete and easily understandable on its own. If the KA includes answers to multiple frequently asked questions, write the answers so that they are self-contained, and avoid a simple "Yes.", "No.", or a link as an answer to the question.</p>
      <p>RQ6 Ensure the KA is tailored to the competence of the user. Avoid words that users may not be familiar with in the given context, or technical jargon.</p>
      <p>RQ7 The content in the KA should not repeat the same information or use more words than necessary to convey the meaning.</p>
      <p>RQ8 The KA should contain the most up-to-date information about the topic.</p>
      <p>RQ9 If the user needs to know information described in a separate KA to fully understand the solution, the article should include a cross-reference as a link to the other article, not a copy of the information. Minimise the number of such links and clearly explain their inclusion (e.g., "In order to understand …, take a look at &lt;this article&gt;").</p>
      <p>RQ10 If the KA contains pictures or videos to supplement the text, the visuals should have clear descriptions, compliant with accessibility standards.</p>
      <p>RQ11 In case of a change in the given topic, the KA should be updated as soon as possible (ideally the same business day) by overwriting the original content and updating the metadata parameters "Updated on:" and "Updated by:".</p>
      <p>RQ12 The KA should contain keywords to recognise metadata parameters: "Language:", "Created on:", "Created by:", "Version:", "Updated on:", "Updated by:", "Access:", "Domains". These parameters should be in the respective language of the KA, included in normal text or small text at the beginning or end of the KA, depending on the authoring platform.</p>
      <p>RQ13 Include the domain in the metadata section "Domains" to locate a KA describing a specific software product, feature, or service - the smallest item that can change based on the company's business processes.</p>
      <p>RQ14 Use domain keywords in sections of the KA that refer to the particular domain, especially within the relevant headings.</p>
      <p>RQ15 The KA must be written in a flexible, structured format (e.g., Markdown or HTML) that supports metadata, media embedding (pictures, videos, tables), and accessibility standards, while also allowing for responsive design and SEO optimization. The format must be suitable for direct publication to web platforms, usage by AI platforms, or uploading into customer service platforms (e.g., Zendesk, Salesforce) as a repository without reformatting or change of structure.</p>
      <p>RQ16 The KA should be structured in individual sections using multi-level headings to indicate titles, subtitles, and subsections and separate them from the normal text: the title as H1 (equivalent to # in Markdown); H2 (equivalent to ## in Markdown) for the titles of the main sections of the KA, such as "Description", "Troubleshooting", "Summary", a.o.; H3 (equivalent to ### in Markdown) for subtitles within the main sections, such as "Step 1: Do this...", "Keep in mind", and similar.</p>
      <p>RQ17 The KA should not contain duplicate section titles (headings).</p>
      <p>RQ18 If the KA contains guidance that leads the user through sequential steps to achieve a result, format the specific steps as numbered or bullet lists.</p>
      <p>RQ19 The KA should be readable by humans without a need to convert it to another document format or use additional tools, except a web browser.</p>
      <p>RQ20 The KA must be stored in a cloud storage solution, accessible remotely.</p>
      <p>RQ21 The cloud storage must offer secure, real-time collaboration, scalable infrastructure, and integration capabilities for future technologies.</p>
      <p>RQ22 The cloud storage must comply with security and privacy regulations.</p>
      <p>RQ23 The cloud storage should load an article within 2 seconds.</p>
      <p>RQ24 The uptime of the cloud storage should be 99.9%.</p>
      <p>RQ25 The cloud storage must provide data backup and disaster recovery capabilities in line with the defined recovery time and point objectives.</p>
      <p>RQ26 The cloud storage should allow referencing the KA as a persistent URL in the metadata when used as a source by the generative AI agent.</p>
      <p>RQ27 The cloud storage must provide analytics, allowing to see the usage of KAs and user ratings, based on the chosen user feedback method.</p>
      <p>RQ28 The cloud storage must allow filtering the KAs by as many metadata parameters (RQ12) as possible, and as a minimum by domain keywords.</p>
      <p>RQ29 The cloud storage must be compatible with version control systems and include access controls to support role-based visibility.</p>
      <p>RQ30 The cloud storage must be compatible with import/export standards and REST API to ensure easy integration.</p>
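      <p>For illustration, a minimal knowledge article skeleton that satisfies the structural requirements above (metadata keywords per RQ12, Markdown format per RQ15, heading levels per RQ16, numbered steps per RQ18) could look as follows; the product, names, and dates are invented examples:</p>

```markdown
# How to export an invoice to PDF

## Description
Problem: You need to share an invoice with a customer as a PDF file. This
article explains how to export it from the invoicing module.

## Step-by-step guidance

### Step 1: Open the invoice
1. Go to **Invoicing** and select the invoice you want to export.
2. Check that the invoice status is "Approved".

### Step 2: Export the file
1. Select **Export** and choose **PDF**.
2. Save the file to your computer.

## Summary
The exported PDF contains the same data as the invoice shown on screen.

Language: EN | Created on: 2025-01-15 | Created by: J. Doe | Version: 1.0 |
Updated on: 2025-01-15 | Updated by: J. Doe | Access: Public | Domains: invoicing
```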
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>[1] S. Clark, N. Ramachandran, S. Sokolova, and V. Bamberger, “How generative AI is already transforming customer service.” [Online]. Available: https://www.bcg.com/publications/2023/how-generative-ai-transforms-customer-service.</mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>[2] “ITIL roles and responsibilities,” 2024. Accessed: Oct. 27, 2024. [Online]. Available: https://wiki.en.itprocessmaps.com/index.php/ITIL_Roles#ITIL_roles_and_boards_-_Service_Operation.</mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>[3] N. Guimarães, R. Campos, and A. Jorge, “Pre-trained language models: What do they know?,” Wiley Interdiscip Rev Data Min Knowl Discov, vol. 14, no. 1, 2024. doi: 10.1002/widm.1518.</mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>[4] E. F. Ohata, C. L. C. Mattos, S. L. Gomes, E. D. S. Reboucas, and P. A. L. Rego, “A text classification methodology to assist a large technical support system,” IEEE Access, vol. 10, 2022. doi: 10.1109/ACCESS.2022.3213033.</mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>[5] J. Yun, J. E. Sohn, and S. Kyeong, “Fine-tuning pretrained language models to enhance dialogue summarization in customer service centers,” in 4th ACM International Conference on AI in Finance, New York, NY, USA: ACM, Nov. 2023, pp. 365–373. doi: 10.1145/3604237.3626838.</mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>[6] A. Beheshti et al., “ProcessGPT: Transforming business process management with generative artificial intelligence,” in Proceedings - 2023 IEEE International Conference on Web Services, ICWS 2023, 2023. doi: 10.1109/ICWS60048.2023.00099.</mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>[7] U. Kamath, K. Keenan, G. Somers, and S. Sorenson, “Retrieval-augmented generation,” in Large Language Models: A Deep Dive, Cham: Springer Nature Switzerland, 2024, pp. 275–313. doi: 10.1007/978-3-031-65647-7_7.</mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>[8] “Understanding retrieval pitfalls: Challenges faced by retrieval augmented generation (RAG) models.” Accessed: May 28, 2024. [Online]. Available: https://medium.com/@researchgraph/understanding-retrieval-pitfalls-challenges-faced-by-retrieval-augmented-generation-rag-models-5bcc28a03842.</mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>[9] P. Lewis et al., “Retrieval-augmented generation for knowledge-intensive NLP tasks,” in Advances in Neural Information Processing Systems, 2020.</mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>[10] R. Lineberry, “Solve and evolve: Practical applications for knowledge-centered service,” in Proceedings ACM SIGUCCS User Services Conference, 2019. doi: 10.1145/3347709.3347793.</mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>[11] B. Luo, R. Y. K. Lau, C. Li, and Y. W. Si, “A critical review of state-of-the-art chatbot designs and applications,” 2022. doi: 10.1002/widm.1434.</mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>[12] D. E. O’Leary, “The rise and design of enterprise large language models,” IEEE Intell Syst, vol. 39, no. 1, 2024. doi: 10.1109/MIS.2023.3345591.</mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>[13] E. W. T. Ngai, M. C. M. Lee, M. Luo, P. S. L. Chan, and T. Liang, “An intelligent knowledge-based chatbot for customer service,” Electron Commer Res Appl, vol. 50, 2021. doi: 10.1016/j.elerap.2021.101098.</mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>[14] S. Wilde, Customer Knowledge Management. 2011. doi: 10.1007/978-3-642-16475-0.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>