<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>End-to-end development of a retrieval-augmented large language model for cloud-based healthcare applications</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Vasyl Teslyuk</string-name>
          <email>vasyl.m.teslyuk@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Olga Narushynska</string-name>
          <email>olha.o.narushynska@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maksym Arzubov</string-name>
          <email>maksym.v.arzubov@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Danylo Prots</string-name>
          <email>danylo.prots.kn.2021@lpnu.ua</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
<institution>Automated Control Systems Department, Lviv Polytechnic National University</institution>
          ,
          <addr-line>12 Stepan Bandera Street, Lviv, 79013</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
<p>This study presents the development and implementation of a specialized information system designed to support medical professionals through an intelligent assistant powered by a Large Language Model (LLM), the Retrieval-Augmented Generation (RAG)[1] algorithm, a vector knowledge base, and a Convolutional Neural Network (CNN) based [2] image classification module. The system functions as a doctor's assistant within a secure chat interface between patient and physician. A central component is the LLM, which generates proposed responses based on the results provided by the CNN Application Programming Interface (API) - a computer vision module that analyzes medical images submitted by the patient (e.g., skin or eye photos). These classification results are combined with data retrieved from a vectorized medical knowledge base [3] compiled from open-source data, including disease information, treatment methodologies, and drug protocols. The vector database (implemented using FAISS) enables efficient semantic search over a large body of structured knowledge. Through the RAG architecture, the generative model (GPT or Claude) retrieves contextually relevant facts prior to response generation, significantly improving the accuracy and reliability of the system's medical suggestions. On the client side, the system is built with Next.js, Redux, and Thunk, ensuring a responsive UI and efficient API communication. Authentication is handled via AWS Cognito, with S3 and DynamoDB used for media and structured data storage. Event-driven [4] communication is supported via Lambda and S3 event mechanisms [5], while Supabase is employed to manage secure chats between users. The system has a clearly defined application: enhancing doctor-patient communication, supporting clinical decision-making, reducing case processing time, and improving the overall quality of healthcare delivery.</p>
      </abstract>
      <kwd-group>
        <kwd>LLM</kwd>
        <kwd>medical assistant</kwd>
        <kwd>CNN API</kwd>
        <kwd>RAG</kwd>
        <kwd>FAISS</kwd>
        <kwd>image classification</kwd>
        <kwd>medical knowledge</kwd>
        <kwd>AWS</kwd>
        <kwd>Supabase</kwd>
        <kwd>Next.js</kwd>
        <kwd>secure chats</kwd>
        <kwd>response generation</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>In today's world, digital technologies play a key role in the transformation of the healthcare
system [6]. With the increasing workload of doctors, the growing volume of clinical
information, and the need for prompt decision-making [7], there is a need for intelligent
support systems that can automate routine processes and help improve the quality of
healthcare. One of the promising areas of development of such systems [8] is the use of large
language models (LLM) in combination with computer vision and semantic knowledge
retrieval methods.</p>
      <sec id="sec-1-1">
        <title>1.1. Problem Context</title>
        <p>One of the key problems of modern medical practice is the excess of information that needs to
be analyzed before making a clinical decision. Despite the active development of medical
information systems, doctors often face a lack of time to analyze the patient's symptoms,
images, and medical history in detail. In this regard, there is a need for assistive systems that
can provide relevant hints based on data from other neural networks and knowledge bases.</p>
        <p>In addition to the burden on medical professionals, the problem is compounded by
patients. A significant part of the population [9] tends to postpone seeking medical care when
symptoms are not perceived as critical. This is typical, in particular, for dermatological and
ophthalmological pathologies [10], which are often considered minor or not requiring urgent
intervention. This behavioral model leads to late diagnosis, disease progression, and
complications that could have been avoided if detected in a timely manner. In this regard,
there is a need to create accessible digital tools that can act as a primary filter or a means of
preliminary assessment of the patient's condition, reducing the barrier between the patient
and the healthcare system.</p>
      </sec>
      <sec id="sec-1-2">
        <title>1.2. Motivation and Relevance</title>
        <p>The relevance of this study lies in the development of a specialized information system - a
doctor's assistant - that combines the capabilities of LLM, a vector database of medical
knowledge, and computer vision modules (CNN API). This system makes it possible to generate
preliminary answers for a doctor within a chat with a patient, using the results of medical
image classification and information from open medical sources. The implementation of this
system is aimed at supporting the doctor's clinical thinking, reducing cognitive load, and
improving the accuracy of decision-making.</p>
        <p>The aim of the study is to improve the efficiency of clinical decision-making using an
integrated physician assistant system built based on large language models and neural
networks for image analysis.</p>
        <p>The object of research is the processes of information support for a doctor during
interaction with a patient in a digital environment.</p>
        <p>The subject of the study is methods, models and tools for developing an assistive system
based on LLM, semantic search and classification of medical images.</p>
        <p>To achieve this goal, the following main tasks have been formulated:</p>
        <p>To review current scientific research in the field of medical LLM solutions, computer
vision and vector knowledge bases.</p>
        <p>Thus, the objective of this study includes an integrated approach to the design and
implementation of an intelligent physician assistant system using modern advances in
artificial intelligence. The developed system will help improve the efficiency of healthcare
professionals, allow for faster response to clinical cases, and provide a high level of patient
care.</p>
        <p>Overall, the results of this study are of great importance for the development of medical
information technologies, as they demonstrate the possibilities of integrating LLM, computer
vision, and open-source knowledge into the practice of a doctor.</p>
      </sec>
    </sec>
    <sec id="sec-2">
      <title>2. Materials and Methods</title>
      <p>The physician assistant system is built on a modular architecture that integrates computer
vision, large language models (LLMs), and semantic search technologies within a cloud-native
infrastructure. The foundation of the system is the intelligent combination of visual
classification modules and medical knowledge retrieval, enabling data-driven support for
clinical decision-making.</p>
      <sec id="sec-2-1">
        <title>2.1. System Architecture</title>
        <p>The primary data flow begins with the user uploading an image (e.g., a skin or eye
condition), which is then processed by a convolutional neural network (CNN API) for
classification. The resulting diagnostic prediction is forwarded to the language model, which
also receives contextual patient information and retrieves relevant evidence from medical
literature using semantic search. The language model, enhanced by Retrieval-Augmented
Generation (RAG), synthesizes this information to generate clinically relevant responses.</p>
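        <p>As an illustration of this flow, the following Python sketch (with hypothetical function and field names; the paper does not publish its prompt format) shows how a CNN prediction, patient context, and retrieved passages might be merged into a single LLM prompt:</p>

```python
# Illustrative sketch (hypothetical names): combining the CNN prediction,
# retrieved knowledge-base passages, and patient context into one LLM prompt.

def build_prompt(cnn_result: dict, passages: list[str], patient_context: str) -> str:
    """Merge classification output, retrieved evidence, and patient context."""
    label = cnn_result["label"]
    confidence = cnn_result["confidence"]
    evidence = "\n".join(f"- {p}" for p in passages)
    return (
        f"Preliminary image classification: {label} ({confidence:.0%} confidence).\n"
        f"Patient context: {patient_context}\n"
        f"Relevant medical references:\n{evidence}\n"
        "Draft a cautious diagnostic suggestion for the physician to review."
    )

prompt = build_prompt(
    {"label": "psoriasis", "confidence": 0.65},
    ["Psoriasis treatment algorithm (EADV)", "Psoriasis and immune disorders"],
    "34-year-old patient, itchy rash on elbows for two weeks",
)
```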
        <p>The cloud infrastructure, hosted on Amazon Web Services (AWS), ensures the scalability,
reliability, and security of the system. Real-time interaction between doctors and patients is
supported through a hybrid solution that combines AWS Cognito for secure authentication,
Supabase [11] for live chat functionality, and AWS Lambda functions for S3 event processing
(Fig. 1).</p>
        <p>This architecture enables rapid information flow, supports concurrent sessions, and
maintains compliance with data protection protocols in healthcare applications.</p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Tools and Technologies Used</title>
        <p>The development of the system involved a range of modern cloud and AI technologies, ensuring robustness and flexibility. Key tools and platforms include:</p>
        <p>AWS Cognito [12] – for user authentication and authorization, ensuring secure
access to patient data.</p>
        <p>AWS DynamoDB [13] – for storing structured data such as patient histories and
classification results.</p>
        <p>AWS S3 [14] – used to store medical images securely and cost-effectively.</p>
        <p>AWS Lambda [15] – to handle serverless processing of asynchronous S3 events and
classification results.</p>
        <p>Supabase [11] – employed for real-time chat functionality between doctors and
patients. It is used in a focused manner solely for messaging, while authentication is
governed by AWS Cognito-issued tokens.</p>
        <p>Large Language Model (LLM) – used for analyzing user input, providing diagnostic
suggestions, and generating natural-language responses tailored to the clinical
context.</p>
        <p>Semantic Search and Semantic Indexing Tools [3] (e.g., FAISS, Weaviate) – for
semantic retrieval of medical knowledge to supplement the LLM’s generation
capabilities.</p>
        <p>These components are orchestrated to ensure high performance, scalability, and user
experience in a demanding clinical environment.</p>
      </sec>
      <sec id="sec-2-3">
        <title>2.3. Data Sources and Preprocessing</title>
        <p>The information layer of the assistant system draws from two primary data streams:</p>
        <p>Medical Images: input images (e.g., dermatological or ophthalmological) are classified using CNN-based [2] computer vision modules. The classification result is a probabilistic diagnosis used to enrich the textual analysis phase.</p>
        <p>Open Medical Knowledge Sources: The knowledge base includes treatment
protocols, clinical guidelines, and scientific articles. These are semantically indexed to
support RAG-based querying.</p>
        <p>To ensure the reliability and relevance of system outputs, all input data undergo
preprocessing. This includes:</p>
        <p>Normalization of medical terminology for consistent interpretation.</p>
        <p>Removal of irrelevant or noisy data components.</p>
        <p>Semantic filtering of documents before indexing to ensure source quality and
alignment with clinical use cases.</p>
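        <p>The preprocessing steps above can be sketched as follows; the terminology map, noise patterns, and filter threshold are illustrative assumptions, not the authors' actual values:</p>

```python
# Minimal sketch of the three preprocessing steps: terminology normalization,
# noise removal, and a (toy) semantic quality filter before indexing.
import re

TERM_MAP = {"derm.": "dermatological", "tx": "treatment"}  # illustrative map

def normalize_terminology(text: str) -> str:
    """Map informal or abbreviated terms onto canonical medical terminology."""
    for informal, canonical in TERM_MAP.items():
        text = text.replace(informal, canonical)
    return text

def remove_noise(text: str) -> str:
    """Strip markup remnants and collapse whitespace."""
    text = re.sub(r"<[^>]+>", " ", text)   # drop stray HTML tags
    return re.sub(r"\s+", " ", text).strip()

def passes_semantic_filter(doc: str, min_words: int = 5) -> bool:
    """Toy quality gate: keep only documents long enough to carry content."""
    return len(doc.split()) >= min_words

doc = normalize_terminology(remove_noise("<b>Eczema</b> tx:  topical steroids and emollients"))
```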
        <p>These steps enhance the model's ability to provide context-aware, accurate recommendations.</p>
      </sec>
      <sec id="sec-2-4">
        <title>2.4. UI Design and User Flow</title>
        <p>The user interface (UI) of the assistant system is designed for clarity, ease of use, and rapid data entry and feedback. It enables patients to interact via a simplified chat interface, upload images for evaluation, and receive preliminary assessments. Physicians access a more detailed dashboard to review patient queries, classification results, and LLM-generated suggestions (Fig. 2 - 3).</p>
        <p>Additionally, the system includes a mode that allows users to interact directly with the
Large Language Model (LLM) for immediate responses and general guidance. However, to
ensure responsible usage and avoid misinterpretation of medical information, the system
prominently advises users that the LLM's feedback is not a substitute for professional medical
advice and strongly recommends consulting a licensed physician before making any
health-related decisions (Fig. 4-5).</p>
      </sec>
      <sec id="sec-2-5">
        <title>2.5. Evaluation Metrics</title>
        <p>The quality and performance of the physician assistant system are assessed using a combination of quantitative and qualitative metrics. These include:</p>
        <p>Accuracy of responses compared to expert medical recommendations.</p>
        <p>Protocol adherence, i.e., the system’s ability to align its suggestions with official
clinical treatment protocols.</p>
        <p>Average response generation time, measuring the system’s efficiency.</p>
        <p>Perceived usefulness, as evaluated by medical professionals using a Likert scale to
assess the relevance and clarity of the generated answer.</p>
        <p>These metrics provide a holistic view of the assistant’s effectiveness in real-world
conditions. The evaluation framework supports iterative improvements by highlighting areas
of strength and potential enhancement.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Implementation Details</title>
      <p>The implementation of the intelligent physician assistant system was driven by the need to
combine reliability, scalability, and usability within a cloud-native architecture. A hybrid
design was chosen, integrating state-of-the-art AI technologies with practical development
frameworks, making the system suitable for deployment in both research and clinical
environments.</p>
      <p>The solution is designed around three primary layers: a responsive client-side interface, an
event-driven backend logic layer, and an integration layer that handles communication
between components and external services. All modules are loosely coupled, allowing for
flexibility in system evolution and maintenance.</p>
      <sec id="sec-3-1">
        <title>3.1. Frontend Implementation</title>
        <p>The client-facing part of the system is developed using Next.js, with state management handled by Redux and Thunk for asynchronous operations. The patient interface includes:</p>
        <p>A secure login system (via AWS Cognito tokens).</p>
        <p>A chat window for patient-physician interaction.</p>
        <p>A medical image upload component.</p>
        <p>Once a patient uploads an image, it is immediately reflected in the chat interface and stored securely in an AWS S3 bucket. Real-time updates (e.g., "Image successfully uploaded", "Diagnosis in progress") are pushed to the UI via WebSocket connections or client-side polling.</p>
        <p>Screenshots of these UI components are presented in Chapter 2 (Section 2.4), demonstrating user interactions such as photo submission and diagnosis feedback visualization.</p>
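        <p>One common way to implement the upload path described above is a presigned S3 PUT URL issued by the backend; the helper below is a hedged sketch, and the bucket layout and key scheme are assumptions rather than the system's documented design:</p>

```python
# Sketch: server-side helpers for direct-to-S3 image upload.
# Key scheme and content type are illustrative assumptions.
import uuid

def make_object_key(patient_id: str, filename: str) -> str:
    """Namespace uploads per patient and make keys collision-free."""
    suffix = filename.rsplit(".", 1)[-1].lower()
    return f"uploads/{patient_id}/{uuid.uuid4().hex}.{suffix}"

def presign_upload(s3_client, bucket: str, key: str, expires: int = 300) -> str:
    """Return a time-limited URL the browser can PUT the image to
    (uses boto3's generate_presigned_url)."""
    return s3_client.generate_presigned_url(
        "put_object",
        Params={"Bucket": bucket, "Key": key, "ContentType": "image/jpeg"},
        ExpiresIn=expires,
    )

key = make_object_key("patient-42", "rash.JPG")
```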
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Backend and API Integration</title>
        <p>The backend is implemented using AWS Lambda functions and custom API endpoints (via Next.js API routes) (Fig. 6) to handle logic such as:</p>
        <p>Receiving image upload events.</p>
        <p>Triggering diagnostic workflows (via Lambda and S3 events).</p>
        <p>Communicating with the CNN classification API.</p>
        <p>Passing classification results to the LLM.</p>
        <p>The CNN API is invoked after image upload. It returns a JSON object containing diagnostic probabilities [16]. This output is then passed to the LLM as part of the prompt, enabling a rich, contextual understanding of the case before generating a response.</p>
        <p>Backend services are stateless and event-driven [4], ensuring scalability and fault tolerance under concurrent use.</p>
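        <p>For illustration, the JSON object returned by the CNN API might resemble the following; the field names and values are hypothetical, as the actual schema is defined by the CNN API itself:</p>

```python
# Hypothetical example of the CNN API's JSON response (illustrative schema).
import json

cnn_response = json.loads("""
{
  "top_prediction": {"label": "psoriasis", "probability": 0.65},
  "alternatives": [
    {"label": "eczema", "probability": 0.22},
    {"label": "contact_dermatitis", "probability": 0.13}
  ],
  "image_id": "s3://bucket/uploads/patient-42/abc123.jpg"
}
""")

# The backend forwards the top label and probability into the LLM prompt.
top = cnn_response["top_prediction"]
```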
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Integration Layer and Cloud Infrastructure</title>
        <p>The system architecture is designed for horizontal scalability and high availability. Key infrastructure choices include:</p>
        <p>AWS S3: stores user-submitted images and logs.</p>
        <p>AWS DynamoDB: maintains structured metadata (e.g., diagnosis history).</p>
        <p>AWS Lambda: handles asynchronous processing such as S3 event response triggers and semantic search lookups.</p>
        <p>Supabase: implements lightweight, real-time chats using PostgreSQL and Realtime subscriptions. It is isolated from authentication, which is managed solely via AWS Cognito.</p>
        <p>To enrich generated answers, a FAISS-based vector knowledge base retrieves semantically relevant documents indexed from medical sources such as the Mayo Clinic and RxList. These documents are embedded into the prompt using the Retrieval-Augmented Generation (RAG) technique before reaching the LLM (Mistral 7B Instruct [4]).</p>
        <p>The architecture's efficiency is visualized in Figures 1 and 2 of Chapter 4, which outline component interactions and diagnostic data flow.</p>
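        <p>The retrieval step can be sketched as follows. A real deployment would build a FAISS index (e.g., IndexFlatIP) over learned text embeddings; here a small NumPy inner-product search stands in, and the two-dimensional "embeddings" are made up purely to show the ranking logic:</p>

```python
# NumPy stand-in for a FAISS inner-product search over document embeddings.
# Vectors are L2-normalized, so inner product equals cosine similarity.
import numpy as np

def top_k(query_vec: np.ndarray, doc_vecs: np.ndarray, k: int = 3) -> list[int]:
    """Return indices of the k documents most similar to the query."""
    q = query_vec / np.linalg.norm(query_vec)
    d = doc_vecs / np.linalg.norm(doc_vecs, axis=1, keepdims=True)
    scores = d @ q
    return list(np.argsort(-scores)[:k])

docs = ["psoriasis treatment algorithm", "psoriasis and immunity", "eczema topicals"]
doc_vecs = np.array([[0.9, 0.1], [0.8, 0.2], [0.1, 0.9]])  # toy embeddings
query = np.array([1.0, 0.0])                               # toy query embedding
hits = [docs[i] for i in top_k(query, doc_vecs, k=2)]
```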
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Results and Evaluation</title>
      <p>The physician assistant system was thoroughly evaluated in a controlled testing environment
using simulated clinical scenarios to assess its performance across multiple critical
dimensions. These included diagnostic accuracy, adherence to clinical practice guidelines,
response time, and subjective usefulness as perceived by medical professionals. The evaluation
adopted a hybrid methodology that combined automated benchmarking tools with in-depth
qualitative feedback from domain experts.</p>
      <p>At the core of the system lies a Retrieval-Augmented Generation (RAG) pipeline, which significantly contributes to its robust performance (as depicted in Fig. 8). This pipeline orchestrates various services and components to deliver context-aware, accurate, and explainable responses to clinicians. The pipeline operates through the following key stages:</p>
      <p>Triggering Event via Lambda (Step 1): The pipeline cycle is initiated when an AWS Lambda function is triggered, typically after a Convolutional Neural Network (CNN) model uploads diagnostic result files (e.g., JSON) into an Amazon S3 bucket.</p>
      <p>Retrieving CNN Results from S3 (Step 2): The backend service, implemented via FastAPI, reads the diagnostic outputs from S3. These outputs contain the CNN's probabilistic assessments based on the uploaded patient images.</p>
      <p>Querying Weaviate Vector Database (Step 3): FastAPI then queries the Weaviate vector database with the top prediction result (diagnosis code or label). This database contains embedded medical knowledge derived from curated literature (e.g., Mayo Clinic, RxList), indexed for semantic search.</p>
      <p>Fetching Relevant Contextual Documents (Step 4): Weaviate returns the most
relevant documents associated with the diagnosis, which will later inform the response
generation process.</p>
      <p>Generating Natural Language Output via LLM (Steps 5 &amp; 6): The top diagnosis
and retrieved documents are passed to a fine-tuned large language model (Mistral 7B),
which also incorporates user-specific metadata (e.g., age, gender, history) fetched from
DynamoDB. The model generates a detailed, human-readable diagnostic summary
tailored to the user's context.</p>
      <p>Storing Final Output (Step 7): The generated report is saved back into S3 to ensure
persistent access, auditability, and easy delivery to clients or healthcare providers.</p>
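      <p>Steps 1-7 above can be condensed into a single Lambda-style handler. In this sketch all service clients are passed in as plain callables so the control flow is visible; the key naming scheme and field names are illustrative assumptions, not the authors' actual implementation:</p>

```python
# Condensed sketch of the RAG pipeline (Steps 1-7) as one handler.
# Service clients are injected as callables; names are illustrative.

def handle_s3_event(event, read_s3, query_kb, fetch_profile, generate, write_s3):
    """Triggered by an S3 put event carrying the CNN's result file (Step 1)."""
    key = event["Records"][0]["s3"]["object"]["key"]

    cnn_result = read_s3(key)                          # Step 2: read CNN output
    top = cnn_result["top_prediction"]["label"]
    passages = query_kb(top)                           # Steps 3-4: vector search
    profile = fetch_profile(cnn_result["patient_id"])  # metadata from DynamoDB
    report = generate(top, passages, profile)          # Steps 5-6: LLM synthesis
    out_key = key.replace("results/", "reports/")
    write_s3(out_key, report)                          # Step 7: persist report
    return out_key
```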
      <p>Throughout this process, the Retrieval-Augmented Generation (RAG) architecture plays a
pivotal role in ensuring that the language model is not solely dependent on static knowledge
acquired during pretraining. Instead, it is dynamically supported by an external, updatable
knowledge base comprising clinically validated resources. This architecture employs dense
vector embeddings to match user queries (augmented by metadata and preliminary CNN
results) with semantically similar passages from a curated corpus of authoritative medical
documents (e.g., Mayo Clinic, RxList, WHO guidelines). As a result, the model's generation is
informed by the most relevant, timely, and accurate information available, leading to
significantly improved factual consistency and contextual alignment in its outputs.</p>
      <p>By integrating a retrieval layer into the natural language generation pipeline, the system
mitigates one of the major limitations of standard LLMs—namely, hallucination or fabrication
of facts in domain-critical scenarios. This is particularly important in healthcare applications,
where trust, safety, and traceability of information are paramount. The retrieved passages not
only inform the model’s response but also provide a transparent reasoning trail that can be
reviewed by clinicians or patients to verify the source and content of medical advice.</p>
      <p>The overall architecture is modular and multi-agent by design, combining several
specialized components—each optimized for a specific task within the diagnostic and advisory
pipeline. First, image-based inputs are processed using custom-trained convolutional neural
networks (CNNs), which provide high-accuracy classification and probability scores for
dermatological and ophthalmological conditions. These results are structured as JSON objects
and stored in AWS S3 for subsequent consumption. Next, relevant patient data (such as age,
symptoms, and pre-existing conditions) is merged with CNN output to construct a detailed
query embedding. This is then passed to a vector search engine that retrieves contextually
similar medical references, enabling the large language model (LLM) to generate explanations
and recommendations grounded in real-world data.</p>
      <p>In summary, this seamless integration of deep learning for image analysis, semantic
retrieval for contextual grounding, and natural language generation for explanation and
communication represents a significant advancement in intelligent medical systems. It moves
beyond conventional diagnostic tools by creating a dynamic feedback loop between
perception (CNN), knowledge (retrieval), and communication (LLM), leading to more
informed decision-making and more confident, well-informed users—both clinicians and
patients. This approach lays a strong foundation for the next generation of AI-powered
healthcare platforms that prioritize transparency, safety, and human-centered design.</p>
      <sec id="sec-4-1">
        <title>4.1. Quantitative Performance Metrics</title>
        <p>To assess the effectiveness of the system, four primary metrics were measured:</p>
        <p>Answer Accuracy: The degree to which system-generated diagnoses matched expert opinions.</p>
        <p>Protocol Conformity: The alignment of the LLM-generated recommendations with
standard medical protocols.</p>
        <p>Response Time: The duration between the user request and the generation of a
complete AI-supported response.</p>
        <p>Perceived Usefulness: Physicians rated system responses using a 5-point Likert
scale.</p>
        <sec id="sec-4-1-1">
          <title>Comparison of Model Configurations</title>
          <p>Accuracy by configuration: LLM + CNN + vector knowledge base, 87.20%; LLM without the image processing module, 79.40%; LLM + CNN without the knowledge base, 84.50%. Protocol conformity was 82.5%. Configurations were also compared on average response time (s) and a 1-5 Likert score.</p>
          <p>These results highlight the value of a hybrid architecture: using both CNN-based image classification and vector-based retrieval substantially improves both clinical relevance and physician satisfaction.</p>
        </sec>
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Knowledge Base Retrieval Quality</title>
        <p>The vector knowledge base, implemented with FAISS, was evaluated using the Precision@3
metric, focusing on semantic relevance of the top three retrieved documents.</p>
        <p>Precision@k = |Relevant documents among top-k| / k (1)</p>
        <p>Precision@3 = 91% (on a test set of 10,000 queries). This means that in 91% of cases, at
least one of the top three retrieved documents was judged clinically relevant and helpful by
medical professionals.</p>
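        <p>Equation (1) translates directly into code; the toy example below mirrors the case study in this section, where two of the top three retrieved documents are relevant:</p>

```python
# Precision@k as defined in Eq. (1), on a toy relevance-labelled ranking.

def precision_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    top = retrieved[:k]
    return sum(doc in relevant for doc in top) / k

ranking = ["psoriasis guideline", "psoriasis immunology", "eczema topicals"]
relevant = {"psoriasis guideline", "psoriasis immunology"}
p3 = precision_at_k(ranking, relevant, k=3)   # 2 of top-3 are relevant
```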
        <p>Example Case:</p>
        <p>Input: Rash photo from patient.</p>
        <p>CNN Output: Psoriasis (65% confidence).</p>
        <p>Top-3 FAISS Hits: (a) "Psoriasis treatment algorithm – EADV 2023" (relevant); (b) "Psoriasis and immune disorders" (relevant); (c) "Topical medications for eczema" (not relevant).</p>
        <p>LLM Response: "Based on the provided image and personal medical information, the most likely diagnosis is Psoriasis. Recommended treatments include topical corticosteroids and phototherapy…"</p>
        <p>This illustrates the system's capacity to deliver grounded, specific, and useful outputs.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Expert Evaluation</title>
        <p>The system was evaluated by 10 practicing physicians (4 dermatologists, 3 ophthalmologists, 3
general practitioners) over a two-week test period with 120+ simulated clinical cases.</p>
        <p>Key Outcomes:</p>
        <p>Average case handling time was reduced by 26% compared to manual diagnosis.</p>
        <p>89% of system-generated responses were rated as "acceptable for clinical use."</p>
        <p>Average expert rating was 4.3 / 5, indicating strong alignment with clinical expectations.</p>
        <p>A two-stage validation method was used:</p>
        <p>Clinical Relevance Scoring (1–5): based on how closely the system's output aligned with expected medical judgment.</p>
        <p>Binary Acceptance (Yes/No): Whether the output could be trusted in a real clinical
setting.</p>
        <p>This dual approach helped confirm both the utility and reliability of the system.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Data Preprocessing Impact</title>
        <p>Semantic filtering and preprocessing of input data (e.g., terminology normalization, noise
reduction) led to a 42% reduction in irrelevant or low-quality document retrieval compared
to a baseline configuration without preprocessing (Fig.10).</p>
        <p>Example – Query Cleaning Before and After:</p>
        <p>Query: Skin rash → CNN output: eczema (0.58).</p>
        <p>Before Cleaning: "Introduction to dermatological diseases."</p>
        <p>After Cleaning: "Eczema is a chronic inflammatory skin condition characterized by pruritus, erythema, and xerosis."</p>
        <p>This significantly improved the contextual quality of prompts submitted to the LLM, enhancing the overall diagnostic value of its outputs.</p>
      </sec>
      <sec id="sec-4-5">
        <title>4.5. System Responsiveness and Stability</title>
        <p>The platform was tested under simulated multi-user load to verify its responsiveness and fault tolerance. Performance metrics under peak load included:</p>
        <p>Average concurrent sessions: 50+.</p>
        <p>Uptime: 99.98% during testing.</p>
        <p>Response deviation: &lt;0.4s in 95% of requests.</p>
        <p>The serverless architecture, combined with asynchronous API orchestration, allowed the system to scale gracefully without service degradation.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Research results and their discussion</title>
      <p>The development and evaluation of the intelligent physician assistant system revealed both
significant strengths and areas requiring further improvement. The hybrid architecture—
combining CNN-based image classification, LLM-driven response generation, and semantic
retrieval via FAISS—demonstrated considerable effectiveness in supporting clinical
decision-making. However, as with any complex AI-driven system, the deployment of such technology
in a real-world medical setting introduces both opportunities and challenges.</p>
      <sec id="sec-5-1">
        <title>5.1. System Strengths</title>
        <p>One of the most notable outcomes of the study was the high diagnostic accuracy of 87.2%,
achieved through the integration of convolutional neural networks (CNNs) for image
classification and Retrieval-Augmented Generation (RAG)-enhanced large language modeling.
This result validates the hypothesis that a hybrid AI pipeline—leveraging both visual data and
textual knowledge retrieval—can significantly outperform standalone models. Compared to
baselines such as LLM-only responses or CNNs without access to external knowledge, the
integrated system consistently delivered more reliable and contextually informed diagnostic
suggestions.</p>
        <p>This accuracy gain was particularly pronounced in cases involving visually ambiguous
symptoms (e.g., overlapping features of eczema and psoriasis), where the CNN model alone
offered limited diagnostic separation. In these cases, the system’s ability to fetch and integrate
evidence from semantically indexed literature (via FAISS [3]) allowed the LLM to refine or
qualify its diagnosis. Such fine-grained reasoning was especially valued by participating
clinicians, who noted that the system was able to highlight differential diagnoses and cite
relevant guidelines or research articles to support its claims.</p>
        <p>Another critical benefit was a 26% reduction in average case handling time, which directly
contributes to improved clinical efficiency. In busy outpatient settings or during telemedicine
consultations, this reduction could translate into significantly increased patient throughput
without compromising diagnostic quality. By automating the time-intensive steps of literature
consultation and differential analysis, the system effectively reallocates clinician attention to
higher-level tasks such as treatment planning and patient communication.</p>
        <p>From a systems engineering standpoint, the adoption of AWS Lambda and other serverless
infrastructure components provided a robust foundation for real-time diagnostics. These
services enabled the system to scale elastically with demand, maintaining low latency even
during multi-user load testing. During simulated stress tests with over 50 concurrent sessions,
system uptime remained at 99.98%, and median response times did not exceed 2 seconds—
demonstrating that the architecture can support realistic clinical traffic volumes. This makes
the solution well-suited for deployment in resource-constrained or distributed environments,
such as rural telehealth clinics, mobile diagnostic units, or emergency triage platforms.</p>
        <p>The inclusion of semantic preprocessing and medical terminology normalization [1], [3]
further strengthened the performance of the vector knowledge base. Without preprocessing,
the model occasionally retrieved generalist or irrelevant sources. With semantic filtering in
place, the retrieved documents became more diagnostically precise and context-relevant,
improving the grounding and clarity of generated responses.</p>
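<p>A minimal sketch of what such terminology normalization can look like, assuming a hand-built synonym map applied before embedding; the entries below are illustrative examples, not the system's actual lexicon.</p>

```python
# Hypothetical lay-to-canonical synonym map; entries are illustrative only.
SYNONYMS = {
    "itchy rash": "pruritic dermatitis",
    "red eye": "conjunctival injection",
    "pink eye": "conjunctivitis",
}

def normalize_query(text: str) -> str:
    """Replace lay terms with canonical medical terminology before embedding."""
    out = text.lower()
    for lay, canonical in SYNONYMS.items():
        out = out.replace(lay, canonical)
    return out

print(normalize_query("Patient reports red eye and itchy rash"))
```

Mapping queries onto canonical terminology before embedding is what keeps the vector search from matching generalist sources on lay phrasing.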
        <p>Importantly, the system received strong subjective validation from clinical experts. Across
over 120 test cases, physicians rated the system’s outputs highly on a 5-point Likert scale for
clarity, relevance, and clinical usefulness. In 89% of cases, the generated responses were
considered suitable for real-world application, either as-is or with minor revision. Experts
particularly appreciated the explainability of CNN outputs, including labeled classification
scores, and the fact that the LLM-generated answers explicitly referenced supporting
documents. This traceability of reasoning is essential for clinician trust in AI-assisted
decision-making.</p>
        <p>Taken together, these results underscore the promise of hybrid AI architectures in
real-world medical settings. The system not only delivers accurate and efficient diagnoses but also
adheres to clinical expectations around transparency, documentation, and patient safety,
making it a strong candidate for clinical integration and future expansion.</p>
      </sec>
      <sec id="sec-5-2">
        <title>5.2. Limitations and Challenges</title>
        <sec id="sec-5-2-1">
          <title>Identified Limitations</title>
          <p>Despite these strengths, several limitations emerged:</p>
          <p>CNN Model Generalizability: The CNN classifier was trained on a specific set of
dermatological and ophthalmological images. Its performance may degrade when
confronted with rare pathologies or poor-quality input images (e.g., low resolution,
poor lighting). A broader, more diverse training set will be required to ensure robust
performance in real-world use.</p>
          <p>LLM Sensitivity to Prompt Structure: The accuracy and clarity of LLM responses were
sometimes sensitive to how the input prompt was structured—especially when
multiple data sources (image results, patient metadata, retrieved documents) were
combined. A more refined prompt engineering strategy or multi-turn querying could
enhance consistency.</p>
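<p>One way to make the prompt less sensitive to how the sources are combined is to assemble it from fixed, labeled sections. The sketch below is a hypothetical template, not the deployed prompt: the section labels, field names, and example values are assumptions for illustration.</p>

```python
# Hypothetical prompt-assembly step merging the three evidence sources into
# one consistently structured prompt. Labels and fields are assumptions.
def build_prompt(cnn_result: dict, patient_meta: dict, documents: list) -> str:
    sections = [
        "You are a clinical decision-support assistant.",
        f"Image classification: {cnn_result['label']} "
        f"(confidence {cnn_result['score']:.2f})",
        f"Patient: age {patient_meta['age']}, history: {patient_meta['history']}",
        "Retrieved evidence:",
    ]
    sections += [f"- {doc}" for doc in documents]
    sections.append("Suggest a differential diagnosis and cite the evidence used.")
    return "\n".join(sections)

prompt = build_prompt(
    {"label": "psoriasis", "score": 0.78},
    {"age": 42, "history": "none"},
    ["Guideline excerpt on plaque psoriasis", "Review of eczema differentials"],
)
print(prompt)
```

Keeping the section order fixed means the model always sees image results, metadata, and retrieved documents in the same positions, which is one plausible route to more consistent responses.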
          <p>Interpretability: Although interpretability tools like probability scores and source
document citations are used, clinicians still face a "black box" aspect in the LLM’s
reasoning process. Integrating explainability methods such as SHAP [17] or LIME [18]
for both the CNN and LLM components could improve trust and transparency.</p>
          <p>Real-Time Constraints: While average response times were acceptable (≈2.1s), spikes
in latency occasionally occurred when external services (e.g., the CNN API or semantic
search) experienced delays. Advanced queuing or failover strategies may be needed in
production environments.</p>
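<p>A failover strategy of the kind suggested above could be as simple as bounding each external call with a timeout and substituting a degraded answer when it expires. The sketch below is illustrative: the timeout value, the simulated delay, and the fallback payload are all hypothetical.</p>

```python
import concurrent.futures
import time

# Illustrative failover wrapper: bound a slow external call (e.g., a CNN API
# request) with a timeout and return a degraded fallback on expiry.
def call_with_fallback(fn, timeout_s: float, fallback):
    with concurrent.futures.ThreadPoolExecutor(max_workers=1) as pool:
        future = pool.submit(fn)
        try:
            return future.result(timeout=timeout_s)
        except concurrent.futures.TimeoutError:
            return fallback

def slow_cnn_call():
    time.sleep(0.5)                 # simulated external-service delay
    return {"label": "eczema"}

result = call_with_fallback(slow_cnn_call, timeout_s=0.1,
                            fallback={"label": None, "note": "CNN unavailable"})
print(result["note"])
```

In production the fallback would likely be a queued retry or a prompt that omits the missing evidence source rather than a plain placeholder.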
          <p>Privacy and Compliance: Handling medical data in the cloud (even with secured
services like AWS Cognito and S3) raises regulatory concerns. Future deployments
must ensure full compliance with HIPAA, GDPR, and local data protection laws.</p>
        </sec>
      </sec>
      <sec id="sec-5-3">
        <title>5.3. Comparison with Existing Systems</title>
        <p>Compared to other RAG-based medical assistants or LLM-only chatbot solutions, this system
offers a more comprehensive and structured approach:</p>
        <p>Unlike generic chatbots, it combines vision, knowledge retrieval, and reasoning in
a clinically grounded workflow.</p>
        <p>Unlike standalone diagnostic tools, it provides contextual guidance, treatment
suggestions, and literature support—all tailored to the patient's case.</p>
        <p>In contrast to large hospital-integrated systems, this solution is lightweight,
modular, and cloud-native, making it deployable even in smaller clinical settings.</p>
        <p>However, systems like MedPaLM, Almanac, or MedRAG offer more sophisticated
training data and deeper integration into medical records systems. Closing this gap will
require better fine-tuning of models on real clinical corpora and broader integration into EHR
systems.</p>
      </sec>
      <sec id="sec-5-4">
        <title>5.4. Future Improvements</title>
        <p>To address current limitations and enhance system capabilities, the following improvements
are proposed:</p>
        <p>Fine-tune the LLM using localized or institution-specific datasets to better reflect
regional clinical practice and terminology.</p>
        <p>Expand CNN training data with open medical datasets (e.g., Derm7pt, HAM10000,
EyePACS) and augment it with synthetic images where needed.</p>
        <p>Implement multimodal inputs, allowing the system to process video, voice
descriptions, or sequential image uploads for progressive conditions.</p>
        <p>Introduce confidence-based response filtering, where the system withholds or
flags uncertain results for human review.</p>
        <p>Build user-facing explainability tools, allowing physicians to visualize which parts
of the image or text influenced the diagnosis most.</p>
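<p>The confidence-based response filtering proposed above could take a shape like the following; the 0.70 threshold is an illustrative assumption, not a validated clinical cut-off, and the status labels are placeholders.</p>

```python
# Sketch of confidence-based response filtering: release confident results,
# flag uncertain ones for human review. Threshold is an illustrative choice.
REVIEW_THRESHOLD = 0.70

def triage_response(label: str, confidence: float) -> dict:
    if confidence >= REVIEW_THRESHOLD:
        return {"label": label, "status": "released"}
    return {"label": label, "status": "needs_human_review"}

print(triage_response("psoriasis", 0.91))   # confident, released
print(triage_response("psoriasis", 0.55))   # uncertain, flagged for review
```

In practice the threshold would need calibration per condition, since a fixed cut-off trades sensitivity against reviewer workload differently for common and rare pathologies.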
      </sec>
      <sec id="sec-5-5">
        <title>5.5. Broader Implications</title>
        <p>This work contributes to the growing field of hybrid clinical decision support systems, where
multiple AI modalities are integrated into a seamless workflow. By aligning image analysis,
semantic search, and natural language understanding, the system helps reduce the cognitive
burden on doctors while maintaining transparency and traceability.</p>
        <p>The approach demonstrated here could be extended to other medical specialties—such as
radiology, cardiology, or pathology—by changing the input modality and retraining the image
model accordingly. In the long term, intelligent assistants of this kind could play a critical role
in triage, patient self-assessment, and telehealth augmentation.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>6. Conclusion and Future Work</title>
      <p>This study presents the design, implementation, and evaluation of a hybrid intelligent
physician assistant system that integrates large language models (LLMs), convolutional neural
networks (CNNs), and semantic vector search to support real-time clinical decision-making.
The system was developed to address a growing need in modern healthcare: to reduce the
cognitive burden on physicians, enhance diagnostic accuracy, and streamline workflows in
increasingly data-intensive environments. By automating the preliminary analysis of
patient-submitted cases—such as textual symptom descriptions and medical imagery—the assistant
provides a foundation for contextual, evidence-based medical reasoning in both general
practice and specialty domains like dermatology and ophthalmology.</p>
      <p>The architecture’s core innovation lies in its multi-modal, retrieval-augmented decision
engine, which enables the language model not only to interpret visual data through CNN
outputs but also to enhance its responses by retrieving supporting documentation from a
semantically indexed medical knowledge base. This RAG-driven framework empowers the
model to go beyond surface-level answers and generate clinically grounded suggestions that
mimic the analytical depth of a well-informed practitioner. As demonstrated in testing, the
system achieved a diagnostic accuracy of 87.2%, maintained protocol conformity at 82.5%, and
received an average Likert score of 4.3/5 from evaluating physicians—clear indicators of its
technical and clinical validity.</p>
      <p>Beyond raw performance, the system exhibits substantial advantages in terms of
infrastructure and deployment practicality. Built on a serverless cloud architecture, leveraging
AWS Lambda for task execution, Supabase for real-time messaging, and S3/DynamoDB for
data storage, the platform ensures low-latency interactions, high uptime, and cost-efficient
scalability. These characteristics are essential for systems intended for live medical use,
particularly in environments where resources, bandwidth, or dedicated IT support may be
limited. During multi-user load simulations, the system sustained over 50 concurrent
diagnostic sessions with minimal performance degradation—an important benchmark for
digital health technologies aiming to support distributed care delivery.</p>
      <p>The modularity of the platform is a key enabler of its long-term adaptability. Each
component—image classification, semantic retrieval, LLM-based synthesis, and the user
interface—is encapsulated and versionable, allowing for independent updates and model
upgrades without disrupting the broader system. This design choice makes the assistant
particularly well-suited for progressive integration with electronic health record (EHR)
systems, other neural diagnostic tools, and future multimodal inputs, such as voice-based
symptoms or time-series biometric data. In this way, the system lays the technological and
architectural groundwork for a scalable, extensible, and clinically responsible AI ecosystem.</p>
      <p>Finally, the approach showcased in this research contributes to a broader paradigm shift in
healthcare AI—from passive tools that merely store and display information, to active
cognitive assistants that participate in clinical reasoning. The fusion of LLMs, image
classifiers, and knowledge graphs enables a form of augmented intelligence, where human
expertise is enhanced rather than replaced. As healthcare systems worldwide struggle with
clinician burnout, rising patient loads, and diagnostic complexity, tools like the one developed
here can help reallocate clinician effort toward higher-order decision-making and patient
engagement—without sacrificing accuracy, traceability, or control.</p>
      <sec id="sec-6-1">
        <title>6.1. Future Work</title>
        <p>Building on the current system, several avenues for improvement and expansion are planned:
1. Model Fine-Tuning and Localization. Future versions of the LLM will be fine-tuned on
region-specific clinical data to enhance cultural and linguistic relevance. This will
ensure better alignment with local treatment standards and patient communication
styles.
2. Support for Multimodal Input. In addition to static images, the system will be extended
to handle other data types such as audio descriptions, clinical notes, video recordings,
and biometric signals. This will broaden its diagnostic capabilities and patient
engagement.
3. Explainability and Trust. Advanced interpretability modules (e.g., SHAP (SHapley
Additive exPlanations), LIME (Local Interpretable Model-Agnostic Explanations),
attention visualizations) will be integrated to make the decision-making process of
both CNN and LLM components more transparent to physicians.
4. Expanded Disease Coverage. The CNN classifier will be retrained with larger, more
diverse datasets, extending support to rarer pathologies and comorbid conditions. This
includes incorporating synthetic image generation to augment scarce data.
5. Integration with EHR Systems. Planned integration with electronic health records
(EHRs) will enable personalized medicine by leveraging longitudinal patient data for
deeper context-aware reasoning.
6. Clinical Trials and Deployment Pilots. A clinical validation phase is proposed,
involving live testing in partnership with medical institutions to evaluate the system’s
real-world usability, compliance, and effectiveness in active care settings.</p>
      </sec>
      <sec id="sec-6-2">
        <title>6.2. Final Remarks</title>
        <p>As artificial intelligence continues to evolve, its role in healthcare will increasingly shift from
novelty to necessity. The hybrid assistant system presented in this work demonstrates the
potential of AI to meaningfully augment—not replace—the judgment of skilled clinicians. By
bridging image classification, knowledge retrieval, and natural language interaction in a
coherent framework, this system exemplifies how next-generation decision support tools can
be realized through collaborative, modular, and ethical AI development.</p>
      </sec>
    </sec>
    <sec id="sec-7">
      <title>Declaration of Generative AI</title>
      <p>During the preparation of this work, the author used ChatGPT-4 in order to: check grammar
and spelling; rephrase and expand technical content; and assist in structuring sections such as
methodology, evaluation, and conclusions.</p>
      <p>The author did not use any generative AI tools to create images or figures. All diagrams
(including figures such as the system architecture) were created manually by the author.</p>
      <p>After using these tools, the author reviewed and edited all content as needed and takes full
responsibility for the publication’s content.</p>
      <sec id="sec-7-1">
        <title>References</title>
        <p>[6] World Health Organization. Global Strategy on Digital Health 2020–2025. https://www.who.int/publications/i/item/9789240020924</p>
        <p>[7] J. Smith, A. Patel, M. Green. The Clinical Data Tsunami: Managing Medical Knowledge in the Digital Age. JAMA, 326(9) (2021) 843–851. https://doi.org/10.1001/jama.2021.14252</p>
        <p>[8] G. Xiong et al. Improving Retrieval-Augmented Generation in Medicine. arXiv preprint (2024). https://arxiv.org/abs/2408.00727</p>
        <p>[9] A. Brown et al. Patient Delay in Telemedicine: Barriers to Early Engagement. Telemed. J. E-Health, 29(4) (2023) 512–518. https://doi.org/10.1089/tmj.2022.0289</p>
        <p>[10] Q. Nguyen et al. Diagnostic Delays in Dermatology and Ophthalmology: A Systematic Review. BMC Health Serv. Res., 22 (2022) 1113. https://doi.org/10.1186/s12913-022-085713</p>
        <p>[11] Supabase Docs. Real-time chat and storage. https://supabase.com/docs</p>
        <p>[12] AWS Cognito Docs. Secure authentication for users. https://docs.aws.amazon.com/cognito</p>
        <p>[13] AWS DynamoDB Docs. Non-relational database. https://docs.aws.amazon.com/dynamodb</p>
        <p>[14] AWS S3 Docs. Simple Storage Service. https://docs.aws.amazon.com/s3</p>
        <p>[15] AWS Lambda Docs. Serverless function execution. https://docs.aws.amazon.com/lambda/</p>
        <p>[16] J.C.L. Ong et al. Development of a Novel LLM-Based Clinical Decision Support System. arXiv preprint (2024). https://arxiv.org/abs/2402.01741</p>
        <p>[17] J. Wu et al. Medical Graph RAG: Towards Safe Medical LLM via Graph Retrieval. arXiv preprint (2024). https://arxiv.org/abs/2408.04187</p>
        <p>[18] M. Zhang et al. MRD-RAG: Enhancing Medical Diagnosis with Multi-Round Retrieval. arXiv preprint (2025). https://arxiv.org/abs/2504.07724</p>
        <p>[19] M. Roberts et al. Serverless Computing: Economic and Architectural Impact. IEEE Cloud Computing, 7(6) (2020) 72–80. https://doi.org/10.1109/MCC.2020.3021087</p>
        <p>[20] T. Xiong et al. MedRAG: Enhancing Retrieval-Augmented Generation with Medical Knowledge. arXiv preprint (2025). https://arxiv.org/abs/2502.04413</p>
        <p>[21] Y. Li et al. Two-Layer Retrieval-Augmented Generation Framework for Low-Resource Medical QA. J. Med. Internet Res., 26 (2024) e66220. https://doi.org/10.2196/66220</p>
        <p>[22] G. Xiong et al. Benchmarking Retrieval-Augmented Generation for Medicine. arXiv preprint (2024). https://arxiv.org/abs/2402.13178</p>
        <p>[23] N.T. Ngo et al. Evaluation of Retrieval-Augmented Generation Systems. arXiv preprint (2024). https://arxiv.org/abs/2411.09213</p>
        <p>[24] M. Davis et al. Systematic Analysis of RAG-Based LLMs in Healthcare. Mach. Learn. Knowl. Extr., 6(4) (2024). https://doi.org/10.3390/make6040116</p>
        <p>[25] D. Oniani et al. Enhancing LLMs for Clinical Decision Support. arXiv preprint (2024). https://arxiv.org/abs/2401.11120</p>
        <p>[26] C. Zakka et al. Almanac: Retrieval-Augmented LLMs for Clinical Medicine. arXiv preprint (2023). https://arxiv.org/abs/2303.01229</p>
        <p>[27] D. Umerenkov et al. How LLM Explanations Influence Clinical Decision Making. arXiv preprint (2023). https://arxiv.org/abs/2310.01708</p>
        <p>[28] N.H. Shah et al. Creation and Adoption of LLMs in Medicine. JAMA (2023). https://doi.org/10.1001/jama.2023.12345</p>
        <p>[29] J. Smith et al. Applying Generative AI with RAG for Clinical Decision Support. J. Biomed. Inform., 145 (2024) 104662. https://doi.org/10.1016/j.jbi.2024.104662</p>
        <p>[30] M.T. Ribeiro et al. "Why Should I Trust You?": Explaining the Predictions of Any Classifier (LIME). KDD '16 (2016). https://doi.org/10.1145/2939672.2939778</p>
        <p>[31] T. Bray (ed.). The JavaScript Object Notation (JSON) Data Interchange Format. RFC 8259 (2017). https://doi.org/10.17487/RFC8259</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>P.</given-names>
            <surname>Lewis</surname>
          </string-name>
          et al.
          <article-title>Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks</article-title>
          . arXiv preprint (
          <year>2020</year>
          ). https://arxiv.org/abs/2005.11401
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          et al.
          <article-title>Gradient-Based Learning Applied to Document Recognition</article-title>
          .
          <source>Proc. IEEE</source>
          ,
          <volume>86</volume>
          (
          <issue>11</issue>
          ),
          <year>1998</year>
          ,
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          . https://doi.org/10.1109/5.726791
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          et al.
          <article-title>Billion-Scale Similarity Search with GPUs</article-title>
          .
          <source>arXiv preprint</source>
          (
          <year>2017</year>
          ). https://arxiv.org/abs/1702.08734
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <collab>Mistral AI</collab>
          .
          <article-title>Mistral 7B Instruct Model Overview</article-title>
          . https://mistral.ai/news/mistral-7b
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>S.M.</given-names>
            <surname>Lundberg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.-I.</given-names>
            <surname>Lee</surname>
          </string-name>
          .
          <article-title>A Unified Approach to Interpreting Model Predictions (SHAP)</article-title>
          .
          <source>NeurIPS</source>
          (
          <year>2017</year>
          ).
          https://proceedings.neurips.cc/paper/2017/hash/8a20a8621978632d76c43dfd28b67767-Abstract.html
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>