=Paper= {{Paper |id=Vol-3894/paper15 |storemode=property |title=Traffic Accident Prediction and Warning System: Integration Use Case |pdfUrl=https://ceur-ws.org/Vol-3894/paper15.pdf |volume=Vol-3894 |authors=Amirhossein Ghaffari,Huong Nguyen,Alaa Saleh,Lauri Lovén,Ekaterina Gilman |dblpUrl=https://dblp.org/rec/conf/kil/GhaffariNSLG24 }} ==Traffic Accident Prediction and Warning System: Integration Use Case== https://ceur-ws.org/Vol-3894/paper15.pdf
                                Traffic Accident Prediction and Warning System:
                                Integration Use Case
                                Amirhossein Ghaffari¹,²,∗, Huong Nguyen¹, Alaa Saleh¹, Lauri Lovén¹ and Ekaterina Gilman¹
                                ¹ Center for Ubiquitous Computing, University of Oulu, Oulu, Finland
                                ² Infotech Oulu, University of Oulu, Oulu, Finland


                                                 Abstract
                                                 This paper presents a system for predicting and warning about traffic accidents in smart cities, aimed at enhancing urban safety
                                                 through advanced data analysis and warnings and reports that include explanations. Our system emphasizes computational efficiency and
                                                 data privacy, predicting traffic accident severity with good accuracy. By integrating real data with external knowledge sources,
                                                 the system produces detailed, contextually relevant reports and warnings. Implemented with effective task orchestration, our
                                                 system ensures seamless integration and resource management. Evaluation results demonstrate high accuracy and scalability,
                                                 highlighting its potential for practical application in smart city environments. Future work will focus on further enhancing
                                                 model efficiency, exploring transfer learning for broader applicability, and conducting real-world deployments to validate
                                                 system performance.

                                                 Keywords
                                                 Smart City, Transportation, Federated Learning, Edge Computing, Generative AI, RAG



KiL’24: Workshop on Knowledge-infused Learning, co-located with the 30th ACM KDD Conference, August 26, 2024, Barcelona, Spain
∗ Corresponding author.
Email: amirhossein.ghaffari@oulu.fi (A. Ghaffari); huong.nguyen@oulu.fi (H. Nguyen); alaa.saleh@oulu.fi (A. Saleh); lauri.loven@oulu.fi (L. Lovén); ekaterina.gilman@oulu.fi (E. Gilman)
ORCID: 0009-0006-9264-8681 (A. Ghaffari); 0000-0001-9067-3396 (H. Nguyen); 0000-0002-6725-7383 (A. Saleh); 0000-0001-9475-4839 (L. Lovén); 0000-0001-9816-2240 (E. Gilman)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).

1. Introduction

By 2050, over two-thirds of the global population is projected to live in urban areas [1]. Urbanization, driven by population growth and migration towards cities, presents both opportunities and challenges such as overpopulation and traffic congestion [2]. Developing smart cities is a strategic approach to mitigate these challenges.

A “smart city” integrates information and communication technology to enhance urban living [3]. This concept emphasizes the interconnection of community, people, and technology, aiming to prioritize human needs [4]. Urban mobility and transportation are significant challenges, with traffic congestion and accidents being major concerns. Annually, traffic accidents result in 1.35 million deaths globally, underscoring the critical need for effective accident prevention measures [5].

In large-scale Internet of Things (IoT) ecosystems, efficient data processing is crucial. Centralized cloud servers face latency and security challenges for many application domains, making real-time processing difficult [6]. Edge computing aims to address these limitations by bringing computational resources closer to data sources, enabling timely processing and reducing latency [7, 8]. When edge computing is integrated with AI, known as EdgeAI, real-time urban decision-making could be facilitated [9, 10]. For example, models trained to predict road weather [11, 12] or traffic congestion can operate on edge devices located closer to the sites, providing immediate insights to traffic management systems. Nonetheless, due to the resource constraints of edge devices, certain computationally intensive tasks might still be offloaded to the cloud or other powerful nodes within the city network. That approach requires loosely-coupled architectures and distributed algorithms [13, 14].

Much research proposes AI support for smart transportation systems from various perspectives. For example, Bortnikov et al. [15] detect accidents by training a 3D Convolutional Network on data generated by a video game, Uma and Eswari [16] develop driver yawning detection, and Liu et al. [17] concentrate on traffic flow prediction. However, the majority of related work focuses on a single type of AI module specifically developed for the task at hand, neglecting the possibility of integrating the approach with other kinds of AI modules to create a more comprehensive support system. Additionally, response time is often overlooked in the assessment of related work, with a primary focus on accuracy. To our knowledge, no existing work incorporates multiple types of AI modules, raising questions about the integration and applicability of these separate modules into a cohesive framework.

To address these gaps, we present our integrated system, containing two AI modules: first, a Federated Learning (FL) [18] model to predict traffic accident occurrences and estimate severity, and second, Generative Artificial Intelligence (GenAI) to generate reports and warnings. Moreover, we utilized k0s, a lightweight Kubernetes distribution, for efficient task orchestration [19]. The task orchestration capabilities of k0s are crucial for seamlessly integrating the FL models and Retrieval-Augmented Generation (RAG) processes across multiple edge nodes. This enables automated deployment, scaling, and management of tasks, ensuring high availability, fault tolerance, and robust performance monitoring for our accident prevention warning system.

The contributions of this work can be summarized as follows:

    1. We integrate two different kinds of AI modules into a coherent distributed system supporting accident prevention. We comprehensively evaluate this system and analyze the related challenges and opportunities.
    2. We orchestrate tasks and monitor our system, examining its feasibility for real-world smart city environments.

The remainder of this article is organized as follows. Section 2 discusses related work, while Section 3 describes the system design. The subsequent sections detail the implementation, provide a detailed system evaluation and metrics, discuss our findings, implications, and future research directions, and conclude the work.

2. Related Work

2.1. Intelligent Transportation System in Smart City

Intelligent Transportation Systems (ITS) are essential for the advancement of smart cities, with many recent studies dedicated to improving urban traffic management and safety. Here, we discuss several key works that have made significant contributions to this field. As an example, Hasan et al. [20] used the Google Distance Matrix and Directions APIs to provide advanced traffic jam alerts. Their Internet of Vehicles (IoV) module detects accidents and, with the assistance of the National Data Warehouse and a GPS module, notifies the nearest clinic. They developed an Android application for routing suggestions and employed an Arduino with a sonar sensor, temperature sensor, gyroscope, piezo sensor, and GSM module as the core processing unit.

Working on one of the most popular applications, Bortnikov et al. [15] developed a 3D Convolutional Neural Network (CNN) to recognize accidents automatically. They trained the CNN using a custom video game to create accident scenes with various weather and lighting conditions, adding noise to diversify the data. The model was then tested on real traffic videos from YouTube. The novelty of this research lies in the use of video games to generate datasets, which are challenging to replicate in real-life scenarios. Yu et al. [21], with the same aim, proposed a Deep Spatio-Temporal Graph Convolutional Network for traffic accident prediction on Beijing traffic data, which were collected hourly over three months and include accident records (time and location), vehicle speeds, meteorological conditions, and points of interest. Recent research has considered informing other vehicles after detecting traffic accidents using IoT, IoV, and related technologies. Zhou et al. [22] proposed an accident detection algorithm based on spatio-temporal feature encoding with a multilayer neural network. This method first detects border frames as potential accident frames, then encodes the spatial relationships of detected objects to confirm an accident. The process initially uses Histogram of Oriented Gradients and ordinal features, followed by CNN feature encoding and object relationship detection with a multilayer neural network. A trained Support Vector Machine then confirms the presence of an accident.

Another approach, which aims to reduce accidents before they occur, is the work of Uma and Eswari [16], who developed a prototype using a Raspberry Pi and Pi Camera, along with sensors, to monitor the driver’s eye movements, detect yawning, and identify toxic gases and alcohol consumption. This system employs the Haar Cascade algorithm for face detection, calculates the Eye Aspect Ratio and Mouth Aspect Ratio, and estimates risk by analyzing these features. Besides, to identify accident hotspots, Le et al. [23] used Road Traffic Accident data collected over three years in Hanoi, Vietnam, to develop a GIS-based statistical analysis technique. This method assesses the influence of accident severity on temporal-spatial patterns, identifying accident hotspots in relation to specific times of day and seasons.

Beyond the potential of cloud services to support autonomous vehicle applications noted in [24], edge computing is playing a pivotal role in reshaping traffic management in smart cities. Within this domain, the research groups of Mohamed [25] and Zhou [26] demonstrated substantial improvements in traffic management and reduced congestion durations through an edge-based model for real-time traffic data analysis. Besides, to achieve low latency and high prediction accuracy for vehicle identification at the edge, Wan et al. [27] eliminated redundant frames from collected videos and presented an approach for real-time video processing.

In a similar manner, Ke et al. [28] developed a multi-thread system for real-time detection of near-crash events in traffic, using video analytics on dashcams. Leveraging edge power, their system efficiently performs object detection and tracking directly from the video feeds on board. This approach involves removing irrelevant video to conserve bandwidth and storage while collecting diverse and valuable data for traffic safety such as road user
type, vehicle trajectory, vehicle speed, brake switch, and throttle. The approach from Ke et al. demonstrates considerable promise for widespread application due to its low cost, real-time processing, high accuracy, and broad compatibility with various vehicles and camera types.

Additionally, a recent work by Nguyen et al. [29] utilized Blockchain technology alongside edge computing to develop a reliable and transparent situational awareness system for autonomous vehicles. Their system broadcasts notifications and alternative route suggestions from the nearest edge station when congestion or accidents are detected by other vehicles, using various sensing data sources, including dashcam images and environmental factors like weather, temperature, and humidity. The use of Blockchain in their study ensures data validity and integrity and facilitates collaboration among different service providers. However, despite the recognized vision and applications, Zhou et al. [30] emphasized that employing edge computing in ITS always comes with inherent challenges related to sensor failure and privacy protection, which must be addressed for effective implementation.

2.2. FL in ITS

Building on the challenges identified by Zhou et al. [30], particularly concerning privacy protection, FL has recently seen growing use in smart cities. Among the many application domains within urban environments, FL in traffic systems is mostly leveraged for traffic monitoring and accident prediction.

FedGRU, an FL-based Gated Recurrent Unit (GRU) neural network [17], is one of the pioneering works on traffic flow prediction (TFP) with federated deep learning. It performs comparably to other advanced competing methods without compromising the privacy and security of data. Additionally, as shown by experiments, the joint announcement protocol proposed in that paper helps reduce communication overhead by 64.10% compared with centralized models, indicating the scalability of FedGRU to bigger networks.

With the same motivation to address the privacy exposure risk of centralized machine learning, Qi et al. [31] presented a fully decentralized FL network, utilizing a Blockchain-based FL architecture as opposed to the conventional vanilla framework. The authors employed the local differential privacy technique to protect vehicle location and utilized GRU to achieve accurate TFP. They also compared performance and security across various machine learning models, with and without blockchain.

Concerning the monitoring of traffic congestion, typical systems begin by detecting vehicles and subsequently estimating traffic flow density. In their research, Xu et al. [32] employed remote sensing images for this purpose, while Chougule et al. [33] continuously used the estimated traffic density from intersection-captured images to dynamically adjust the duration of the green light and schedule the timing of signals across all lanes.

As one of the highlights in the narrow field of applying FL to risk detection in ITS, Yuan et al. [34] introduced FedRD, a framework combining edge-cloud computing, FL, and differential privacy techniques for intelligent road damage detection and warning. The framework not only improves detection performance and coverage area but also addresses privacy concerns through Individualized Differential Privacy with a pixelization technique.

Comprehensive evaluations demonstrate FedRD’s capability to deliver high detection accuracy and wider coverage while preserving user privacy, even in scenarios where edge devices have limited data. This effectiveness sets a new benchmark in the field.

2.3. GenAI in ITS

Recently, GenAI has garnered significant attention in several applications, including ITS, due to its advantages and flexibility. By analyzing data from various sources, such as roadside sensors, vehicles, and traffic signals, GenAI enhances urban operations by detecting patterns, identifying trends, and providing accurate predictions and advice. By leveraging natural language processing, GenAI can present these predictions in human-understandable language, making these technologies more accessible and practical for smart services [35]. See prior works [36, 37] for examples of how GenAI has been integrated into many services within cities. As another example in ITS, Impedovo et al. [38] propose a deep generative model to predict weekday vehicular traffic flow to prevent accidents in the most critical areas and improve continuity by reducing traffic. More notably, RAG, first introduced by Lewis et al. in 2020 [39], has stood out within the GenAI landscape, representing a distinct approach to generating text, informed reasoning, and supporting decision-making.

Its application in ITS is not yet widespread; however, there are some notable works. For instance, Dai et al. [40] integrated RAG into autonomous driving systems to enhance decision-making processes. According to the authors, the use of RAG in their work addresses the problem of impractical generated content from current mainstream foundation models, such as GPT-4 or LLaMA. It helps these models enhance the reliability of their outputs during the generation phase by dynamically retrieving accurate contextual information from external databases (e.g. updated traffic rules, driving experiences, or human preference). Similarly, Ding et al. [41] utilized RAG for more controlled generation of traffic scenarios. Specifically, RealGen [41] synthesizes new scenarios by combining behaviors from multiple retrieved examples in a gradient-free manner, using templates or tagged scenarios. This in-context learning framework provides versatile generative capabilities, including scenario editing, behavior composition, and the creation of critical scenarios, thus enhancing the adaptability and precision of synthetic data generation for various applications. Most recently, in his Master’s thesis, Mohanan [42] evaluated eight embedding models for a RAG-based chatbot tailored to Indian Motor Vehicle Law.

As can be seen, prior research typically focuses on a single module, such as risk estimation or warning generation, limiting possible support for ITS. This raises an open question: “Is it possible to integrate all diverse components into a cohesive and comprehensive ITS framework?” This is the question our work addresses.

3. System Design

This article presents a system for predicting and preventing traffic accidents. It predicts possible accidents based on traffic conditions and other available data, and provides detailed textual comments that explain to the user the grounds for each estimate. Figure 1 illustrates the overall system flow, highlighting the interplay between the key components: Federated Learning (FL) and Retrieval-Augmented Generation (RAG).

[Figure 1 diagram. Panel “Severity Estimation by FL”: Sensors Data, Preprocessing, Preprocessed Data, Accident Severity Prediction Model, Estimated Accident Severity. Panel “Accident Report Generation by RAG”: Comprehensive analysis of US accident data, Query, Semantic Meaning Model, Embeddings, Similarity Search Library, Relevant Chunks, Warning Generation Model, Traffic Accident Report.]

Figure 1: System workflow. This figure illustrates the key components of the core system, Federated Learning (FL) and Retrieval-Augmented Generation (RAG). Using preprocessed weather and road traffic sensor data, the FL model predicts accident severity. Within the RAG framework, the Semantic Meaning Model creates embeddings for documents and queries. The Similarity Search Library selects the most relevant document chunks based on similarity. Finally, the Warning Generation Model generates a traffic accident report that incorporates data analysis and future recommendations.

This integrated system combines the strengths of RAG and FL to ensure high-quality, relevant outputs while maintaining data privacy. FL improves the accident severity prediction model without exposing raw data. The RAG system couples the warning generation model with the knowledge retrieval model so that generation is enriched with relevant external data, improving context and accuracy.

Our training approach starts with data preprocessing. The preprocessed dataset is then used to train the FL model for traffic accident risk estimation. The predictions, along with the sensors’ real-time data, are used as input for the RAG model. The RAG model integrates advanced retrieval mechanisms with state-of-the-art language generation capabilities to produce detailed warnings and reports for traffic accidents.

To efficiently manage and deploy these components, we use a task orchestration tool. This tool ensures seamless integration and coordination among the various models, automates deployment, and scales the system as needed. Additionally, it facilitates robust performance monitoring, ensuring high availability and fault tolerance across the system.
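
The data flow of the system design (aggregating FL model parameters, retrieving relevant chunks, and assembling the input for the warning generation model) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: size-weighted federated averaging stands in for the aggregation rule, which is not specified here; a toy bag-of-words embedding replaces the semantic meaning model; the call to the language model is omitted; and all function names and example values are hypothetical.

```python
import math
import re
from collections import Counter


def fed_avg(client_params, client_sizes):
    """Size-weighted average of client parameter vectors (assumed aggregation rule)."""
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(p[i] * n for p, n in zip(client_params, client_sizes)) / total
        for i in range(dim)
    ]


def embed(text):
    """Toy bag-of-words embedding; a stand-in for a sentence-embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def top_k_chunks(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    return sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)[:k]


def build_prompt(severity, sensor_data, chunks):
    """Combine the FL output, sensor data, and retrieved context into one prompt."""
    context = "\n".join(f"- {c}" for c in chunks)
    return (
        f"Estimated accident severity: {severity}\n"
        f"Sensor data: {sensor_data}\n"
        f"Relevant context:\n{context}\n"
        "Write a traffic accident warning with recommended actions."
    )


# Only parameters (never raw records) leave each client.
global_params = fed_avg([[0.2, 0.4], [0.6, 0.8]], client_sizes=[100, 300])

# Hypothetical knowledge-base chunks and query.
chunks = [
    "Severe accidents are more frequent in heavy rain at night.",
    "Most accidents near junctions happen during rush hour.",
    "Fog reduces visibility and raises accident severity.",
]
query = "severity 3 accident risk heavy rain night"
prompt = build_prompt(
    3,
    {"weather": "heavy rain", "visibility_mi": 1.0},
    top_k_chunks(query, chunks, k=2),
)
print(prompt)
```

In a deployment along the lines described here, the toy embedding would be replaced by the semantic meaning model and the assembled prompt would be passed to the warning generation model.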
3.1. Dataset

This study uses the US Accidents (2016-2023) dataset¹ [43] from Kaggle, distributed under the CC BY-NC-SA 4.0 license. This dataset comprises over 7.7 million (7,728,394) traffic accident records, covering 49 states of the USA from February 2016 to March 2023. The accident data were collected using multiple APIs that provide streaming traffic incident data captured by various entities, including the US and state departments of transportation, law enforcement agencies, traffic cameras, and traffic sensors within the road networks. The data include detailed information on accident severity, location, time, and weather conditions. This dataset was used to train the FL models for traffic accident prediction.

3.2. Federated Learning

Our application relies on an FL model for accident risk estimation. FL was selected based on two primary considerations: data privacy and collaborative enhancement.

     1. Privacy: Addressing privacy concerns, vehicles in a real scenario do not transmit raw data, which could potentially reveal sensitive information. Instead, only model parameters are sent, ensuring that individual data remains secure and private. This cannot be done with traditional centralized learning, where all data need to be sent to a central server for training.
     2. Collaboration: When a vehicle updates and shares its model parameters, it contributes to the overall learning process. This collective effort improves the overall model’s performance, as it can learn from a wide range of diverse and localized inputs. The shared knowledge enables more accurate and robust risk estimation.

The training data features provide a detailed view of accident records, including the specifics of the accidents, the geographic locations, the prevailing weather conditions at the time of the accidents, and various environmental and contextual factors that may be relevant to analyzing the accidents. In a real scenario, the vehicle’s onboard computing system uses these inputs to continuously update its local model, learning from real data.

3.3. Retrieval-Augmented Generation

RAG combines an information retrieval component with a text generator model to provide situational information and guidance [44]. In the ITS context, RAG can integrate various external data sources to analyze and report traffic accidents, identifying risk factors and details [45]. This makes the system more dynamic and adaptable to new information. In our system (see Figure 1), RAG provides textual accident warnings to the end user, along with explanations of how the estimates were derived.

Knowledge retrieval model: It is designed to find the most relevant information from an external knowledge base in response to the query. This enriches the FL model output and sensor data with relevant information. We use SentenceTransformers² as a retrieval model based on similarity search.

Warning generation model: It is designed to generate new content using language models. It uses the information retrieved by the retrieval model, together with the FL output details, to generate a response. For our system, we use gpt-3.5-turbo-0613³ to create contextually relevant warnings and detailed reports. The accident report includes the severity of the accident, the location and traffic control procedures, and guidance and actions.

3.4. Task Orchestration and Monitoring

Effective resource management and device health monitoring are essential for enhancing the responsiveness of smart city services. This requires comprehensive system monitoring that spans from edge devices to the cloud. The deployment of applications on edge devices necessitates advanced task orchestration platforms, which must be carefully selected based on specific requirements. Given that edge devices typically have limited resources, the chosen tool must operate smoothly under such constraints. For the proposed system, k0s⁴ has been selected. We selected k0s because of its minimal resource consumption on edge devices and its straightforward and rapid implementation process, supported by comprehensive documentation and active developer forums. It typically operates with as little as 1 CPU and 512 MB of RAM on each controller node and 1 GB of RAM on each worker node, which aligns well with the capabilities of edge devices. However, the minimum requirements increase when the number of worker nodes is increased. Addition-
Once the training is done, the model parameters will be
                                                             ally, numerous monitoring options compatible with k0s
sent to the nearby edge server. The server, after receiving
                                                             are available. k0s is packaged as a single, self-extracting
a sufficient amount of models will start doing the aggre-
                                                             binary which embeds Kubernetes binaries. It has many
gation to get the global model, which is then sent back
                                                             benefits, such as it has no OS level dependencies and
to the participating vehicles. When this whole process
                                                             everything can be, and is, statically compiled.
is complete, we finish one communication round and
continue to the next round.                                  2
                                                                     https://sbert.net/
                                                                 3
                                                                     https://platform.openai.com/docs/models/gpt-3-5-turbo
1                                                                4
    https://www.kaggle.com/datasets/sobhanmoosavi/us-accidents       https://docs.k0sproject.io/stable/
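The communication round described in Section 3.2 can be sketched as follows. This is a minimal illustration, not the system's actual implementation: the paper does not name the aggregation rule, so a sample-weighted average in the style of FedAvg is assumed here, and the names `Params`, `aggregate`, `run_round`, and `min_clients` are hypothetical.

```python
from typing import Dict, List, Optional

# Hypothetical representation: layer name -> flat list of weights.
Params = Dict[str, List[float]]


def aggregate(client_params: List[Params], client_sizes: List[int]) -> Params:
    """Sample-weighted average of client parameters (FedAvg-style assumption)."""
    if not client_params or len(client_params) != len(client_sizes):
        raise ValueError("need one sample count per client model")
    total = sum(client_sizes)
    global_params: Params = {}
    for name in client_params[0]:
        averaged = [0.0] * len(client_params[0][name])
        for params, size in zip(client_params, client_sizes):
            weight = size / total
            for i, value in enumerate(params[name]):
                averaged[i] += weight * value
        global_params[name] = averaged
    return global_params


def run_round(local_updates: List[Params], sizes: List[int],
              min_clients: int) -> Optional[Params]:
    """Aggregate only once enough client models have arrived."""
    if len(local_updates) < min_clients:
        return None  # keep waiting for more vehicles
    return aggregate(local_updates, sizes)


# Two vehicles with equal amounts of local data.
updates = [{"w": [1.0, 2.0]}, {"w": [3.0, 4.0]}]
print(run_round(updates, [1, 1], min_clients=2))
```

Only the parameter dictionaries cross the network in this sketch; the raw records that produced them stay on the vehicle, which is the privacy property the section argues for.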
4. System Implementation

4.1. Risk Estimation with FL

4.1.1. Preprocessing

The preprocessing phase for our system includes a series of essential data preparation steps to ensure the quality of the dataset for further analysis:
1. Data Cleaning: Duplicate and missing values were removed.
2. Feature Engineering: To enhance the informativeness of the dataset, a new feature called “Comfort_Index” was created following Equation 1.

$$\mathrm{Comfort\_Index} = (\mathrm{Temperature} - 32) \times (\mathrm{Humidity}/100) \tag{1}$$

3. Data Resampling: To address the imbalance issue, both random oversampling and undersampling of the data were applied to ensure that each label had an equal distribution.
4. Data Transformation: Done according to feature data type:

      • Categorical Data: One-hot encoding was applied to categorical columns, except for “Street,” “State,” and the target label “Severity”.
      • Boolean Data: Columns with two distinct values were binarized, converting them to 0 and 1.
      • Numeric Data: Columns containing numeric data were left unchanged, preserving their original values.

5. Standardization: The dataset was then subjected to StandardScaler standardization. This process ensured that all features had consistent scales and values within a particular range.

4.1.2. FL Training and Prediction

To simulate a real-world scenario using our chosen dataset, we distributed the data across several nodes and established certain assumptions. This section elaborates on those details.
Distribution: The data is divided into five equal parts, corresponding to five nodes in the system. We also make sure the number of samples of each label is distributed equally among clients.
Model Training: Each client trains its local model, consisting of three fully connected layers. Training specifications include the use of the cross-entropy loss function, the Adam optimizer with a learning rate of 1e-3, and a batch size of 32. After ten training epochs, the locally trained models are aggregated by the server into a global model, and the global parameters are saved at each checkpoint, here at each communication round, before being sent back to the participants for training in the next round. The FL training process concludes after ten communication rounds. At this stage, various model architectures, encompassing differing layer counts and hyperparameters, were evaluated over 50 communication rounds to observe performance trends and convergence. The selected model outperformed alternatives; models with reduced layers demonstrated inferior outcomes (3-4%), while configurations with additional layers, despite a 3% accuracy improvement, incurred prolonged training duration and converged to local, rather than global, optima. See Table 1 for details.

Table 1
Risk estimation models comparison: accuracy (%) and training time (hours)
                   Simple      Chosen      Complex
Accuracy (%)        67.09       71.15       74.42
Time (hours)        3.461       4.042       5.603

Prediction: The input is real-time sensor data. This data first goes through the 5-step preprocessing process (refer to Sub-section 4.1.1) to obtain the feature vector, which is then fed to the model for prediction.

4.2. Warning Generation with RAG

Using the RAG model, we retrieve text passages using an input sequence. During the generation of the target sequence, we include these passages as additional context. Our model leverages two components, which are implemented in LangChain⁵. The first is a retriever that fetches relevant text snippets in response to a user's query or prompt from a knowledge source, which is loaded using a built-in document loader from LangChain.
In our system, we rely on the US traffic accident database as an external knowledge source, containing a comprehensive analysis of US traffic accident data [46]. This report provides insight into preventive measures and policy recommendations for decreasing traffic accidents in the US based on detailed analyses by state, time, and contributing factors such as weather. The retrieval process begins with loading documents using a tool in LangChain. This process is enhanced by a splitter tool, also integrated into LangChain, which segments extensive texts into smaller chunks of a specified chunk size by examining characters recursively. This is crucial for the efficient handling of large textual data.
For the creation of text embeddings, we employ HuggingFaceEmbeddings, a specialized embedding model from the Hugging Face⁶ library within LangChain. This model transforms the segmented text chunks into numerical vectors, facilitating their computational handling.

⁵ https://www.langchain.com/
⁶ https://huggingface.co/
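The recursive, character-based splitting described in Sub-section 4.2 can be illustrated with a short plain-Python stand-in. This is a sketch of the general idea only and is not LangChain's implementation: the function name `recursive_split` and the default separator list are assumptions, and chunk overlap, which the paper does not mention, is left out. The value 2000 matches the chunk size the paper reports.

```python
from typing import List, Sequence


def recursive_split(text: str, chunk_size: int,
                    separators: Sequence[str] = ("\n\n", "\n", " ", "")) -> List[str]:
    """Split text into chunks of at most chunk_size characters.

    The coarsest separator is tried first; finer separators are used
    only for pieces that are still too long.
    """
    if len(text) <= chunk_size:
        return [text] if text else []
    if not separators or separators[0] == "":
        # Last resort: hard cut every chunk_size characters.
        return [text[i:i + chunk_size] for i in range(0, len(text), chunk_size)]
    sep, finer = separators[0], separators[1:]
    chunks: List[str] = []
    current = ""
    for piece in text.split(sep):
        candidate = piece if not current else current + sep + piece
        if len(candidate) <= chunk_size:
            current = candidate  # keep merging pieces while they fit
            continue
        if current:
            chunks.append(current)
        if len(piece) <= chunk_size:
            current = piece
        else:
            # The piece alone is too long: recurse with a finer separator.
            chunks.extend(recursive_split(piece, chunk_size, finer))
            current = ""
    if current:
        chunks.append(current)
    return chunks


report = "Weather is a contributing factor. " * 200
chunks = recursive_split(report, chunk_size=2000)
print(len(chunks), max(len(c) for c in chunks))
```

Splitting on paragraph and line boundaries before falling back to spaces and hard cuts tends to keep related sentences in the same chunk, which matters because each chunk is embedded and retrieved as a unit.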
To store these embedding vectors in a vector store, we utilize the FAISS library⁷, a robust vector database. It enables effective similarity search by identifying the text chunk vectors most similar to the question vector. This process is vital to determine which portions of the knowledge source are most pertinent to the input query. The stored vectors are retrieved at query time based on the k argument, which selects the top k most relevant text chunk vectors for each query. Table 2 summarizes the RAG parameters used.

The generator creates a more detailed, factual, and relevant response based on the original input and retrieved documents. The original input represents the severity of an accident, derived from the FL output and complemented by real-time sensor data. For the generation of coherent and contextually relevant text, the original input and the retrieved documents are fed into gpt-3.5-turbo-0613, a sophisticated pre-trained language model. Based on the content of these documents, the model generates coherent and contextually relevant text grounded in real-world information. Figure 2 illustrates an example of a traffic accident report generated by RAG.

Table 2
Summary of RAG parameters used
     Parameter                    Value
     Text splitter type           RecursiveCharacterTextSplitter
     Chat model                   ChatOpenAI
     ChatOpenAI model name        gpt-3.5-turbo-0613
     Vector store                 FAISS
     Embeddings type              HuggingFaceEmbeddings
     Embeddings model name        sentence-transformers/all-mpnet-base-v2
     Search type                  similarity
     Chunk size                   2000

4.3. Task Orchestration and Monitoring

As discussed in Sub-section 3.4, we opted for k0s, which is ideal for our needs and simple to implement. We used Lens IDE⁸, a Kubernetes IDE, to manage the cluster and monitor the whole system. It allows for comprehensive oversight of nodes, pods, and resource monitoring. Monitoring involves tracking the usage of CPU, memory, storage, and network bandwidth, and monitoring device safety and functionality to detect any potential problem. We containerized our application using Docker⁹ and deployed it using Lens IDE and the k0s task orchestration tool. We used Cluster metrics in the Lens IDE to monitor the resources efficiently.

⁷ https://faiss.ai/index.html
⁸ https://k8slens.dev/
⁹ https://www.docker.com/

Traffic Accident Report:
Accident Data:
- Street: US Highway 22
- State: NJ
- Start Latitude: 40.65562
- Start Longitude: -74.40149
- Crossing: False
- Give Way: False
- Junction: False
- No Exit: False
- Railway: False
- Roundabout: False
- Station: True
- Stop: False
- Traffic Calming: False
- Traffic Signal: False
- Distance (mi): 0.167
- Temperature (F): 56.09
- Wind Chill (F): 72.98
- Humidity (%): 42.97
- Pressure (in): 29.59
- Visibility (mi): 9.99
- Wind Direction: NNW
- Wind Speed (mph): 9.19
- Precipitation (in): 0.0016
- Weather Condition: Fair
- Sunrise/Sunset: Night
- Comfort Index: 10.35
- Severity: 2
Analysis of the Accident: Based on the accident data, the accident occurred on US Highway 22 in NJ. The accident did not occur at a crossing, junction, or railway. However, there was a station nearby. The distance of the accident location from the starting point was approximately 0.167 miles. The weather conditions at the time of the accident were fair, with a temperature of 56.09°F, wind chill of 72.98°F, humidity of 42.97%, pressure of 29.59 in, and visibility of 9.99 miles. The wind was coming from the NNW direction at a speed of 9.19 mph, and there was a slight precipitation of 0.0016 inches.
Recommendations for Future Prevention:
1. Increase Public Awareness: It is crucial to educate the public about safe driving practices and the importance of following traffic rules and regulations. Public awareness campaigns should be conducted to promote responsible driving behavior and reduce the likelihood of accidents.
2. Improve Road Safety Measures: Implementing traffic calming measures, such as speed bumps or roundabouts, can help reduce the risk of accidents. Additionally, installing traffic signals at appropriate locations can improve traffic flow and prevent collisions.
3. Enhance Emergency Response: Ensure that emergency services, such as fire and medical teams, are well-equipped and trained to handle road traffic accidents efficiently. Regular drills and training sessions should be conducted to improve response times and minimize casualties.
4. Regular Safety Inspections: Regular inspections of roads, signage, and traffic signals should be conducted to identify and address any potential safety hazards. Prompt repairs and maintenance should be carried out to ensure the safety of drivers and pedestrians.
5. Collaboration with Law Enforcement: Collaborate with local law enforcement agencies to enforce traffic laws and regulations effectively. Increase police presence on the roads to deter reckless driving and enforce speed limits.
6. Continuous Monitoring of Weather Conditions: Implement a system to continuously monitor weather conditions and provide real-time updates to drivers. This will help drivers make informed decisions and adjust their driving behavior accordingly during adverse weather conditions.
By implementing these recommendations, we can work towards preventing similar accidents in the future and ensuring the safety of all road users.

Figure 2: An example of a traffic accident report generated by RAG

5. System Evaluation

To assess the system's performance, several key metrics were employed. We want to ensure that all the components work correctly both independently and in the integrated system. First, we monitored the accuracy of the FL model for risk estimation, assessing its ability to predict traffic accident severity. This evaluation utilized the dataset for training the model. Additionally, the quality and relevance of warnings and reports generated by the RAG model were assessed. The system's responsiveness was also tested, particularly how quickly it can generate alerts and warnings based on incoming data. Furthermore, the resource management aspect was evaluated to ensure that the system's resource usage is optimized and well-maintained. The developed system was deployed and tested on a real cluster of three nodes with k0s equipped with the monitoring application.

5.1. Risk Estimation Evaluation

5.1.1. Accuracy

We monitor the training process of the FL model in terms of accuracy, loss, and convergence. The training for 50 communication rounds with 5 training clients takes up to 4.042 hours.
Figure 3 plots the training accuracy in the upper graph
and the training loss in the lower graph. The model demonstrates convergence approximately by round 30 at 71.15%, as depicted in the upper plot. Initially, model accuracy exhibits an upward trend from round 0 to 30, albeit with fluctuations observed around rounds 15-17 and 21. Subsequently, after round 30, the risk estimation model appears to have reached a plateau in accuracy, indicating convergence. This is also reflected in the lower graph of training loss.

[Figure 3: two line plots over communication rounds 0-49 for clients #0 to #4; upper panel “Training accuracy”, lower panel “Training loss”; x-axis “Round”.]
Figure 3: Risk estimation training accuracy (top) and loss (bottom)

It is, however, possible for low power-resource devices to terminate the training process at an earlier stage, such as after round 10 or 20, with negligible tradeoffs in accuracy.

5.1.2. Total latency trends

The bar graph (see Fig. 4) depicting the total latency for predictions reveals a clear trend: as the number of inputs processed simultaneously increases, so does the time required for prediction.

[Figure 4: bar chart “Time response”; x-axis “Type”, y-axis “Seconds”.]
     Type              Seconds
     Single            0.3931
     Multiple-10       0.3965
     Multiple-50       0.4009
     Multiple-100      0.4062
     Multiple-1k       0.4216
     Multiple-10k      0.4487
     Multiple-100k     0.9463
Figure 4: Total latency trends of risk estimation

Starting from a swift 0.3931 seconds for a single input, the latency moderately rises for batches of 10 and 100 inputs, reaching 0.4062 seconds, suggesting the model handles small to moderate increases in input size efficiently.
However, as input sizes increase to 1,000 and 10,000, the total latency grows more substantially, hitting 0.4487 seconds for 10,000 inputs. This increment continues, even more sharply, with the model taking 0.9463 seconds to predict outcomes for 100,000 inputs concurrently.
Overall, this evaluation outcome underscores the FL model's scalability: total latency stays below one second not only for small input batches but also for batches of up to 100,000 inputs. Nevertheless, it should be noted that the measured time can differ among working devices.

5.2. Accident Warning Report Evaluation

To evaluate the quality of the accident warning reports generated by RAG, we used correctness, relevance, and faithfulness as criteria to assess LLM outputs¹⁰. We used gpt-3.5-turbo-0613 for the evaluation task to contextually analyze and interpret generated reports according to the criteria.
Correctness is based on the LLM's internal knowledge. However, given the potential unreliability of the LLM's knowledge base, we enhanced the evaluation method by incorporating reference labels. This provides an external benchmark for correctness. The evaluation process produces a dictionary containing key metrics: “score”, a binary integer (0 or 1) indicating compliance with the criteria, “value”, which is either “Y” (Yes) or “N” (No) based on the score, and “reasoning”, which outlines the LLM's chain of thought. Relevance evaluates the relevance and focus of the generated answer in relation to the provided prompt. Faithfulness assesses the factual consistency of the generated answer against the given context and reference documents. Using this approach, we ensure not only that the generated content meets the prompt's specific requirements but also that it remains true to the factual information provided in the reference material. Figure 5 illustrates an example of RAG output evaluation.
Based on correctness, relevance, and faithfulness criteria, the evaluation shows that the output accurately represents an actual quote. Throughout the evaluation output, all necessary elements are addressed in a comprehensive, well-structured, and well-written manner. Based on the evaluation output, the response summarizes accident data and provides a comprehensive analysis of weather conditions at the time of the accident, including visibility and severity. Additionally, it provides recommendations for preventing accidents in the future relevant to the reference.

¹⁰ https://python.langchain.com/docs/guides/evaluation/string/criteria_eval_chain
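The result dictionary described in Sub-section 5.2 can be sketched in a few lines. This is an illustration of the data shape only, under the assumption that the judge model returns a free-text reasoning string and a Y/N verdict; the helper names `make_result` and `passes_all` are hypothetical and do not belong to any library.

```python
from typing import Dict, Union

Result = Dict[str, Union[int, str]]


def make_result(reasoning: str, verdict: str) -> Result:
    """Build the {'reasoning', 'value', 'score'} dictionary from a Y/N verdict."""
    value = verdict.strip().upper()
    if value not in ("Y", "N"):
        raise ValueError(f"unexpected verdict: {verdict!r}")
    return {"reasoning": reasoning, "value": value,
            "score": 1 if value == "Y" else 0}


def passes_all(results: Dict[str, Result]) -> bool:
    """A report passes only if every criterion scored 1."""
    return all(r["score"] == 1 for r in results.values())


# Hypothetical verdicts for one generated report.
results = {
    "correctness": make_result("Accident data matches the reference.", "Y"),
    "relevance": make_result("The report addresses the prompt.", "Y"),
    "faithfulness": make_result("No claims beyond the given context.", "Y"),
}
print(passes_all(results))
```

Keeping the score binary makes it straightforward to count, over many generated reports, how often each criterion is met.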
Correctness_criteria:
{'reasoning': To determine if the submission meets the criteria, we need to evaluate the correctness, accuracy, and factual nature of the submission.
1. Check if the submission correctly presents the accident data, including the street, state, latitude, longitude, and various factors related to the accident.
2. Verify if the submission accurately describes the weather conditions at the time of the accident, including temperature, wind chill, humidity, pressure, visibility, wind direction, and precipitation.
3. Assess whether the submission accurately provides information about the severity of the accident, distance, sunrise/sunset, and comfort index.
4. Evaluate if the recommendations for future prevention are reasonable and relevant to the accident scenario.
Based on the above reasoning, the submission meets the criteria if all the above conditions are satisfied. 'score': 1, 'value': 'Y'}
Relevance_criteria:
{'reasoning': To determine if the submission meets the criteria of relevance, we need to compare the content of the submission with the provided data.
We will check if the submission accurately refers to a real quote from the text.
- The submission provides a detailed analysis of the accident data, including the street, state, and various accident factors. It also mentions the weather conditions, severity, and recommendations for future prevention based on the given data.
- The submission accurately reflects the information provided in the data.
- Therefore, the submission meets the criteria of relevance.
Based on the above reasoning, the conclusion is that the submission meets all the criteria.'score': 1, 'value': 'Y'}

6. Discussion and Future Work

The development and integration of FL and RAG into an ITS service presents several key findings and areas for further research.
Our FL model demonstrated good performance in predicting traffic accident severity, achieving a convergence point after approximately 30 communication rounds. This suggests that FL can effectively utilize distributed data for predictions while maintaining data privacy. Additionally, the scalability of the FL model was evident from the total latency evaluations, which showed reasonable prediction times even with increasing input sizes, indicating the model's applicability in real scenarios.
Faithfulness_criteria:                                                                           The RAG model generated detailed and contextually
                                                                                              relevant reports and warnings based on simulated real-
The assistant's response is faithful to the reference context. It accurately summarizes
the accident data provided in the user question and provides a detailed analysis of the
accident. It also offers recommendations for future prevention. The response is
comprehensive and covers all the relevant aspects of the accident data.                       time inputs. This was validated through evaluations fo-
                                                                                              cusing on correctness, relevance, and faithfulness. The
Figure 5: An example of RAG output evaluation criteria                                        integration of real-time data and FL with external knowl-
                                                                                              edge sources ensured that the generated content was not
                                                                                              only accurate but also practical for end-users, such as
5.3. Task Orchestration and Monitoring                                                        traffic management authorities.
                                                                                                 The use of k0s for task orchestration proved to be effec-
We utilized a simplified demonstration setup comprising                                       tive, enabling seamless integration and management of
one controller node and two worker nodes to test the                                          various system components. The monitoring capabilities
deployment of the system to the distributed environment.                                      provided by Lens IDE ensured the system’s robustness
The technical characteristics of our system are as follows:                                   and allowed for efficient resource management. Testing
The controller node is equipped with an Intel Core i7-                                        on a simulated cluster confirmed the system’s reliability
6700HQ CPU, an NVIDIA GeForce GTX 960M GPU, and                                               and scalability.
16 GB of RAM. One of the worker nodes is identical to                                            While our system shows promising results, several
the controller node, featuring an Intel Core i7-6700HQ                                        areas warrant further investigation and development.
CPU, an NVIDIA GeForce GTX 960M GPU, and 16 GB of                                             Future work should focus on strengthening privacy-
RAM. The other worker node is equipped with an Intel                                          preserving techniques within the FL framework.
Core i5-1135G7 CPU and 16 GB of RAM. The system                                                  In our design of the FL model, we prioritized simplicity
was successfully deployed and operated as expected, ef-                                       and efficiency to predict accident severity. This approach
fectively generating warnings in response to simulated                                        was intended to minimize the computational load. For
input data. Additionally, we employed Lens IDE to moni-                                       future work, it would be advantageous to enhance the FL
tor data outputs and to oversee the resource usage on the                                     model by exploring other lightweight models. This could
controller node. A screenshot of the Lens IDE is provided                                     potentially improve the accuracy while maintaining the
in Figure 6 to demonstrate how the cluster is controlled.                                     model’s efficiency.
                                                                                                 Exploring the feasibility of using transfer learning
                                                                                              methods to transfer knowledge gained about each state
                                                                                              or district to other districts or states can be beneficial.
                                                                                                 Developing user-friendly interfaces for traffic manage-
                                                                                              ment authorities and end-users will be crucial for effec-
                                                                                              tive system adoption. This involves designing intuitive
                                                                                              dashboards and visualization tools to present predictions
                                                                                              and warnings in an accessible manner. Implementing
                                                                                              and testing the system in real-world smart city environ-
                                                                                              ments will provide valuable insights into its performance
                                                                                              and scalability. Collaborations with city authorities can
                                                                                              facilitate this process and help refine the system based
Figure 6: Lens IDE logs output                                                                on practical feedback.
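To make the notion of communication rounds and convergence discussed in this section more concrete, the sketch below shows one server-side aggregation step and a simple plateau check. It is an illustration under assumptions, not the system's implementation: it assumes a FedAvg-style, sample-size-weighted average (the section does not state the aggregation rule), and the function names `fed_avg` and `has_converged`, the `window` and `tol` parameters, and the example numbers are all hypothetical.

```python
from typing import List, Sequence


def fed_avg(client_weights: Sequence[Sequence[float]],
            client_sizes: Sequence[int]) -> List[float]:
    """Average client parameter vectors, weighted by local sample counts."""
    if not client_weights or len(client_weights) != len(client_sizes):
        raise ValueError("need one sample count per client parameter vector")
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def has_converged(history: Sequence[float],
                  window: int = 5, tol: float = 1e-3) -> bool:
    """Return True if the metric varied by less than tol over the last rounds."""
    if len(history) < window + 1:
        return False  # not enough rounds observed yet
    recent = history[-(window + 1):]
    return max(recent) - min(recent) < tol


# One aggregation step with two hypothetical clients holding 1 and 3 samples.
global_weights = fed_avg([[1.0, 2.0], [3.0, 4.0]], [1, 3])
print(global_weights)  # [2.5, 3.5]
```

In such a setup, the server would call `fed_avg` once per communication round and append a validation metric to `history`, stopping once `has_converged` returns True. Only parameter vectors and sample counts leave the clients, which is the property the privacy argument rests on.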
7. Conclusion

This paper presents a smart city service that integrates FL and RAG to enhance traffic risk prediction and management. Our findings demonstrate the system’s accuracy, efficiency, and potential for real-world applications. The FL model achieved good predictive performance while preserving data privacy. The RAG model produced detailed and relevant reports, aiding in effective traffic management.

Task orchestration using k0s ensured seamless integration and robust performance monitoring. Future work will focus on enhancing privacy, scalability, and real-world testing, aiming for broader deployment and integration. Our system offers a promising approach to addressing urban safety challenges, contributing to the development of smarter and safer cities.

Acknowledgments

We would like to thank Prof. Timo Ojala for fruitful discussions on the topic. This research is financially supported by the Research Council of Finland (former Academy of Finland) through the 6G Flagship Program (346208) and UrBOT project (323630), by Business Finland Neural pub/sub research project (diary number 8754/31/2022), by EU Horizon 2020 IDUNN project (101021911), and by the Emerging projects program, Infotech Oulu.
                                                              [15] M. Bortnikov, A. Khan, A. M. Khattak, M. Ahmad,
References                                                         Accident recognition via 3d cnns for automated traf-
                                                                   fic monitoring in smart cities, in: Advances in Com-
 [1] Our World in Data, Urbanization, 2023. URL:                   puter Vision: Proceedings of the 2019 Computer
     https://ourworldindata.org/urbanization, accessed:            Vision Conference (CVC), Volume 2 1, Springer,
     November 13, 2023.                                            2020, pp. 256–264.
 [2] Q. Wang, L. Li, The effects of population aging,         [16] S. Uma, R. Eswari, Accident prevention and safety
     life expectancy, unemployment rate, population                assistance using iot and machine learning, Jour-
     density, per capita gdp, urbanization on per capita           nal of Reliable Intelligent Environments 8 (2022)
     carbon emissions, Sustainable Production and Con-             79–103.
     sumption 28 (2021) 760–774.                              [17] Y. Liu, J. James, J. Kang, D. Niyato, S. Zhang,
 [3] E. Gilman, et al., Addressing data challenges to              Privacy-preserving traffic flow prediction: A fed-
     drive the transformation of smart cities, ACM                 erated learning approach, IEEE Internet of Things
     Transactions on Intelligent Systems and Technol-              Journal 7 (2020) 7751–7763.
     ogy (2024).                                              [18] B. McMahan, et al., Communication-efficient learn-
 [4] A. Adel, Future of industry 5.0 in society: Human-            ing of deep networks from decentralized data, in:
     centric solutions, challenges and prospective re-             Artificial intelligence and statistics, PMLR, 2017, pp.
     search areas, Journal of Cloud Computing 11 (2022)            1273–1282.
     1–15.                                                    [19] H. Koziolek, N. Eskandani, Lightweight kuber-
 [5] World Health Organization, Global Status Report               netes distributions: a performance comparison of
     on Road Safety 2018: Summary, Geneva: World                   microk8s, k3s, k0s, and microshift, in: Proceedings
     Health Organization; 2018 (WHO/NMH/NVI/18.20).                of the 2023 ACM/SPEC International Conference
     Licence: CC BY-NC-SA 3.0 IGO, 2018. License: CC               on Performance Engineering, 2023, pp. 17–29.
     BY-NC-SA 3.0 IGO.                                        [20] F. Hasan, et al., Iot based traffic management, ac-
                                                                   cident detection, and accident prevention system
     using machine learning method, in: Proceedings                 Transactions on Consumer Electronics (2023).
     of the 2nd International Conference on Computing          [34] Y. Yuan, et al., Fedrd: Privacy-preserving adap-
     Advancements, 2022, pp. 249–253.                               tive federated learning framework for intelligent
[21] L. Yu, B. Du, X. Hu, L. Sun, L. Han, W. Lv, Deep               hazardous road damage detection and warning,
     spatio-temporal graph convolutional network for                Future Generation Computer Systems 125 (2021)
     traffic accident prediction, Neurocomputing 423                385–398. URL: https://www.sciencedirect.com/
     (2021) 135–147.                                                science/article/pii/S0167739X21002302. doi:https:
[22] Z. Zhou, et al., Spatio-temporal feature encoding              //doi.org/10.1016/j.future.2021.06.035 .
     for traffic accident detection in vanet environment,      [35] Y.-C. Wang, J. Xue, C. Wei, C.-C. J. Kuo, An
     IEEE Transactions on Intelligent Transportation                overview on generative ai at scale with edge-cloud
     Systems 23 (2022) 19772–19781.                                 computing (2023).
[23] K. G. Le, P. Liu, L.-T. Lin, Determining the road         [36] N. Rane, Role of chatgpt and similar generative
     traffic accident hotspots using gis-based temporal-            artificial intelligence (ai) in construction industry,
     spatial statistical analytic techniques in hanoi, viet-        Available at SSRN 4598258 (2023).
     nam, Geo-spatial Information Science 23 (2020)            [37] R. A. Bakir, S. A. M. Attia, Advancing urban health
     153–164.                                                       assessment through generative ai-driven indicators:
[24] S. Sharma, V. Chang, U. S. Tim, J. Wong, S. Ga-                Gcr case study (2023).
     dia, Cloud and iot-based emerging services systems,       [38] D. Impedovo, V. Dentamaro, G. Pirlo, L. Sarcinella,
     Cluster Computing 22 (2019) 71–91.                             Trafficwave: Generative deep learning architecture
[25] S. A. Elsagheer Mohamed, K. A. AlShalfan, Intel-               for vehicular traffic flow prediction, Applied Sci-
     ligent traffic management system based on the in-              ences 9 (2019) 5504.
     ternet of vehicles (iov), Journal of advanced trans-      [39] P. Lewis, et al., Retrieval-augmented generation
     portation 2021 (2021) 1–23.                                    for knowledge-intensive nlp tasks, Advances in
[26] Z. Zhou, X. Chen, E. Li, L. Zeng, K. Luo, J. Zhang,            Neural Information Processing Systems 33 (2020)
     Edge intelligence: Paving the last mile of artificial          9459–9474.
     intelligence with edge computing, Proceedings of          [40] X. Dai, et al., Vistarag: Toward safe and trustworthy
     the IEEE 107 (2019) 1738–1762.                                 autonomous driving through retrieval-augmented
[27] S. Wan, S. Ding, C. Chen, Edge computing enabled               generation, IEEE Transactions on Intelligent Vehi-
     video segmentation for real-time traffic monitoring            cles (2024).
     in internet of vehicles, Pattern Recognition 121          [41] W. Ding, Y. Cao, D. Zhao, C. Xiao, M. Pavone,
     (2022) 108146.                                                 Realgen: Retrieval augmented generation for
[28] R. Ke, Z. Cui, Y. Chen, M. Zhu, H. Yang, Y. Wang,              controllable traffic scenarios,       arXiv preprint
     Edge computing for real-time near-crash detec-                 arXiv:2312.13303 (2023).
     tion for smart transportation applications, arXiv         [42] M. Mohanan, Competitive Analysis of Embedding
     preprint arXiv:2008.00549 (2020).                              Models in Retrieval-Augmented Generation for In-
[29] H. Nguyen, T. Nguyen, T. Leppänen, J. Partala,                 dian Motor Vehicle Law Chat Bots, Ph.D. thesis,
     S. Pirttikangas,       Situation awareness for au-             Dublin Business School, 2024.
     tonomous vehicles using blockchain-based service          [43] S. Moosavi, et al., Accident risk prediction based
     cooperation, in: International Conference on Ad-               on heterogeneous sparse data: New dataset and
     vanced Information Systems Engineering, Springer,              insights, in: Proceedings of the 27th ACM SIGSPA-
     2022, pp. 501–516.                                             TIAL International Conference on Advances in Ge-
[30] X. Zhou, R. Ke, H. Yang, C. Liu, When intelligent              ographic Information Systems, 2019, pp. 33–42.
     transportation systems sensing meets edge comput-         [44] H. Li, Y. Su, D. Cai, Y. Wang, L. Liu, A survey on
     ing: Vision and challenges, Applied Sciences 11                retrieval-augmented text generation, arXiv preprint
     (2021) 9680.                                                   arXiv:2202.01110 (2022).
[31] Y. Qi, M. S. Hossain, J. Nie, X. Li, Privacy-preserving   [45] D. Cai, Y. Wang, L. Liu, S. Shi, Recent advances in
     blockchain-based federated learning for traffic flow           retrieval-augmented text generation, in: Proceed-
     prediction, Future Generation Computer Systems                 ings of the 45th International ACM SIGIR Confer-
     117 (2021) 328–337.                                            ence on Research and Development in Information
[32] C. Xu, Y. Mao, An improved traffic congestion                  Retrieval, 2022, pp. 3417–3419.
     monitoring system based on federated learning, In-        [46] N. D. F. FIRE, E. M. R. HANDBOOK, Road traffic
     formation 11 (2020) 365.                                       accident handbook (JUNE,2009).
[33] A. Chougule, et al., A novel framework for traf-
     fic congestion management at intersections using
     federated learning and vertical partitioning, IEEE