=Paper=
{{Paper
|id=Vol-3392/paper3
|storemode=property
|title=On Efficient Single-Core Execution of Agent-Based Epidemiological Models with Contact-Tracing Transmission
|pdfUrl=https://ceur-ws.org/Vol-3392/paper3.pdf
|volume=Vol-3392
|authors=Vladyslav Sarnatskyi,Igor Baklan
|dblpUrl=https://dblp.org/rec/conf/cmis/SarnatskyiB23
}}
==On Efficient Single-Core Execution of Agent-Based Epidemiological Models with Contact-Tracing Transmission==
Vladyslav Sarnatskyi and Igor Baklan

National Technical University of Ukraine "Igor Sikorsky Kyiv Polytechnic Institute", 37, Prosp. Peremohy, Kyiv, 03056, Ukraine

Abstract

In this work, we present our research on the efficiency of single-core execution of agent-based epidemiological models with contact-tracing transmission. We analyze existing epidemiological agent-based modeling tools from the perspective of performance, i.e. the computational time needed to perform a single simulation step of a model with a fixed number of agents. We develop several simulation algorithms based on different model types and show optimizations that maximize their performance. We design a metric for comparing simulation step execution times on a single CPU core and use it to compare the performance of the underlying simulation algorithms of existing methods with the developed one. Even though, as discussed below, it is very difficult to estimate the execution time of an algorithm without actual access to it, we present the results of our method on both a modern and an old CPU.

Keywords

Modeling, epidemiology, agent-based models, individual-based models, IBMs, infectious disease spread modeling, performance, single-thread, optimization

1. Introduction

Global epidemiological events can inflict significant damage on the global economy [1]. In order to reduce both the epidemiological and the economic impact, active measures must be considered. They include, but are not limited to, full or partial quarantine, an obligatory face mask regime, vaccinations, etc. However, even though measures like full quarantine can significantly lower the disease spread rate by flattening the curve of new cases, their economic impact can be unbearable for certain communities. This is why a good balance between epidemiological and economic risks must be maintained.
But determining the exact threshold of severity for disease spread countermeasures is problematic without modeling the particular epidemic. The first research in the field of infectious disease spread modeling dates to the early XX century. In their work, Anderson McKendrick and William Kermack [2] modeled an epidemic by splitting the population into several compartments corresponding to different stages of a disease, for example: (S)usceptible, individuals who can be infected in the future, and (I)nfectious, individuals who were infected, show symptoms, and can infect others. After assigning the initial number of individuals to each compartment, the dynamics of these numbers can be modeled as a system of differential equations. The work of McKendrick and Kermack served as the foundation for the research and development of compartmental epidemiological models, which are used these days to model various diseases like dengue fever, COVID-19, etc. [3-11].

The Sixth International Workshop on Computer Modeling and Intelligent Systems (CMIS-2023), May 3, 2023, Zaporizhzhia, Ukraine
EMAIL: vladysarn@protonmail.com (V. Sarnatskyi); iaa@ukr.net (I. Baklan)
ORCID: 0000-0001-5231-6136 (V. Sarnatskyi); 0000-0002-5274-5261 (I. Baklan)
© 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org)

With the invention of computers and the increase in available computational power, another type of epidemiological model appeared: agent-based (or individual-based). This type bears some resemblance to compartmental models in that it also tracks the number of individuals in each compartment, but instead of modeling that number via equations, each individual is modeled separately: each individual is a distinct entity which can move inside an environment and spread the disease by contacting other individuals.
Such a significant increase in model granularity allows users to model complex stochastic behaviors common in the real world. On the other hand, the computational resources needed to run the model also grow rapidly. This is easy to see with the following example: in a community of even moderate size, a single simulation step requires the computation of agent-to-agent contact events, the number of which grows quadratically with the number of agents. In this work, we explore possibilities to optimize both the time and space complexity of the simulation algorithm.

2. Problem statement

The present study aims to address the challenges of infectious disease modeling through the development of a mathematical framework for a generalized agent-based epidemiological model. As opposed to compartmental epidemiological modeling, agent-based modeling considers the population as a set of distinct entities, agents. These agents can perform activities and interact with each other, and this agent-to-agent interaction is the main vehicle of infectious disease transmission. However, analyzing it naively requires iterating through all pairs of agents, making the algorithmic complexity quadratic, which is a problem for large models with a significant number of agents. For example, a single simulation of the FluTE model with a number of agents equal to the population of the United States of America takes about 6 hours to run on a cluster with 32 CPUs [12]. The ultimate goal of this research is to enhance the computational efficiency of this process. Specifically, the study has the following objectives.

Firstly, we aim to formulate a mathematical apparatus for a general agent-based epidemiological model. To achieve this, we will identify critical factors that influence disease transmission and develop a mathematical framework that accurately captures these dynamics.
Secondly, we aim to optimize the complexity of the simulation algorithm, which can be computationally expensive, especially when dealing with large populations. This will involve identifying ways to reduce its computational complexity while maintaining its accuracy.

Thirdly, we aim to compare the performance of the model based on the optimized algorithm with previously developed models. By conducting a comparative analysis, we aim to demonstrate the effectiveness of the proposed optimization strategies in enhancing the efficiency of simulations. We will use standard evaluation metrics, such as the time required for simulation, to compare the performance of the optimized model with that of existing models.

3. Literature review

In this section, we describe our findings on existing tools and methods for agent-based epidemiological modeling. These tools can be divided into three distinct groups: programming language modules/libraries; general-purpose modeling environments; and specialized software for epidemiological modeling. We review each group separately.

3.1. Programming language modules/libraries

Swarm [13] is a set of libraries for the Java programming language which defines a model as a set of interacting agents, the dynamics of their evolution, and a schedule of events. This package is no longer maintained and has a successor, Ascape [14], with a simplified programming interface. Ascape also provides a simple graphical interface for model adjustment and inspection, as well as spreadsheet data export.

3.2. General-purpose modeling environments

NetLogo [15] is a modeling environment consisting of the following elements: a subject-oriented programming language used for describing agent behavior, the evolution of agents' inner states, and interactions between agents; and a tool set for the analysis of experiment data produced by modeling.
Overall, NetLogo allows one to flexibly define models, but this flexibility comes with certain limitations. First of all, as shown in [16], the overall performance of the NetLogo simulation engine is significantly lower compared to similar models implemented using other tools. NetLogo also features a fixed modeling space geometry in the form of a rectangular grid, where each agent is "attached" to a certain cell and can interact with agents in adjacent cells. While interaction between agents positioned far from each other can be implemented, it is not supported natively, can be cumbersome, and limits the computational performance of a simulation.

RePast [17] is a software tool set designed for modeling complex systems. There are several implementations of it based on different programming languages, such as Java and C++. Repast also features an implementation suitable for highly parallel environments [18].

3.3. Specialized software for epidemiological modeling

As any model with a software implementation can be viewed as specialized software for epidemiological modeling, we review only those offering some amount of configurability.

A Framework for Reconstructing Epidemic Dynamics (FRED) [19] is a software system for agent-based epidemiological modeling which uses a synthetic population database to represent every individual in a specific geographic region. The database contains geographically located synthetic households, group quarters, schools, and workplaces that reflect the actual spatial distribution of the area. Each agent has associated demographic and socioeconomic information and locations for their activities. The synthetic population closely matches the available census data for the United States with high spatial resolution. The model runs a discrete-time simulation with a time step of one day, allowing agents to interact with other agents at their shared activity locations, potentially transmitting diseases.
The simulation records each infection transmission event to evaluate control measures and their impact on sub-populations. Agents can dynamically alter their daily activities. The fixed simulation step of 1 day allows for performance optimizations and is not a severe limitation for diseases with long latency periods, but it may be a limitation for diseases with short latency and infectious periods or for short-term interventions. FRED was originally developed to study influenza, but it can be customized to investigate other infectious diseases, such as measles, by adjusting the configuration files that describe the natural history of the disease.

Unfortunately, to our knowledge, standalone software for agent-based epidemiological modeling with a range of configurability options similar to GLEaMviz [20] doesn't exist.

4. Method

4.1. General notation

In this subsection, we describe a general mathematical apparatus of agent behavior within an epidemiological model with contact-tracing transmission. Simulation is done in discrete steps, which can be, for example, days. These steps form the simulation step set T = {1 … N_T}. Agents, representing individuals who are somehow related to the modeled disease (can be infected, serve as vectors, etc.), form the finite set A. Each agent has a schedule dictating its geographical position. This position shouldn't be thought of as coordinates in some space, as that would limit the model, but rather as occupancy of some entity, for example a household or a school. In order to model the change of agents' locations during the day, we split each simulation step into a finite number of smaller time steps from the set H = {1 … N_H}. Thus, the schedule function can be defined as:

φ: A × T × H → P, (1)

where P is a finite set of possible agent locations (henceforth, we assume each function can depend on some external model parameters, even though this is not displayed in the notation).
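The sets and the schedule function above map naturally onto plain data types. The following Rust sketch (the names are ours and hypothetical, not taken from the CTrace implementation) encodes φ: A × T × H → P as a dense lookup table:

```rust
/// Identifiers into the finite sets A (agents) and P (places).
pub type AgentId = usize;
pub type PlaceId = usize;

/// The schedule function phi: A x T x H -> P of Equation (1),
/// stored as a table indexed by (agent, simulation step, time step).
pub struct Schedule {
    /// places[a][t][h] = location of agent `a` on day `t` at intra-day step `h`
    pub places: Vec<Vec<Vec<PlaceId>>>,
}

impl Schedule {
    /// Evaluate phi(a, t, h).
    pub fn location(&self, a: AgentId, t: usize, h: usize) -> PlaceId {
        self.places[a][t][h]
    }
}
```

A dense table trades memory for O(1) lookups; a real implementation could equally generate locations on the fly from activity rules.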
Instead of using compartments, agent-based models assign an infection state to each agent, often mirroring the respective compartments of compartmental models. This assignment can be expressed as the infection state function:

ξ: A × T → S, (2)

where S is a finite set of infection states. Naturally, the infection state of an agent can change through two separate processes: intrinsic and extrinsic. The intrinsic process covers events occurring inside the organism modeled by the agent; these events can be driven by an immunological response or the nature of the disease. We call this process infection progress. The extrinsic process involves inter-agent communication and chiefly represents the transmission of a disease. We call this process infection spread. Mathematically, these processes can be described by the infection progress (Equation 3) and infection transmission (Equation 4) functions:

Φ_p: A × S → [0, 1], (3)

Φ_t: A × A × S → [0, 1]. (4)

For a given agent a ∈ A and infection state s ∈ S, Φ_p(a, s) describes the probability that at the next simulation step agent a's infection state will become equal to s as the result of intra-agent processes. Similarly, for given agents r, d ∈ A and infection state s ∈ S, Φ_t(r, d, s) describes the probability that at the next simulation step agent r's infection state will become equal to s as the result of inter-agent communication with agent d. Discussing the transmission function Φ_t, we refer to r as the recipient and to d as the donor.

Thus, the goal of an agent-based epidemiological model at simulation step t ∈ T is to estimate

ξ(a, t+1), ∀a ∈ A. (5)

This can be done by evaluating the probability of each agent a ∈ A to change its state to s ∈ S, s ≠ ξ(a, t) (the probability of keeping the current state, Pr[ξ(a, t+1) = ξ(a, t)], can then be computed as 1 − Σ_{s ≠ ξ(a,t)} Pr[ξ(a, t+1) = s]), which can be expressed as:

Pr[ξ(a, t+1) = s] = 1 − (1 − Φ_p(a, s)) · ∏_{h ∈ H} ∏_{d ∈ A: φ(d,t,h) = φ(a,t,h), d ≠ a} (1 − Φ_t(a, d, s)). (6)

Given this equation, a trivial algorithm for computing Pr[ξ(a, t+1) = s] is presented in Algorithm 1. As one may notice, this evaluation involves iteration over A, which has to be done for each agent as a nested loop.
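The inner computation of the trivial algorithm can be sketched in Rust for one agent, assuming Φ_p and Φ_t are supplied as closures (all names below are our illustration, not the paper's implementation):

```rust
/// Evaluate Equation (6) for a single agent `a` and target state `s`:
/// Pr = 1 - (1 - Phi_p(a, s)) * prod over contacts of (1 - Phi_t(a, d, s)).
/// `contacts` holds every agent that shared a location with `a` during the
/// simulation step; the full trivial algorithm obtains it with a nested
/// loop over A, which is exactly what makes it quadratic.
pub fn state_change_prob(
    a: usize,
    s: usize,
    contacts: &[usize],
    phi_p: impl Fn(usize, usize) -> f64,
    phi_t: impl Fn(usize, usize, usize) -> f64,
) -> f64 {
    // probability that the state does NOT change to `s`
    let mut stay = 1.0 - phi_p(a, s);
    for &d in contacts {
        stay *= 1.0 - phi_t(a, d, s);
    }
    1.0 - stay
}
```

With two contacts transmitting at probability 0.5 each and no intrinsic progression, the result is 1 − 0.5 · 0.5 = 0.75, matching the product form of Equation (6).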
Optimization of the computation of this expression is the main concern of this work.

4.2. Optimizations

A significant part of the literature discusses models whose transmission function depends on the donor and recipient only through their infection states, which can be formulated as the following constraint on Φ_t:

∀a, a₁, a₂ ∈ A, s ∈ S: ξ(a₁, t) = ξ(a₂, t) ⇒ Φ_t(a, a₁, s) = Φ_t(a, a₂, s) ∧ Φ_t(a₁, a, s) = Φ_t(a₂, a, s). (7)

This assumption enables a significant optimization. The main idea is to estimate the «contagiousness» of each location at each time step by tracking infectious agents' movements, defined as:

∀t ∈ T: I(p, h, s_r, s) = 1 − ∏_{a ∈ A: φ(a,t,h) = p} (1 − Φ̃_t(s_r, ξ(a, t), s)), (8)

where Φ̃_t is the state-level transmission function induced by the constraint above:

∀r, d ∈ A, s ∈ S, t ∈ T: Φ_t(r, d, s) = Φ̃_t(ξ(r, t), ξ(d, t), s). (9)

Figure 1: Description of a trivial algorithm for computation of Pr[ξ(a, t+1) = s].

Subsequently, susceptible agents' movements are used together with these estimates to compute the probabilities of their state change:

∀t ∈ T: Pr[ξ(a, t+1) = s] = 1 − (1 − Φ_p(a, s)) · ∏_{h ∈ H} (1 − I(φ(a, t, h), h, ξ(a, t), s)). (10)

Even though there is no evidence that Equation 7 holds in the real world, it can be slightly relaxed so that the scope of possible use cases of this model increases: Algorithm 2 can be used whenever Φ_t can be decomposed into functions Φ_tr, Φ_td so that:

∀r ∈ A, s ∈ S: 1 − ∏_{d ∈ A} (1 − Φ_t(r, d, s)) = Φ_tr(r, 1 − ∏_{d ∈ A} (1 − Φ_td(d, s)), s). (11)

We don't try to solve this equation in the scope of this work, but the following solution is used in our experiments:

Φ_tr(r, x, s) = x, for an arbitrary Φ_td: A × S → [0, 1]. (12)

Figure 2: Description of a linear algorithm for computation of Pr[ξ(a, t+1) = s].

In the general case, when Algorithm 2 cannot be applied, there is still room for optimization: Algorithm 1 iterates over all possible pairs of agents, even though most of them were never in contact.
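Under the decomposition of Equation (11) with Φ_tr(r, x, s) = x, one simulation step can be computed in linear time: a first pass accumulates each place's contagiousness (Equation 8), a second pass applies it to the place's visitors (Equation 10). A simplified single-state Rust sketch (our own illustration; Φ_p and the exclusion of an agent's own contribution are omitted for brevity):

```rust
/// Linear-time infection step for one fixed target state:
/// pass 1 accumulates per-place escape probabilities
///   prod over visitors d of (1 - Phi_td(d, s))   (cf. Equation 8),
/// pass 2 converts them into per-agent infection probabilities
///   1 - prod over visited places of escape(p)    (cf. Equation 10).
pub fn infection_probs(
    visits: &[(usize, usize)], // (agent, place) pairs for one time step
    phi_td: &[f64],            // per-agent infectiousness; 0.0 if not infectious
    n_places: usize,
) -> Vec<f64> {
    let mut escape = vec![1.0; n_places];
    for &(d, p) in visits {
        escape[p] *= 1.0 - phi_td[d];
    }
    let mut stay = vec![1.0; phi_td.len()];
    for &(a, p) in visits {
        stay[a] *= escape[p];
    }
    stay.into_iter().map(|e| 1.0 - e).collect()
}
```

Both passes touch each visit record once, so the cost is O(|visits|) rather than O(|A|²).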
By iterating only over pairs of agents which are in the same contact group, the overall number of computations can be decreased. Here, a contact group G_{t,h,p} is the subset of the agent set A whose members were at location p at simulation step t and time step h:

∀a₁, a₂ ∈ G_{t,h,p} ⊆ A: φ(a₁, t, h) = φ(a₂, t, h) = p. (13)

The computational gain can be explained by the fact that grouping preserves the total number of agent visits,

Σ_{h ∈ H} Σ_{p ∈ P} |G_{t,h,p}| = |A × H|, (14)

while the pairwise work, Σ_{h,p} |G_{t,h,p}|², is far below |A × H|² whenever agents are spread across many places.

Figure 3: Description of a grouping-based algorithm for computation of Pr[ξ(a, t+1) = s].

As one may notice, the worst-case computational complexity of Algorithm 3 is O(n²) in the number of agents, reached when |P| = 1. But for real-world models it is easy to show that it is O(n). In the real world, we can assume that the number of places is proportional to the number of agents:

|A| ∝ |P|. (15)

This can be explained by the fact that, for example, the average number of residents of a household is independent of the total number of agents in the model. There can be some variation for models differing in size by orders of magnitude.

Given this assumption, the model can be divided into a set of contact group types, where the i-th type has a number of associated contact groups proportional to the number of agents with coefficient k_i, and the number of agents in each such group is drawn as x_i from some probability distribution X_i. Given this, the computational complexity of Algorithm 3 can be estimated as:

O(g_T(n) + E[Σ_i n·k_i·x_i²]) = O(g_T(n) + n·Σ_i k_i·E[x_i²]) = O(g_T(n) + n), (16)

where g_T(n) is the algorithmic complexity of the grouping algorithm itself. Both the space and time complexity of the grouping algorithm depend on the choice and implementation of the underlying data structure which holds the values of the schedule function φ. It must support the following operations: add a tuple (a, h, p) to the collection (a ∈ A, h ∈ H, p ∈ P); for a tuple (h, p), return all a_i such that (a_i, h, p) was previously added to the collection; clear all data from the collection.
For example, a rural settlement with a population of 10³ cannot be modeled by scaling down the model of a large city with 10⁶ citizens: the economic and sociodemographic gap would be too significant.

Moreover, the following constraints hold: the overall number of tuples (a, h, p) is equal to |H| · |A|; the number of unique tuples (h, p) in the collection is at most |H| · |P|; and for a given tuple (h, p), the number of a_i such that (a_i, h, p) was added to the collection is at most |A|.

Such a data structure is a multimap, an extension of the associative array to the case of storing several values under one key. Being an abstract data type, a multimap can be implemented in several different ways which nevertheless share a common structure of two components: the first is a data structure that maps a key to a reference to a container holding a list of values; the second is the type of this container. For example, the first data structure can be a hash table [21] and the second a linked list [22]. Based on this, the complexity g_T of the grouping algorithm can be given as:

g_T(n) = g₁(n) + g₂(n),
g₁(n) = O(n · (ind₁(n) + ins₂(n))), (17)
g₂(n) = O(itr₁(n) + itr₂(n)),

where ind₁ is the algorithmic complexity of searching by index in the first data structure, ins₂ is the complexity of inserting at the end of the second data structure, and itr₁, itr₂ are the complexities of iterating over the elements of the first and second data structures, respectively. Based on the aforementioned constraints and the fact that all keys can be numbered from 1 to |P| · |H|, the obvious candidates for the first data structure are the static array and the list, since for them ind₁(n) = O(1) and itr₁(n) = O(n). However, due to CPU caching, in practice the static array is more efficient (Figure 4).

Figure 4: Comparison of container iteration speed.
(Cargo 1.56.0; compiler optimizations disabled.)

In addition to implementing the multimap as a container of containers, an approach based on hash tables can be used. In this case, two data structures are involved: the first maps a key to the number of elements stored under that key, and the second maps a (key, element index) pair to the element's value. This approach is appropriate because the operations of adding and retrieving a value by key in such a structure have an algorithmic complexity of O(1), justified by the fact that the analogous operations on a hash table also have constant complexity (in the absence of collisions) [23].

To evaluate the time performance of each approach, a number of experiments were conducted, measuring the execution time of the grouping algorithm for each approach with different numbers of agents and different ratios of the number of agents to the number of places. The results are shown in Figures 5, 6, and 7.

Figure 5: Execution time of the grouping algorithm for an average agents-to-places ratio of 16:1.

Figure 6: Execution time of the grouping algorithm for an average agents-to-places ratio of 256:1.

Figure 7: Execution time of the grouping algorithm for an average agents-to-places ratio of 4096:1.

The results show that the implementation of the multimap through a vector of vectors is much more effective. This can be explained by the fact that the number of memory allocations can be reduced to a minimum with an optimally configured growth schedule for the vectors.
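The vector-of-vectors multimap with buffer reuse can be sketched as follows (a hypothetical minimal version of our own; keys are (place, time step) pairs numbered 0 … |P|·|H|−1):

```rust
/// Grouping multimap backed by a static array of vectors.
/// `clear` empties each bucket but keeps its allocated capacity, so after a
/// warm-up step subsequent simulation steps perform almost no allocations,
/// provided place occupancies stay roughly stable between steps.
pub struct GroupMap {
    buckets: Vec<Vec<usize>>,
}

impl GroupMap {
    pub fn new(n_keys: usize) -> Self {
        GroupMap { buckets: vec![Vec::new(); n_keys] }
    }
    /// Record that agent `a` is present under `key`, i.e. at some (p, h).
    pub fn insert(&mut self, key: usize, a: usize) {
        self.buckets[key].push(a);
    }
    /// All agents recorded under `key` since the last `clear`.
    pub fn get(&self, key: usize) -> &[usize] {
        &self.buckets[key]
    }
    /// Reset for the next simulation step, retaining buffer capacity.
    pub fn clear(&mut self) {
        for b in &mut self.buckets {
            b.clear(); // Vec::clear keeps the allocation
        }
    }
}
```

Indexing by a precomputed integer key gives the O(1) lookup of a static array while the per-bucket vectors provide cache-friendly iteration.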
And although it is impossible to predict the number of elements under each key in advance without prior information, in the case of using a multimap to count visits of agents to places within infection modeling, it can be assumed that this number remains approximately the same between consecutive simulation steps. That is:

∀p ∈ P, ∀t ∈ T, ∀h ∈ H: Σ_{a ∈ A} 1[p = φ(a, t, h)] ≈ Σ_{a ∈ A} 1[p = φ(a, t+1, h)]. (18)

Obviously, in the real world this statement is generally not true. For example, it is reasonable to assume that the number of visitors to shopping malls increases significantly on Saturdays compared to Fridays in countries where the working week ends on Friday. However, this can be partially addressed by keeping the size of each buffer equal to at least the last number of elements multiplied by some constant. In the general case, one can reduce the number of memory allocations to zero by setting the size of each buffer equal to the number of agents. However, due to assumption (15), the number of buffers is proportional to the number of agents, which would make the space complexity of such an algorithm O(n²) and is unacceptable.

Apart from the aforementioned algorithmic optimizations, computational optimizations can also be applied. One may notice that the evaluation of state change probabilities involves a large number of multiplication operations of the form:

x₁^{p₁} · x₂^{p₂} · … · x_n^{p_n}. (19)

The computation of such an expression can be reformulated as:

∏_i x_i^{p_i} = exp(Σ_i p_i · ln x_i). (20)

If we denote by t_×, t_+, t_exp, t_ln the calculation times of a product, a sum, the exponential function, and the natural logarithm, respectively, the calculation time of the left-hand side equals t_× · (Σ_i p_i − 1), and of the right-hand side t_exp + t_+ · (Σ_i p_i − 1) + t_ln · n (accumulating each p_i · ln x_i by repeated addition). In the general case, the latter value is greater, because the calculation of the natural logarithm is a time-consuming operation. However, when the values x_i are constants known at compilation time, the logarithms can be precomputed and the computation time of the right-hand side becomes t_+ · (Σ_i p_i − 1) + t_exp.
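The rewrite of Equation (20) can be illustrated as follows; once the logarithms are precomputed, the repeated multiplications collapse into multiply-adds plus a single exponential (our own sketch, not the paper's code):

```rust
/// Naive evaluation of Equation (19): x1^p1 * ... * xn^pn
/// by repeated multiplication.
pub fn pow_product(factors: &[(f64, u32)]) -> f64 {
    factors.iter().map(|&(x, p)| x.powi(p as i32)).product()
}

/// Equation (20): the same value as exp(sum_i p_i * ln x_i),
/// where `log_factors` holds precomputed (ln x_i, p_i) pairs.
pub fn pow_product_log(log_factors: &[(f64, u32)]) -> f64 {
    log_factors
        .iter()
        .map(|&(lx, p)| lx * p as f64)
        .sum::<f64>()
        .exp()
}
```

Both forms agree up to floating-point rounding; the second wins once Σ p_i is large enough to amortize the single `exp` call.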
Because computing the product of two floating-point numbers requires more CPU cycles (on most architectures) than computing the sum [24], the optimized version is more efficient when:

t_× > t_+ + t_exp / (Σ_i p_i − 1), (21)

which holds for sufficiently large values of Σ_i p_i. Thus, if the disease transmission function depends only on the resulting state and the recipient and donor states, this optimization can be used. To check its effect, an experiment was conducted in which execution times with and without it were measured for various model sizes. We observed some difference between the optimized and default approaches on the experimental machine for small numbers of agents (Figure 8). As the test was done on a single machine with an Intel Core i7-8700K, additional experiments are needed for more general results. We leave this problem for future research.

Figure 8: Simulation step performance increase (higher values are better). White areas correspond to absent data points.

4.3. Performance estimation

Comparing the performance of algorithms across different machines is not a trivial task. Modern CPUs reach significant performance gains through caching, instruction pipelining, and other technologies whose impact on a specific algorithm is difficult to predict [25]. So, to compare our approach to existing ones, we performed two sets of measurements: the first on a desktop machine with an Intel Core i7-8700K, and the second on a 12-year-old laptop with an Intel Core i3-2330M. In order to account for various model sizes and CPU core counts, we use the following metric:

M = (T · 10⁹) / (N_a · N_d · N_s · N_c), (22)

where T is the simulation execution time in seconds, N_a, N_d, N_s are the numbers of agents, simulation steps, and time steps in a model, respectively, and N_c is the number of CPU cores. This metric is only suitable for algorithms whose computational complexity is linear in the number of agents.
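The metric of Equation (22) is straightforward to compute; the sketch below assumes the 10⁹ normalization constant, which is our reading of the formula:

```rust
/// Performance metric of Equation (22): wall-clock simulation time T (in
/// seconds), normalized by the numbers of agents, simulation steps, time
/// steps, and CPU cores. The 1e9 scaling constant is an assumption of
/// this sketch.
pub fn perf_metric(t_seconds: f64, n_a: u64, n_d: u64, n_s: u64, n_c: u64) -> f64 {
    t_seconds * 1e9 / (n_a * n_d * n_s * n_c) as f64
}
```

For example, a 2-second single-core run of 1000 agents over 100 days with 10 time steps each normalizes to 2.0 · 10⁹ / 10⁶ = 2000.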
5. Results

We compared several models and their implementation approaches with the CTrace [26] modeling language, whose translator implements the aforementioned algorithms. Our test model didn't capture all details of each model, but was instead built around their common features, like agent activity scheduling, several different place types, etc. To compute the aforementioned metric for it, a number of experiments were conducted with various model sizes.

Table 1. Performance comparison of various agent-based epidemiological models/approaches

Method                  | Metric, s         | Hardware
FluTe [12]              | 4.00 – 13.71      | 32x Intel Core2 Duo T9400
AvilovNetLogo [16]      | 511.36 – 10960.15 | Intel Core i3 330M 2.13 GHz
AvilovOracle [16]       | 5.75 – 13.71      |
FRED [19]               | 1.20 – 7.29       | SGI Altix UV supercomputer
MontañolaNetLogo [27]   | 2.47 – 24.71      | Intel Core i5 3.20 GHz
GiacopelliLombardy [28] | 240               | AMD 3900X 12-core
CTrace [26]             | 0.67              | Intel Core i7-8700K 3.70 GHz
CTrace [26]             | 1.38              | Intel Core i3-2330M 2.20 GHz

To compare the developed approach to previous work, we measured its performance on both a modern CPU and a 12-year-old one. Even though the range of CPU generations is significantly wide, we can assume that our approach is at least as performant as previously developed ones, while offering the most flexible model specification. This flexibility stems from CTrace's independence of the modeling space structure, the number and types of places (and other modeling entities), the dynamics of a disease, etc. Given these results, we conclude that the aforementioned optimization approaches can provide significant benefits for the computational performance of agent-based epidemiological models on a single CPU core.

6. Conclusions

This paper presents a discussion of the challenges encountered in the implementation of agent-based modeling for infectious diseases. Specifically, the focus is on the time complexity of the simulation algorithm, which is identified as a key issue.
To address this problem, a mathematical framework for agent-based epidemiological modeling is developed. This framework enables exploration of various optimization techniques for the simulation process. Several algorithms for contact tracing with linear time complexity are proposed, depending on the nature of the infectious disease being modeled. These algorithms are tested and compared with existing models. The results demonstrate that the proposed approach is at least as effective as other implementations, while offering increased model flexibility through the CTrace language interface. The described approach scales up by running multiple models, one per CPU core, which is often desirable for this kind of model as it leads to statistically more significant results. We leave the question of single-model parallelization for future research.

7. References

[1] R. Dandekar, G. Barbastathis, Quantifying the effect of quarantine control in Covid-19 infectious spread using machine learning, medRxiv, 2020, pp. 1-13. doi: 10.1101/2020.04.03.20052084.
[2] W. O. Kermack, A. G. McKendrick, A contribution to the mathematical theory of epidemics, Proceedings of the Royal Society of London, Vol. 115, 1927, pp. 700-721. doi: 10.1098/rspa.1927.0118.
[3] K. S. Sharov, Creating and applying SIR modified compartmental model for calculation of COVID-19 lockdown efficiency, Chaos, Solitons & Fractals, Vol. 141, 2020, pp. 1-14. doi: 10.1016/j.chaos.2020.110295.
[4] S. Nie, W. Li, Using lattice SIS epidemiological model with clustered treatment to investigate epidemic control, Biosystems, Vol. 191, 2020. doi: 10.1016/j.biosystems.2020.104119.
[5] A. Ceria, K. Köstler, R. Gobardhan, H. Wang, Modeling airport congestion contagion by heterogeneous SIS epidemic spreading on airline networks, PLoS One, Vol. 16, 2021. doi: 10.1371/journal.pone.0245043.
[6] W. Sanusi, N. Badwi, A. Zaki, S. Sidjara, N. Sari, M. I. Pratama, S.
Side, Analysis and Simulation of SIRS Model for Dengue Fever Transmission in South Sulawesi, Indonesia, Journal of Applied Mathematics, Vol. 2021, 2021, pp. 1-8. doi: 10.1155/2021/2918080.
[7] H.-Y. Shin, A multi-stage SEIR(D) model of the COVID-19 epidemic in Korea, Annals of Medicine, Vol. 53, 2021, pp. 1160-1170. doi: 10.1080/07853890.2021.1949490.
[8] S. Annas, M. I. Pratama, M. Rifandi, W. Sanusi, S. Side, Stability analysis and numerical simulation of SEIR model for pandemic COVID-19 spread in Indonesia, Chaos, Solitons & Fractals, Vol. 139, 2020, pp. 1-7. doi: 10.1016/j.chaos.2020.110072.
[9] Z. Ahmad, M. Arif, F. Ali, I. Khan, K. S. Nisar, A report on COVID-19 epidemic in Pakistan using SEIR fractional model, Scientific Reports, Vol. 10, 2020, pp. 1-14. doi: 10.1038/s41598-020-79405-9.
[10] W. Li, J. Gong, J. Zhou, L. Zhang, D. Wang, J. Li, C. Shi, H. Fan, An evaluation of COVID-19 transmission control in Wenzhou using a modified SEIR model, Epidemiology & Infection, Vol. 149, 2021, pp. 1-12. doi: 10.1017/S0950268820003064.
[11] M. W. Levin, M. Shang, R. Stern, Effects of short-term travel on COVID-19 spread: A novel SEIR model and case study in Minnesota, PLoS One, Vol. 16, 2021, pp. 1-16. doi: 10.1371/journal.pone.0245919.
[12] D. L. Chao, M. E. Halloran, V. J. Obenchain, I. M. Longini Jr, FluTE, a publicly available stochastic influenza epidemic simulation model, PLoS Computational Biology, Vol. 6, 2010, pp. 1-8. doi: 10.1371/journal.pcbi.1000656.
[13] N. Minar, R. Burkhart, C. Langton, M. Askenazi, The Swarm Simulation System: A Toolkit for Building Multi-Agent Simulations, ACM Transactions on Modeling and Computer Simulation, 1996.
[14] P. Patlolla, V. Gunupudi, A. R. Mikler, R. T. Jacob, Agent-based simulation tools in computational epidemiology, International Workshop on Innovative Internet Community Systems, Springer, 2004, pp. 212-223. doi: 10.1007/11553762_21.
[15] S. Tisue, U. Wilensky,
Netlogo: A simple environment for modeling complexity, International Conference on Complex Systems, Vol. 21, 2004, pp. 16-21.
[16] K. Avilov, O. Y. Solovey, Agent-Based Models: Approaches and Applicability to Epidemiology, 2012, pp. 425-443. doi: 10.1146/annurev-publhealth-040617-014317.
[17] N. Collier, Repast: An extensible framework for agent simulation, The University of Chicago's Social Science Research, Vol. 36, 2003, pp. 1-18.
[18] N. Collier, M. North, Parallel agent-based simulation with Repast for high performance computing, Simulation, Vol. 89, 2013, pp. 1215-1235. doi: 10.1177/0037549712462620.
[19] J. J. Grefenstette, S. T. Brown, R. Rosenfeld, J. DePasse, N. T. Stone, P. C. Cooley, W. D. Wheaton, A. Fyshe, D. D. Galloway, A. Sriram, et al., FRED (a Framework for Reconstructing Epidemic Dynamics): an open-source software system for modeling infectious diseases and control strategies using census-based populations, BMC Public Health, Vol. 13, 2013, pp. 1-14. doi: 10.1186/1471-2458-13-940.
[20] W. Van den Broeck, C. Gioannini, B. Gonçalves, M. Quaggiotto, V. Colizza, A. Vespignani, The GLEaMviz computational tool, a publicly available software to explore realistic epidemic spreading scenarios at the global scale, BMC Infectious Diseases, Vol. 11, 2011, pp. 1-14. doi: 10.1186/1471-2334-11-37.
[21] D. P. Mehta, S. Sahni, Handbook of Data Structures and Applications, Chapman and Hall/CRC, 2004, p. 163. doi: 10.1201/9781420035179.
[22] D. P. Mehta, S. Sahni, Handbook of Data Structures and Applications, Chapman and Hall/CRC, 2004, p. 47. doi: 10.1201/9781420035179.
[23] T. Van Dijk, Analysing and Improving Hash Table Performance, 10th Twente Student Conference on IT, University of Twente, Faculty of Electrical Engineering and Computer Science, 2009.
[24] A.
Fog, et al., Instruction tables: Lists of instruction latencies, throughputs and micro-operation breakdowns for Intel, AMD, and VIA CPUs, Copenhagen University College of Engineering, 2011.
[25] J. Domínguez, E. Alba, A Methodology for Comparing the Execution Time of Metaheuristics Running on Different Hardware, European Conference on Evolutionary Computation in Combinatorial Optimization, Springer, 2012, pp. 1-12. doi: 10.1007/978-3-642-29124-1_1.
[26] V. Sarnatskyi, I. Baklan, CTrace: Language for Definition of Epidemiological Models with Contact-Tracing Transmission, International Scientific Conference "Intellectual Systems of Decision Making and Problem of Computational Intelligence", Springer, Cham, 2023, pp. 426-448. doi: 10.1007/978-3-031-16203-9_25.
[27] C. Montañola-Sales, J. F. Gilabert-Navarro, J. Casanovas-Garcia, C. Prats, D. López, J. Valls, P. J. Cardona, C. Vilaplana, Modeling tuberculosis in Barcelona. A solution to speed-up agent-based simulations, in: 2015 Winter Simulation Conference (WSC), IEEE, 2015, pp. 1295-1306. doi: 10.1109/WSC.2015.7408254.
[28] G. Giacopelli, et al., A Full-Scale Agent-Based Model to Hypothetically Explore the Impact of Lockdown, Social Distancing, and Vaccination During the COVID-19 Pandemic in Lombardy, Italy: Model Development, JMIRx Med, Vol. 2, 2021, pp. 1-11. doi: 10.2196/24630.