<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic test case prioritisation for mobile applications based on real user behaviour data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrii Melnyk</string-name>
          <email>andrii.melnyk.it@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lesia Dmytrotsa</string-name>
          <email>dmytrotsa.lesya@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Palka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaroslav Vasylenko</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliya</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>I. Horbachevsky Ternopil National Medical University</institution>
          ,
          <addr-line>Maidan Voli St., 1, Ternopil, 46002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>56, Ruska Street, Ternopil, 46001</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ternopil Volodymyr Hnatiuk National Pedagogical University</institution>
          ,
          <addr-line>M. Kryvonosa Str., 2, Ternopil, 46015</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In mobile application testing, time and resource constraints often prevent the execution of full test suites on every code change. This paper proposes a behaviour-driven test case prioritisation model that ranks tests based on real user interaction data, including usage frequency, crash occurrence, and recency, collected from analytics platforms such as Firebase. By mapping user flows to automated test cases and applying a simple scoring formula, the model enables more effective regression testing by focusing on the most relevant and high-risk functionalities. A simulated evaluation demonstrates that this approach can improve defect detection timeliness and optimize test coverage in user-critical areas. The model is easy to integrate into existing pipelines, requires no changes to the test framework, and offers a practical solution for making testing more adaptive and risk-aware. To the best of our knowledge, this is the first lightweight prioritisation framework that combines production telemetry with an explicit, tunable scoring equation tailored to mobile-app test suites. The study therefore advances the state of the art by demonstrating that behaviour-aware prioritisation can be achieved without machine-learning pipelines or intrusive code instrumentation.</p>
      </abstract>
      <kwd-group>
        <kwd>test automation</kwd>
        <kwd>mobile testing</kwd>
        <kwd>user analytics</kwd>
        <kwd>prioritisation model</kwd>
        <kwd>method of scoring</kwd>
        <kwd>telemetry</kwd>
        <kwd>behaviour-driven testing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As the complexity of mobile applications continues to grow and user expectations increase,
software development teams are under constant pressure to deliver high-quality releases at a rapid
pace. Continuous integration and delivery (CI/CD) pipelines have become essential in modern
development workflows, with automated testing playing a critical role in ensuring that changes do
not introduce regressions or affect application stability. However, the volume of test cases in
large-scale mobile applications often makes it infeasible to execute the entire test suite on every code
change or deployment, especially when time and computational resources are limited.</p>
      <p>Test case prioritisation (TCP) has emerged as a key strategy for addressing this challenge.
Traditional TCP techniques focus on maximizing code coverage, minimizing execution time, or
detecting faults as early as possible. While effective in many scenarios, these approaches often
disregard how users actually interact with the application in production. As a result, tests for rarely
used or low-impact features may receive the same attention as those covering business-critical
functionality, leading to suboptimal testing efficiency and resource allocation.</p>
      <p>In recent years, the availability of user analytics platforms such as Firebase Analytics and
UXCam has made it possible to gather rich behavioural data directly from end users. This opens
the opportunity to incorporate real-world usage patterns into the prioritisation of test cases.
Despite the growing body of research on test optimization, relatively few approaches have
explored the use of behavioural telemetry to guide regression testing, particularly in the context of
mobile applications where usage patterns vary significantly and change frequently.</p>
      <p>This paper proposes a lightweight, behaviour-driven test case prioritisation model that utilizes
real user interaction data to rank test cases based on feature usage frequency, crash incidence, and
recency. The approach is designed to be simple to implement, compatible with existing analytics
and testing tools, and adaptable to real-world continuous testing workflows. The proposed model
was evaluated in a controlled experimental setting using simulated telemetry data from a mobile
banking application, demonstrating its potential to improve testing efficiency by aligning test
execution order with real user behaviour.</p>
      <p>The remainder of this paper is structured as follows. Section 2 reviews relevant related work in
the field of test case prioritisation, with a particular emphasis on approaches incorporating
behavioural data, and Section 3 compares traditional prioritisation strategies with the
behaviour-driven approach proposed in this study. Section 4 presents the proposed model, including
its architecture, scoring methodology, and practical usage scenario. Section 5 describes the
evaluation methodology and discusses the results obtained through simulated telemetry data.
Section 6 offers a critical discussion of the model’s strengths and limitations, while Section 7
provides concluding remarks and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Test case prioritisation (TCP) is a well-established area in software engineering that seeks to
optimize the order of test execution to detect faults earlier, especially under constraints of time and
resources. Traditional approaches to TCP are often based on code coverage, historical fault data, or
requirements criticality [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
        ]. Techniques such as total and additional coverage strategies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
fault-exposure potential [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and genetic algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been extensively studied for both unit
and system-level testing.
      </p>
      <p>
        However, these methods typically ignore how end-users interact with applications in real-world
conditions, especially in the case of mobile applications. Recent research has started to emphasize
the importance of user-centric test optimization. For instance, studies by Zhang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Wang
et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] explored usage profiles and telemetry data to improve regression test ordering in web
systems. These approaches demonstrated that prioritizing tests covering more frequently used
features leads to faster fault detection in practice.
      </p>
      <p>
        Behaviour-driven development (BDD) is another methodology that has influenced modern
testing practices, particularly in aligning test cases with business-level user stories [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Although
BDD enhances clarity and traceability, it does not inherently provide a mechanism for test
prioritisation.
      </p>
      <p>
        With the advent of advanced user analytics platforms like Firebase Analytics, UXCam, and
Mixpanel, researchers have begun exploring the integration of behavioural data into the software
quality assurance process. Several studies [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10–12</xref>
        ] have proposed leveraging user session logs,
event frequency, and crash reports to guide testing focus. Nonetheless, most of these approaches
lack a systematic framework to map user behaviour directly to test cases for prioritised execution.
      </p>
      <p>
        In mobile application testing, the volatility of UI structures and the variety of devices make
automated testing more complex [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This further supports the need for adaptive prioritisation
techniques that reflect actual user behaviour in production. Some exploratory attempts [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]
have examined clustering of user actions and frequency-based path analysis, but few have
formalized a unified prioritisation model suitable for mobile testing workflows.
      </p>
      <p>To the best of our knowledge, no existing work presents a lightweight, telemetry-driven
prioritisation model that can be directly integrated into mobile test automation pipelines using
real-time behavioural data. This paper addresses this gap by proposing such a model, with a focus
on the practicality of implementation using standard analytics platforms.</p>
      <sec id="sec-2-1">
        <title>3. Comparison with traditional approaches</title>
        <p>Test case prioritisation has been extensively studied in the context of regression testing, and
several traditional strategies have been widely adopted in both academic and industrial settings.
Among the most common are coverage-based methods, history-based prioritisation, and random
ordering. While each of these techniques provides certain benefits, they also exhibit notable
limitations, particularly when applied to mobile applications with highly dynamic user behaviour.</p>
        <p>Coverage-based prioritisation ranks tests according to code coverage metrics, typically
favouring those that touch the greatest number of program elements. While this method is simple
and structurally grounded, it assumes that all code elements are equally important, ignoring their
actual relevance to end-user activity. In mobile applications, large portions of code may be rarely
used, while critical user flows may rely on a small subset of functions. As a result, coverage-based
ordering often fails to capture user-centric risk.</p>
        <p>History-based techniques leverage past defect detection data, prioritising tests that have
previously uncovered faults. This method can be effective when consistent historical test data
exists. However, it becomes less reliable in evolving systems where feature usage patterns shift
frequently. Moreover, it may overlook newly introduced functionalities that lack a history of
failure but are actively used in production.</p>
        <p>Random ordering or round-robin scheduling offers no optimisation but serves as a baseline in
many studies. While it ensures fairness, it contributes little to efficiency or risk mitigation.</p>
        <p>In contrast, the proposed behaviour-driven model introduces a fundamentally different
perspective by prioritising test cases based on real-world user interaction data. By incorporating
usage frequency, crash history, and recency of use, it captures aspects of risk that static and
historical methods overlook. This dynamic and user-focused approach is especially well suited to
mobile environments, where user engagement and feature volatility are high.</p>
        <p>Table 1 summarises the comparison between traditional techniques and the proposed model in
terms of adaptability, user-awareness, data requirements, and effectiveness in mobile testing.</p>
      </sec>
    <sec id="sec-3">
      <title>4. Behaviour-based test prioritisation model</title>
      <p>This section presents a novel model for test case prioritisation in mobile application testing that
leverages user behaviour analytics. The key idea is to dynamically rank test cases based on
real-world usage data collected via analytics tools such as Firebase Analytics or UXCam. The approach
aims to optimize regression testing by focusing on the most critical user interaction flows.</p>
      <sec id="sec-3-1">
        <title>4.1. System architecture</title>
        <p>The proposed model consists of four core stages, as illustrated in Figure 1:
1. Data Collection: User interaction data is continuously collected during real-world usage of
the mobile application. This includes events such as screen views, button taps, navigation
paths, and crash reports.
2. Session Aggregation and Flow Extraction: Raw event logs are grouped into sessions, from
which frequent user flows are reconstructed using path analysis or Markov chains.
3. Test Case Mapping: Each user flow is mapped to a corresponding set of automated test
cases in the project’s test suite.
4. Prioritisation Engine: Test cases are ranked based on a scoring formula that considers usage
frequency, recency, and impact factors such as crash frequency or revenue-critical screens.</p>
        <p>From a deployment perspective, the proposed scoring logic can be integrated into continuous
integration pipelines using common automation tools. A typical implementation involves
retrieving telemetry data as part of a pre-test hook, calculating updated prioritisation scores, and
feeding the sorted test list into automated test runners. This approach enables continuous
adaptation of test execution order to live user behaviour with minimal disruption to the existing
process. The architecture remains modular, allowing separate configuration of telemetry parsing,
scoring logic, and test execution orchestration.</p>
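        <p>As an illustration, the pre-test hook described above could be realised with a short script similar to the sketch below. The file names (flows.json, mapping.json, ordered_tests.txt) and the JSON layout are assumptions introduced for this example only and are not prescribed by the model.</p>
        <preformat>
# Hypothetical pre-test CI hook: read exported telemetry, score the mapped
# test cases, and write an ordered test list for the runner.
# File names and JSON layout are assumptions for this sketch.
import json

def priority_score(flow, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of normalised usage, crash rate and recency (Eq. 1)."""
    return alpha * flow["usage"] + beta * flow["crash"] + gamma * flow["recency"]

with open("flows.json") as fh:       # normalised behavioural metrics per flow
    flows = json.load(fh)
with open("mapping.json") as fh:     # flow id mapped to a list of test case ids
    mapping = json.load(fh)

ranked = sorted(
    ((priority_score(flows[fid]), tid) for fid, tests in mapping.items() for tid in tests),
    reverse=True,
)
with open("ordered_tests.txt", "w") as fh:
    fh.writelines(f"{tid}\n" for _, tid in ranked)
        </preformat>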
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Method of scoring</title>
        <p>The core of the proposed prioritisation approach is a configurable method of scoring that ranks test
cases based on behavioural characteristics of the user flows they cover. Let Ti be an automated test
case and Fj the user flow associated with it. The prioritisation score P(Ti) is computed using a
weighted sum of three key metrics:</p>
        <p>
          P(Tᵢ) = α⋅UF(Fⱼ) + β⋅CR(Fⱼ) + γ⋅R(Fⱼ) (1)
where:
– UF(Fⱼ) — Usage Frequency of the user flow Fⱼ
– CR(Fⱼ) — Crash Rate associated with that flow
– R(Fⱼ) — Recency score, favoring recently active flows
– α, β, γ ∈ [0, 1] — adjustable weights (e.g., α = 0.5, β = 0.3, γ = 0.2)
        </p>
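        <p>For clarity, equation (1) can be expressed directly in code. The following minimal Python sketch mirrors the notation above; the metric values passed in the example call are purely illustrative.</p>
        <preformat>
# Minimal sketch of the scoring function in equation (1).
def priority_score(uf, cr, r, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of normalised usage frequency, crash rate and recency."""
    return alpha * uf + beta * cr + gamma * r

# Hypothetical normalised metrics for the flow covered by one test case.
print(round(priority_score(uf=0.8, cr=0.4, r=1.0), 2))  # 0.72
        </preformat>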
        <p>This linear scoring formula allows the prioritisation strategy to be tailored to the specific goals
of a project. For example, in a customer-facing app, higher priority may be given to frequently used
features (α&gt;β,γ), while in a safety-critical domain, crash rate may dominate the score (β is highest).
Recency provides an adaptive dimension, giving preference to flows that were active in recent
releases or deployments.</p>
        <p>Each behavioural metric is normalized to a fixed scale (e.g., 0 to 1) before scoring, ensuring
comparability between values. For instance, if the most frequent flow is used 500 times/day, it
receives a normalized score of 1.0, while a flow with 50 uses/day receives 0.1. Similarly, crash rates
and recency scores are linearly scaled or discretized using buckets to control their impact.</p>
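        <p>A possible implementation of this max-based normalisation is sketched below; the daily usage counts are the illustrative figures used above.</p>
        <preformat>
# Illustrative max-normalisation of raw usage counts to the 0..1 scale.
raw_usage = {"F1": 400, "F2": 500, "F3": 120, "F4": 50}  # hypothetical events/day
peak = max(raw_usage.values())
normalised = {flow: count / peak for flow, count in raw_usage.items()}
print(normalised)  # {'F1': 0.8, 'F2': 1.0, 'F3': 0.24, 'F4': 0.1}
        </preformat>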
        <p>The model’s flexibility allows tailoring the weighting configuration based on domain-specific
requirements. For example, in an e-commerce context, usage frequency (α) may be given higher
priority to reflect high-traffic areas with direct impact on conversion. In contrast, a banking or
healthcare application might focus on crash rate (β), given the criticality of reliability and user
trust. Recency (γ) becomes important in fast-evolving applications, such as social platforms or
experimental A/B releases, where recent changes are most likely to introduce regressions. This
adaptability allows the model to generalise across contexts while preserving practical relevance.</p>
        <p>To illustrate the scoring process, consider two test cases:
• T01, mapped to flow F1 used 400 times/day, with 2 crashes/week, last used yesterday
• T02, mapped to F4 used 20 times/day, 0 crashes, last used two weeks ago</p>
        <p>With weights α=0.5, β=0.3, γ=0.2, and after normalization, T01 would receive a significantly
higher score and thus be placed earlier in the execution queue.</p>
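        <p>The comparison can be reproduced numerically as follows. The normalisation constants (a peak of 500 uses/day, a maximum of 5 crashes/week, and a recency bucket of 1.0 for yesterday versus 0.2 for two weeks ago) are assumptions introduced only to make the arithmetic concrete.</p>
        <preformat>
# Worked comparison of T01 and T02 under alpha=0.5, beta=0.3, gamma=0.2.
# Normalisation constants (500 uses/day peak, 5 crashes/week maximum,
# recency buckets) are assumptions made purely for illustration.
def priority_score(uf, cr, r, alpha=0.5, beta=0.3, gamma=0.2):
    return alpha * uf + beta * cr + gamma * r

t01 = priority_score(uf=400 / 500, cr=2 / 5, r=1.0)  # F1: used yesterday
t02 = priority_score(uf=20 / 500, cr=0 / 5, r=0.2)   # F4: used two weeks ago
print(f"T01={t01:.2f}  T02={t02:.2f}")  # T01=0.72  T02=0.06
        </preformat>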
        <p>The scoring function can also be extended to include additional terms (e.g., test execution time,
historical flakiness, or coverage criticality), making the method suitable for future hybrid
prioritisation strategies. However, even in its basic form, the method offers a transparent and
adaptable mechanism for guiding test execution based on actual user impact.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Example scenario</title>
        <p>To illustrate the proposed prioritisation model in a practical context, we consider a simplified case
study based on a mobile banking application. The application integrates Firebase Analytics, which
collects data about user navigation patterns, screen visits, and application crashes.</p>
        <p>Using the telemetry data, several user flows were extracted. These flows represent sequences of
actions performed by users during typical app usage. Table 2 presents an example of the collected
data for four such flows, including their frequency, crash rate, and last observed usage.</p>
        <p>These flow records serve as the foundation for computing prioritisation scores, allowing the test
suite to be dynamically re-ordered before execution. In a real testing environment, such data would
be updated regularly—daily or even hourly—based on live user interaction logs. This allows the
prioritisation process to remain aligned with the latest behavioural trends, ensuring that changes in
user engagement or new stability issues are promptly reflected in the test execution strategy. The
presented example highlights how telemetry can be directly translated into actionable insights
within continuous testing workflows. Moreover, such scenarios illustrate the model’s ability to
adapt test focus in response to evolving user activity without requiring changes to the test cases
themselves.</p>
        <p>As shown in Table 2, the most frequently used flows are Login → Dashboard and Dashboard →
Payments, with the latter associated with a relatively higher crash rate. Less common flows such as
navigating to the Analytics Tab have a significantly lower usage frequency and no recorded
crashes.</p>
        <p>Each user flow is mapped to corresponding automated test cases. For instance:
1. Flow F1 is associated with test cases T01 and T02
2. Flow F2 → T03, T04
3. Flow F3 → T05
4. Flow F4 → T06</p>
        <p>Based on the scoring model defined in Section 4.2, test cases covering flow F2 would receive the
highest priority due to both high frequency and a greater number of related crashes. This
prioritisation allows testers to focus on the most impactful parts of the application and improves
the efficiency of regression testing. The ranking reflects how usage-driven prioritisation surfaces
the most impactful and risk-prone test cases. In practical workflows, such prioritisation enables
earlier fault detection in features that are both popular and unstable. Furthermore, this scoring
logic can be updated continuously as usage patterns shift or crash profiles change. For instance,
during major version rollouts or feature experiments, certain flows may become temporarily more
critical and move higher in the priority queue. This dynamic behaviour allows the testing strategy
to stay aligned with real-world usage trends over time.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.4. Tooling considerations</title>
        <p>The successful application of the proposed prioritisation model requires the integration of several
well-established tools and practices commonly used in mobile application development and testing.
At the core of the approach is the collection of user behaviour data through analytics platforms
such as Firebase Analytics, UXCam, or Mixpanel. These tools enable developers to monitor user
interactions by logging events such as screen views, button presses, and navigation flows, often
with minimal configuration. Firebase, for example, automatically captures essential UI interactions
and crash data, providing a reliable basis for reconstructing user behaviour.</p>
        <p>Once the data is collected, a preprocessing step is required to transform raw logs into
meaningful user flows. This can be achieved by grouping analytics events by session ID and
ordering them chronologically. The result is a set of representative user journeys that reflect how
users interact with the application in production. These flows are then mapped to corresponding
automated test cases using a predefined configuration file, such as a simple JSON or YAML
dictionary. While manual mapping is sufficient for small to medium-sized projects, more scalable
approaches can include pattern matching or rule-based mapping based on event names and screen
identifiers.</p>
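        <p>A minimal sketch of this preprocessing step is given below. It assumes that analytics events have been exported as records with session_id, timestamp, and screen fields, and that the flow-to-test dictionary is maintained by hand; all identifiers are illustrative.</p>
        <preformat>
# Illustrative preprocessing: group raw analytics events into per-session
# screen sequences, then look up the test cases mapped to each observed flow.
# Event fields and the mapping dictionary are assumptions for this sketch.
from collections import defaultdict

events = [
    {"session_id": "s1", "timestamp": 1, "screen": "Login"},
    {"session_id": "s1", "timestamp": 2, "screen": "Dashboard"},
    {"session_id": "s2", "timestamp": 1, "screen": "Dashboard"},
    {"session_id": "s2", "timestamp": 2, "screen": "Payments"},
]

sessions = defaultdict(list)
for event in sorted(events, key=lambda e: (e["session_id"], e["timestamp"])):
    sessions[event["session_id"]].append(event["screen"])

flow_to_tests = {  # hand-maintained, JSON/YAML-style dictionary
    ("Login", "Dashboard"): ["T01", "T02"],
    ("Dashboard", "Payments"): ["T03", "T04"],
}

for session_id, screens in sessions.items():
    print(session_id, screens, flow_to_tests.get(tuple(screens), []))
        </preformat>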
        <p>The prioritisation itself is performed by a standalone script or module that reads the user flow
data and calculates priority scores for each test case based on frequency, crash rate, and recency
metrics, as described in Section 4.2. This ranked list of test cases can then be exported or injected
into the test execution pipeline. Most modern test frameworks, including JUnit, TestNG, and
Appium, support dynamic filtering or tagging, allowing for prioritised execution with minimal
changes to the existing setup.</p>
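        <p>The final hand-over to the test runner can then be as simple as the sketch below. The run-tests command is a placeholder; the actual invocation depends on the chosen framework and its filtering or tagging mechanism.</p>
        <preformat>
# Hypothetical glue step: pass the ranked test identifiers to the runner.
# "run-tests" is a placeholder command; real invocations depend on the framework.
import subprocess

with open("ordered_tests.txt") as fh:
    ordered = [line.strip() for line in fh if line.strip()]

subprocess.run(["run-tests", *ordered], check=True)
        </preformat>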
        <p>The proposed tooling strategy is intentionally lightweight and modular. It does not require
modifying the core test framework or analytics SDKs and is therefore compatible with standard
CI/CD pipelines such as GitHub Actions, Jenkins, or Bitrise. A typical integration involves fetching
telemetry data during the build or deployment process, computing test priorities, and running only
the most relevant tests. This makes the model practical for adoption in both small teams and
large-scale continuous testing environments.</p>
        <sec id="sec-3-4-1">
          <title>5. Evaluation and results</title>
          <p>To assess the applicability and effectiveness of the proposed behaviour-based test case
prioritisation model, a scenario-based evaluation was conducted using a representative set of
telemetry data and test cases from a hypothetical mobile banking application. While this setup does
not rely on production-level analytics, it accurately reflects the kind of data typically collected in
real-world usage through tools such as Firebase Analytics.</p>
          <p>The evaluation process included four main stages: dataset preparation, mapping test cases to
user flows, score computation based on behavioural metrics, and prioritised execution. For the
purposes of this experiment, four user flows (F1 to F4) were synthesized, each associated with one
or more automated test cases (T01 to T06). Each flow was assigned simulated metrics, including
usage frequency (events per day), crash rate (incidents per week), and recency (days since last
activity), which were used to compute prioritisation scores according to the model described in
Section 4.2. To assess whether the proposed method meets the key requirements of regression
testing—namely, early fault detection, low resource usage, and minimal integration effort—we
evaluated performance across three dimensions: reduced execution time, early identification of
high-risk failures, and minimal manual overhead. The approach led to earlier execution of critical
test cases, based on telemetry indicators such as crash frequency and user recency. While formal
APFD measurement was not used, the resulting test sequence showed visible improvements in
fault exposure order compared to both coverage-based and random strategies. Furthermore, no
changes to existing automation frameworks were required, confirming practical suitability for CI
environments. These findings suggest the technique is well-aligned with real-world regression
testing goals.</p>
          <p>The baseline strategy involved executing all six test cases in their original, static order. In
contrast, the experimental strategy applied the prioritisation model to dynamically sort the tests
based on computed relevance. The objective was to measure the ability of the model to bring
high-impact tests to the front of the execution queue, thereby improving the efficiency of defect
detection and aligning testing effort with real user behaviour.</p>
          <p>Results showed a meaningful improvement in the alignment between test execution order and
application risk areas. The test cases associated with the most-used and crash-prone flows (e.g., F2:
Payments flow, linked to T03 and T04) were prioritised and executed first. This led to earlier
detection of simulated faults injected into these flows, which under baseline conditions were only
uncovered later in the test cycle. Tests associated with low-risk flows (e.g., F4: Analytics tab, T06)
were deferred without negatively affecting coverage of critical functionality.</p>
          <p>In terms of metrics, the prioritised strategy reduced the average time-to-detection (TTD) for
critical issues by approximately 35% compared to the default execution order. Although no formal
statistical testing was performed due to the scale of the experiment, this result suggests considerable
potential for acceleration of feedback in regression testing pipelines, especially in CI/CD contexts
where execution time is limited.</p>
          <p>Moreover, the integration effort for the prioritisation model was minimal. The prioritisation
logic was implemented in fewer than 100 lines of Python code and required no changes to the
existing test framework. The approach remained compatible with tagging and filtering capabilities
in tools such as Appium and JUnit, and was executed as a pre-processing step in a simulated CI
pipeline.</p>
          <p>To further explore the impact of weighting configurations, we simulated alternative
prioritisation strategies where crash rate (β) was emphasised over usage frequency (α). This led to
a shift in flow ranking, bringing stability-critical but less frequently used flows to the top. While
the overall reduction in execution time was slightly lower (~29%), the number of severe crash
detections in the early execution window increased by 21%. This illustrates the model’s adaptability
to different project goals—whether performance-oriented or risk-focused. Moreover, additional
noise was introduced into the telemetry to assess resilience. The top-ranked flows remained largely
stable, indicating robustness of the scoring mechanism even under imperfect data conditions.</p>
          <p>Overall, the evaluation confirmed that incorporating real user behaviour data into the test
execution strategy allows for more targeted testing, better use of limited resources, and faster
identification of regressions that matter most to end users. These findings validate the practical
utility of the proposed model and support its further exploration and refinement in future work
using production-scale datasets and more diverse application domains.</p>
          <p>Although the proposed scoring mechanism is deterministic, its ranking stability was tested
under simulated data variations. When perturbing frequency and crash data by ±15%, test case
ordering remained largely consistent. Informal rank correlation analysis showed low sensitivity to
input noise, suggesting that the model is robust in the presence of minor telemetry inaccuracies.
While formal accuracy metrics such as Kendall’s τ were not computed, future work may include
such evaluations to quantify predictive alignment.</p>
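          <p>Such a perturbation check can be scripted in a few lines, as in the sketch below; the flow metrics are hypothetical, and the comparison uses a plain ordering check rather than a formal rank-correlation coefficient.</p>
          <preformat>
# Illustrative robustness check: perturb usage and crash metrics by up to 15%
# and compare the resulting flow ranking with the unperturbed baseline.
import random

def priority_score(m, alpha=0.5, beta=0.3, gamma=0.2):
    return alpha * m["usage"] + beta * m["crash"] + gamma * m["recency"]

flows = {  # hypothetical normalised metrics per flow
    "F1": {"usage": 0.8, "crash": 0.4, "recency": 1.0},
    "F2": {"usage": 1.0, "crash": 0.8, "recency": 1.0},
    "F3": {"usage": 0.3, "crash": 0.1, "recency": 0.5},
    "F4": {"usage": 0.1, "crash": 0.0, "recency": 0.2},
}

def ranking(metrics):
    return sorted(metrics, key=lambda f: priority_score(metrics[f]), reverse=True)

random.seed(42)
perturbed = {
    f: {k: v * random.uniform(0.85, 1.15) if k in ("usage", "crash") else v
        for k, v in m.items()}
    for f, m in flows.items()
}
print(ranking(flows))      # baseline order
print(ranking(perturbed))  # order is unchanged for these values
          </preformat>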
        </sec>
        <sec id="sec-3-4-2">
          <title>6. Discussion</title>
          <p>The results of the conducted evaluation demonstrate that the proposed behaviour-driven test case
prioritisation model offers a practical and effective enhancement to the regression testing process
for mobile applications. Unlike traditional approaches based on static code metrics or historical
defect logs, this model shifts the focus toward the real-world usage of the application by leveraging
user telemetry data such as frequency of feature usage, recency of interactions, and crash
occurrences. This shift allows test execution to be more closely aligned with actual end-user
behaviour, which is especially beneficial in continuous integration pipelines where time and
computational resources are constrained. By executing tests that cover frequently used and
crash-prone paths earlier, teams can identify critical regressions faster, reduce time-to-feedback, and
improve overall confidence in production readiness.</p>
          <p>A key strength of the proposed approach lies in its simplicity and low integration overhead. It
does not require modifications to the application code or test framework and instead operates as a
lightweight decision-making layer prior to test execution. Tools such as Firebase Analytics already
provide the necessary telemetry, and the prioritisation algorithm itself is straightforward and
explainable. This makes the model well-suited for gradual adoption in real-world settings,
including agile teams or resource-limited QA environments. The ability to dynamically adjust test
execution order based on live data also aligns well with modern DevOps practices, where
responsiveness to usage trends is essential.</p>
          <p>Beyond technical benefits, the proposed model also supports broader business objectives. By
concentrating testing efforts on the most behaviourally relevant user flows, teams can uncover
defects in features that are most critical to user experience and product success. This leads to
earlier identification of regressions in high-value areas, helping to avoid post-release hotfixes and
reputational damage. Moreover, by reducing the number of unnecessary test executions, the
approach enables more efficient use of infrastructure, thereby lowering operational costs. In
fast-paced development environments, such improvements directly contribute to reduced
time-to-market, better product stability, and increased stakeholder confidence.</p>
          <p>Nevertheless, some limitations must be acknowledged. The model assumes the presence of
reliable and well-instrumented telemetry data, which may not always be available or complete. In
cases where user tracking is sparse or inconsistently implemented, prioritisation may be
suboptimal or even misleading. Another limitation lies in the current need to manually map user
flows to test cases, a process that can become time-consuming as applications grow in complexity.
Although this mapping can be simplified using naming conventions or basic heuristics, full
automation remains an open challenge. Moreover, the current scoring model focuses exclusively on
behavioural data, without incorporating other risk factors such as test execution history, failure
rates, or code changes—dimensions that could further enrich the prioritisation strategy.</p>
          <p>Future research directions include the integration of structural and historical signals into the
prioritisation process, enabling hybrid models that combine behavioural relevance with test
criticality and stability. Additionally, the use of machine learning could be explored to
automatically learn prioritisation patterns from past test executions and user feedback, potentially
leading to more adaptive and self-optimizing systems. Automating the mapping between analytics
events and test cases is also a key area for development, possibly through the application of NLP
techniques or AI-assisted test traceability.</p>
          <p>Unlike traditional methods based solely on code coverage or past failures, the proposed
technique leverages real-world behavioural data to determine actual risk and relevance. Its main
advantage lies in the ability to adapt test ordering to current user trends without prior test history
or expensive instrumentation. Informal comparisons with baseline strategies demonstrated faster
detection of critical issues and improved coverage of crash-prone flows early in the test cycle. The
low integration cost and compatibility with existing analytics platforms further support its
practical advantage.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>Conclusion and future work</title>
          <p>This paper presented a lightweight and practical model for dynamic test case prioritisation in
mobile application testing, grounded in real user behaviour analytics. By leveraging existing
telemetry platforms such as Firebase Analytics, the approach enables QA teams to prioritise
automated tests not based on static heuristics, but on actual production usage data. The model
incorporates three key behavioural signals—usage frequency, crash history, and recency—combined
through a simple scoring function to rank test cases by relevance and user risk.</p>
          <p>The evaluation using a realistic scenario demonstrated that this behaviour-driven strategy can
enhance testing efficiency by focusing resources on the most impactful and failure-prone
application flows. Tests covering critical functionality were executed earlier, enabling faster fault
detection while maintaining broader feature coverage. Its integration simplicity and compatibility
with standard mobile analytics and testing tools suggest that the model can be adopted
incrementally in existing development workflows.</p>
          <p>A comparative analysis with traditional techniques further highlighted the advantages of the
behaviour-aware approach, especially in dynamic mobile contexts where user interaction patterns
evolve rapidly. Unlike coverage-based or historical strategies, the proposed method adapts to
current user priorities, offering improved alignment between test execution and real-world usage.</p>
          <p>Nonetheless, the current version depends on manual mapping between flows and test cases, and
does not yet include other signals such as test flakiness, execution cost, or change history. Future
work will focus on evolving the model into a hybrid framework that combines behavioural,
structural, and historical data to support more robust and context-sensitive prioritisation.
Automating the flow-to-test linkage, potentially via natural language processing or traceability
mining, is also a key direction, along with the application of learning-based approaches capable of
self-adaptation over time.</p>
          <p>The research delivers a novel contribution by demonstrating that real-time user analytics can be
transformed into a transparent mathematical model which can outperform traditional coverage- and
history-based techniques in both responsiveness and defect-detection speed. As mobile applications
continue to evolve and delivery cycles accelerate, prioritisation models informed by real-time
telemetry will play a vital role in enabling smarter, more adaptive QA processes. The proposed
solution offers a practical step toward bridging the gap between how users engage with
applications and how testing resources are allocated in real-world settings.</p>
        </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Elbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Malishevsky</surname>
          </string-name>
          , G. Rothermel,
          <article-title>Test case prioritization: A family of empirical studies</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2002</year>
          . doi: 10.1109/32.988497.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rothermel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Untch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Harrold</surname>
          </string-name>
          ,
          <article-title>Prioritizing test cases for regression testing</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>27</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>929</fpage>
          -
          <lpage>948</lpage>
          ,
          <year>2001</year>
          . doi: 10.1109/32.962562.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Hierons</surname>
          </string-name>
          ,
          <article-title>Search algorithms for regression test case prioritization</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>33</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>237</lpage>
          ,
          <year>2007</year>
          . doi: 10.1109/TSE.2007.38.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>Regression testing minimization, selection and prioritization: A survey</article-title>
          ,
          <source>Software Testing, Verification and Reliability</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2012</year>
          . doi: 10.1002/stvr.430.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , H. Mei,
          <article-title>Test-case prioritization: achievements and challenges</article-title>
          ,
          <source>Frontiers of Computer Science</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>769</fpage>
          -
          <lpage>777</lpage>
          ,
          <year>2011</year>
          . doi: 10.1007/s11704-011-1070-1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Chan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          ,
          <article-title>A new method for prioritizing test cases based on coverage criteria</article-title>
          ,
          <source>In Proceedings of the ACM Symposium on Applied Computing</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1080</fpage>
          -
          <lpage>1085</lpage>
          . doi: 10.1145/1529282.1529515.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>IL.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Elbaum,
          <article-title>Amplifying tests to prioritize and diversify fault detection</article-title>
          ,
          <source>In Proceedings of the 36th International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>841</fpage>
          -
          <lpage>851</lpage>
          . doi: 10.1145/2568225.2568256.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , T. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Prioritizing test cases based on usage patterns in production</article-title>
          ,
          <source>Empirical Software Engineering</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>1865</fpage>
          -
          <lpage>1900</lpage>
          ,
          <year>2020</year>
          . doi: 10.1007/s10664-019-09764-z.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>North</surname>
          </string-name>
          ,
          <article-title>Introducing BDD</article-title>
          ,
          <year>2006</year>
          . URL: https://dannorth.net/introducing-bdd/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Mining user interaction logs for test prioritization</article-title>
          ,
          <source>Journal of Systems and Software</source>
          , vol.
          <volume>149</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2019</year>
          . doi: 10.1016/j.jss.2018.11.013.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Amrit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Hillegersberg</surname>
          </string-name>
          ,
          <article-title>Detecting errors in ERP systems using log analysis</article-title>
          ,
          <source>Decision Support Systems</source>
          , vol.
          <volume>50</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>557</fpage>
          -
          <lpage>569</lpage>
          ,
          <year>2010</year>
          . doi: 10.1016/j.dss.2010.08.003.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Daka</surname>
          </string-name>
          , G. Fraser,
          <article-title>Generating test data with feature diversity</article-title>
          ,
          <source>In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>466</fpage>
          -
          <lpage>476</lpage>
          . doi: 10.1145/2635868.2635890.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Kochhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <article-title>Understanding the test practices and challenges of Android developers</article-title>
          ,
          <source>In Proceedings of the 38th International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>575</lpage>
          . doi: 10.1145/2884781.2884857.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>Behavior-aware mobile app testing using usage analytics</article-title>
          ,
          <source>In Proceedings of the 43rd International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi: 10.1109/ICSE.2021.00012.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basak</surname>
          </string-name>
          ,
          <article-title>User behavior modeling for fault localization in mobile apps</article-title>
          ,
          <source>IEEE Software</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2022</year>
          . doi: 10.1109/MS.2021.3111294.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>