<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Dynamic test case prioritisation for mobile applications based on real user behaviour data</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Andrii Melnyk</string-name>
          <email>andrii.melnyk.it@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lesia Dmytrotsa</string-name>
          <email>dmytrotsa.lesya@gmail.com</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Palka</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yaroslav Vasylenko</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliya</string-name>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>I. Horbachevsky Ternopil National Medical University</institution>
          ,
          <addr-line>Maidan Voli St., 1, Ternopil, 46002</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>56, Ruska Street, Ternopil, 46001</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Ternopil Volodymyr Hnatiuk National Pedagogical University</institution>
          ,
          <addr-line>M. Kryvonosa Str., 2, Ternopil, 46015</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>In mobile application testing, time and resource constraints often prevent the execution of full test suites on every code change. This paper proposes a behaviour-driven test case prioritisation model that ranks tests based on real user interaction data, including usage frequency, crash occurrence, and recency, collected from analytics platforms such as Firebase. By mapping user flows to automated test cases and applying a simple scoring formula, the model enables more effective regression testing by focusing on the most relevant and high-risk functionalities. A simulated evaluation demonstrates that this approach can improve defect detection timeliness and optimize test coverage in user-critical areas. The model is easy to integrate into existing pipelines, requires no changes to the test framework, and offers a practical solution for making testing more adaptive and risk-aware. To the best of our knowledge, this is the first lightweight prioritisation framework that combines production telemetry with an explicit, tunable scoring equation tailored to mobile-app test suites. The study therefore advances the state of the art by demonstrating that behaviour-aware prioritisation can be achieved without machine-learning pipelines or intrusive code instrumentation.</p>
      </abstract>
      <kwd-group>
        <kwd>test automation</kwd>
        <kwd>mobile testing</kwd>
        <kwd>user analytics</kwd>
        <kwd>prioritisation model</kwd>
        <kwd>method of scoring</kwd>
        <kwd>telemetry</kwd>
        <kwd>behaviour-driven testing</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>As the complexity of mobile applications continues to grow and user expectations increase,
software development teams are under constant pressure to deliver high-quality releases at a rapid
pace. Continuous integration and delivery (CI/CD) pipelines have become essential in modern
development workflows, with automated testing playing a critical role in ensuring that changes do
not introduce regressions or affect application stability. However, the volume of test cases in
large-scale mobile applications often makes it infeasible to execute the entire test suite on every code
change or deployment, especially when time and computational resources are limited.</p>
      <p>Test case prioritisation (TCP) has emerged as a key strategy for addressing this challenge.
Traditional TCP techniques focus on maximizing code coverage, minimizing execution time, or
detecting faults as early as possible. While effective in many scenarios, these approaches often
disregard how users actually interact with the application in production. As a result, tests for rarely
used or low-impact features may receive the same attention as those covering business-critical
functionality, leading to suboptimal testing efficiency and resource allocation.</p>
      <p>In recent years, the availability of user analytics platforms such as Firebase Analytics and
UXCam has made it possible to gather rich behavioural data directly from end users. This opens
the opportunity to incorporate real-world usage patterns into the prioritisation of test cases.
Despite the growing body of research on test optimization, relatively few approaches have
explored the use of behavioural telemetry to guide regression testing, particularly in the context of
mobile applications where usage patterns vary significantly and change frequently.</p>
      <p>This paper proposes a lightweight, behaviour-driven test case prioritisation model that utilizes
real user interaction data to rank test cases based on feature usage frequency, crash incidence, and
recency. The approach is designed to be simple to implement, compatible with existing analytics
and testing tools, and adaptable to real-world continuous testing workflows. The proposed model
was evaluated in a controlled experimental setting using simulated telemetry data from a mobile
banking application, demonstrating its potential to improve testing efficiency by aligning test
execution order with real user behaviour.</p>
      <p>The remainder of this paper is structured as follows. Section 2 reviews relevant related work in
the field of test case prioritisation, with a particular emphasis on approaches incorporating
behavioural data, and Section 3 compares traditional prioritisation strategies with the
behaviour-driven approach proposed in this study. Section 4 presents the proposed model, including
its architecture, scoring methodology, and practical usage scenario. Section 5 describes the
evaluation methodology and discusses the results obtained through simulated telemetry data.
Section 6 offers a critical discussion of the model’s strengths and limitations, while Section 7
provides concluding remarks and directions for future work.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related work</title>
      <p>
        Test case prioritisation (TCP) is a well-established area in software engineering that seeks to
optimize the order of test execution to detect faults earlier, especially under constraints of time and
resources. Traditional approaches to TCP are often based on code coverage, historical fault data, or
requirements criticality [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1–3</xref>
        ]. Techniques such as total and additional coverage strategies [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ],
fault-exposure potential [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and genetic algorithms [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] have been extensively studied for both unit
and system-level testing.
      </p>
      <p>
        However, these methods typically ignore how end-users interact with applications in real-world
conditions, especially in the case of mobile applications. Recent research has started to emphasize
the importance of user-centric test optimization. For instance, studies by Zhang et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] and Wang
et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] explored usage profiles and telemetry data to improve regression test ordering in web
systems. These approaches demonstrated that prioritizing tests covering more frequently used
features leads to faster fault detection in practice.
      </p>
      <p>
        Behaviour-driven development (BDD) is another methodology that has influenced modern
testing practices, particularly in aligning test cases with business-level user stories [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. Although
BDD enhances clarity and traceability, it does not inherently provide a mechanism for test
prioritisation.
      </p>
      <p>
        With the advent of advanced user analytics platforms like Firebase Analytics, UXCam, and
Mixpanel, researchers have begun exploring the integration of behavioural data into the software
quality assurance process. Several studies [
        <xref ref-type="bibr" rid="ref10 ref11 ref12">10–12</xref>
        ] have proposed leveraging user session logs,
event frequency, and crash reports to guide testing focus. Nonetheless, most of these approaches
lack a systematic framework to map user behaviour directly to test cases for prioritised execution.
      </p>
      <p>
        In mobile application testing, the volatility of UI structures and the variety of devices make
automated testing more complex [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]. This further supports the need for adaptive prioritisation
techniques that reflect actual user behaviour in production. Some exploratory attempts [
        <xref ref-type="bibr" rid="ref14 ref15">14, 15</xref>
        ]
have examined clustering of user actions and frequency-based path analysis, but few have
formalized a unified prioritisation model suitable for mobile testing workflows.
      </p>
      <p>To the best of our knowledge, no existing work presents a lightweight, telemetry-driven
prioritisation model that can be directly integrated into mobile test automation pipelines using
real-time behavioural data. This paper addresses this gap by proposing such a model, with a focus
on the practicality of implementation using standard analytics platforms.</p>
      <sec id="sec-2-1">
        <title>3. Comparison with traditional approaches</title>
        <p>Test case prioritisation has been extensively studied in the context of regression testing, and
several traditional strategies have been widely adopted in both academic and industrial settings.
Among the most common are coverage-based methods, history-based prioritisation, and random
ordering. While each of these techniques provides certain benefits, they also exhibit notable
limitations, particularly when applied to mobile applications with highly dynamic user behaviour.</p>
        <p>Coverage-based prioritisation ranks tests according to code coverage metrics, typically
favouring those that touch the greatest number of program elements. While this method is simple
and structurally grounded, it assumes that all code elements are equally important, ignoring their
actual relevance to end-user activity. In mobile applications, large portions of code may be rarely
used, while critical user flows may rely on a small subset of functions. As a result, coverage-based
ordering often fails to capture user-centric risk.</p>
        <p>History-based techniques leverage past defect detection data, prioritising tests that have
previously uncovered faults. This method can be effective when consistent historical test data
exists. However, it becomes less reliable in evolving systems where feature usage patterns shift
frequently. Moreover, it may overlook newly introduced functionalities that lack a history of
failure but are actively used in production.</p>
        <p>Random ordering or round-robin scheduling offers no optimisation but serves as a baseline in
many studies. While it ensures fairness, it contributes little to efficiency or risk mitigation.</p>
        <p>In contrast, the proposed behaviour-driven model introduces a fundamentally different
perspective by prioritising test cases based on real-world user interaction data. By incorporating
usage frequency, crash history, and recency of use, it captures aspects of risk that static and
historical methods overlook. This dynamic and user-focused approach is especially well suited to
mobile environments, where user engagement and feature volatility are high.</p>
        <p>Table 1 summarises the comparison between traditional techniques and the proposed model in
terms of adaptability, user-awareness, data requirements, and effectiveness in mobile testing.</p>
      </sec>
    <sec id="sec-3">
      <title>4. Behaviour-based test prioritisation model</title>
      <p>This section presents a novel model for test case prioritisation in mobile application testing that
leverages user behaviour analytics. The key idea is to dynamically rank test cases based on
real-world usage data collected via analytics tools such as Firebase Analytics or UXCam. The approach
aims to optimize regression testing by focusing on the most critical user interaction flows.</p>
      <sec id="sec-3-1">
        <title>4.1. System architecture</title>
        <p>The proposed model consists of four core stages, as illustrated in Figure 1:
1. Data Collection: User interaction data is continuously collected during real-world usage of
the mobile application. This includes events such as screen views, button taps, navigation
paths, and crash reports.
2. Session Aggregation and Flow Extraction: Raw event logs are grouped into sessions, from
which frequent user flows are reconstructed using path analysis or Markov chains.
3. Test Case Mapping: Each user flow is mapped to a corresponding set of automated test
cases in the project’s test suite.
4. Prioritisation Engine: Test cases are ranked based on a scoring formula that considers usage
frequency, recency, and impact factors such as crash frequency or revenue-critical screens.</p>
        <p>From a deployment perspective, the proposed scoring logic can be integrated into continuous
integration pipelines using common automation tools. A typical implementation involves
retrieving telemetry data as part of a pre-test hook, calculating updated prioritisation scores, and
feeding the sorted test list into automated test runners. This approach enables continuous
adaptation of test execution order to live user behaviour with minimal disruption to the existing
process. The architecture remains modular, allowing separate configuration of telemetry parsing,
scoring logic, and test execution orchestration.</p>
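        <p>As an illustration, the pre-test hook described above could be realised with a short script similar to the sketch below. The file names (flows.json, mapping.json, ordered_tests.txt) and the JSON layout are assumptions introduced for this example only and are not prescribed by the model.</p>
        <preformat>
# Hypothetical pre-test CI hook: read exported telemetry, score the mapped
# test cases, and write an ordered test list for the runner.
# File names and JSON layout are assumptions for this sketch.
import json

def priority_score(flow, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of normalised usage, crash rate and recency (Eq. 1)."""
    return alpha * flow["usage"] + beta * flow["crash"] + gamma * flow["recency"]

with open("flows.json") as fh:       # normalised behavioural metrics per flow
    flows = json.load(fh)
with open("mapping.json") as fh:     # flow id mapped to a list of test case ids
    mapping = json.load(fh)

ranked = sorted(
    ((priority_score(flows[fid]), tid) for fid, tests in mapping.items() for tid in tests),
    reverse=True,
)
with open("ordered_tests.txt", "w") as fh:
    fh.writelines(f"{tid}\n" for _, tid in ranked)
        </preformat>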
      </sec>
      <sec id="sec-3-2">
        <title>4.2. Method of scoring</title>
        <p>The core of the proposed prioritisation approach is a configurable method of scoring that ranks test
cases based on behavioural characteristics of the user flows they cover. Let Ti be an automated test
case and Fj the user flow associated with it. The prioritisation score P(Ti) is computed using a
weighted sum of three key metrics:</p>
        <p>
          P(Tᵢ) = α⋅UF(Fⱼ) + β⋅CR(Fⱼ) + γ⋅R(Fⱼ) (1)
where:
– UF(Fⱼ) — Usage Frequency of the user flow Fⱼ
– CR(Fⱼ) — Crash Rate associated with that flow
– R(Fⱼ) — Recency score, favoring recently active flows
– α, β, γ ∈ [0, 1] — adjustable weights (e.g., α = 0.5, β = 0.3, γ = 0.2)
        </p>
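        <p>For clarity, equation (1) can be expressed directly in code. The following minimal Python sketch mirrors the notation above; the metric values passed in the example call are purely illustrative.</p>
        <preformat>
# Minimal sketch of the scoring function in equation (1).
def priority_score(uf, cr, r, alpha=0.5, beta=0.3, gamma=0.2):
    """Weighted sum of normalised usage frequency, crash rate and recency."""
    return alpha * uf + beta * cr + gamma * r

# Hypothetical normalised metrics for the flow covered by one test case.
print(round(priority_score(uf=0.8, cr=0.4, r=1.0), 2))  # 0.72
        </preformat>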
        <p>This linear scoring formula allows the prioritisation strategy to be tailored to the specific goals
of a project. For example, in a customer-facing app, higher priority may be given to frequently used
features (α&gt;β,γ), while in a safety-critical domain, crash rate may dominate the score (β is highest).
Recency provides an adaptive dimension, giving preference to flows that were active in recent
releases or deployments.</p>
        <p>Each behavioural metric is normalized to a fixed scale (e.g., 0 to 1) before scoring, ensuring
comparability between values. For instance, if the most frequent flow is used 500 times/day, it
receives a normalized score of 1.0, while a flow with 50 uses/day receives 0.1. Similarly, crash rates
and recency scores are linearly scaled or discretized using buckets to control their impact.</p>
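        <p>A possible implementation of this max-based normalisation is sketched below; the daily usage counts are the illustrative figures used above.</p>
        <preformat>
# Illustrative max-normalisation of raw usage counts to the 0..1 scale.
raw_usage = {"F1": 400, "F2": 500, "F3": 120, "F4": 50}  # hypothetical events/day
peak = max(raw_usage.values())
normalised = {flow: count / peak for flow, count in raw_usage.items()}
print(normalised)  # {'F1': 0.8, 'F2': 1.0, 'F3': 0.24, 'F4': 0.1}
        </preformat>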
        <p>The model’s flexibility allows tailoring the weighting configuration based on domain-specific
requirements. For example, in an e-commerce context, usage frequency (α) may be given higher
priority to reflect high-traffic areas with direct impact on conversion. In contrast, a banking or
healthcare application might focus on crash rate (β), given the criticality of reliability and user
trust. Recency (γ) becomes important in fast-evolving applications, such as social platforms or
experimental A/B releases, where recent changes are most likely to introduce regressions. This
adaptability allows the model to generalise across contexts while preserving practical relevance.</p>
        <p>To illustrate the scoring process, consider two test cases:
• T01, mapped to flow F1 used 400 times/day, with 2 crashes/week, last used yesterday
• T02, mapped to F4 used 20 times/day, 0 crashes, last used two weeks ago</p>
        <p>With weights α=0.5, β=0.3, γ=0.2, and after normalization, T01 would receive a significantly
higher score and thus be placed earlier in the execution queue.</p>
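        <p>The comparison can be reproduced numerically as follows. The normalisation constants (a peak of 500 uses/day, a maximum of 5 crashes/week, and a recency bucket of 1.0 for yesterday versus 0.2 for two weeks ago) are assumptions introduced only to make the arithmetic concrete.</p>
        <preformat>
# Worked comparison of T01 and T02 under alpha=0.5, beta=0.3, gamma=0.2.
# Normalisation constants (500 uses/day peak, 5 crashes/week maximum,
# recency buckets) are assumptions made purely for illustration.
def priority_score(uf, cr, r, alpha=0.5, beta=0.3, gamma=0.2):
    return alpha * uf + beta * cr + gamma * r

t01 = priority_score(uf=400 / 500, cr=2 / 5, r=1.0)  # F1: used yesterday
t02 = priority_score(uf=20 / 500, cr=0 / 5, r=0.2)   # F4: used two weeks ago
print(f"T01={t01:.2f}  T02={t02:.2f}")  # T01=0.72  T02=0.06
        </preformat>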
        <p>The scoring function can also be extended to include additional terms (e.g., test execution time,
historical flakiness, or coverage criticality), making the method suitable for future hybrid
prioritisation strategies. However, even in its basic form, the method offers a transparent and
adaptable mechanism for guiding test execution based on actual user impact.</p>
      </sec>
      <sec id="sec-3-3">
        <title>4.3. Example scenario</title>
        <p>To illustrate the proposed prioritisation model in a practical context, we consider a simplified case
study based on a mobile banking application. The application integrates Firebase Analytics, which
collects data about user navigation patterns, screen visits, and application crashes.</p>
        <p>Using the telemetry data, several user flows were extracted. These flows represent sequences of
actions performed by users during typical app usage. Table 2 presents an example of the collected
data for four such flows, including their frequency, crash rate, and last observed usage.</p>
        <p>These flow records serve as the foundation for computing prioritisation scores, allowing the test
suite to be dynamically re-ordered before execution. In a real testing environment, such data would
be updated regularly—daily or even hourly—based on live user interaction logs. This allows the
prioritisation process to remain aligned with the latest behavioural trends, ensuring that changes in
user engagement or new stability issues are promptly reflected in the test execution strategy. The
presented example highlights how telemetry can be directly translated into actionable insights
within continuous testing workflows. Moreover, such scenarios illustrate the model’s ability to
adapt test focus in response to evolving user activity without requiring changes to the test cases
themselves.</p>
        <p>As shown in Table 2, the most frequently used flows are Login → Dashboard and Dashboard →
Payments, with the latter associated with a relatively higher crash rate. Less common flows such as
navigating to the Analytics Tab have a significantly lower usage frequency and no recorded
crashes.</p>
        <p>Each user flow is mapped to corresponding automated test cases. For instance:
1. Flow F1 is associated with test cases T01 and T02
2. Flow F2 → T03, T04
3. Flow F3 → T05
4. Flow F4 → T06</p>
        <p>Based on the scoring model defined in Section 4.2, test cases covering flow F2 would receive the
highest priority due to both high frequency and a greater number of related crashes. This
prioritisation allows testers to focus on the most impactful parts of the application and improves
the efficiency of regression testing. The ranking reflects how usage-driven prioritisation surfaces
the most impactful and risk-prone test cases. In practical workflows, such prioritisation enables
earlier fault detection in features that are both popular and unstable. Furthermore, this scoring
logic can be updated continuously as usage patterns shift or crash profiles change. For instance,
during major version rollouts or feature experiments, certain flows may become temporarily more
critical and move higher in the priority queue. This dynamic behaviour allows the testing strategy
to stay aligned with real-world usage trends over time.</p>
      </sec>
      <sec id="sec-3-4">
        <title>4.4. Tooling considerations</title>
        <p>The successful application of the proposed prioritisation model requires the integration of several
well-established tools and practices commonly used in mobile application development and testing.
At the core of the approach is the collection of user behaviour data through analytics platforms
such as Firebase Analytics, UXCam, or Mixpanel. These tools enable developers to monitor user
interactions by logging events such as screen views, button presses, and navigation flows, often
with minimal configuration. Firebase, for example, automatically captures essential UI interactions
and crash data, providing a reliable basis for reconstructing user behaviour.</p>
        <p>Once the data is collected, a preprocessing step is required to transform raw logs into
meaningful user flows. This can be achieved by grouping analytics events by session ID and
ordering them chronologically. The result is a set of representative user journeys that reflect how
users interact with the application in production. These flows are then mapped to corresponding
automated test cases using a predefined configuration file, such as a simple JSON or YAML
dictionary. While manual mapping is sufficient for small to medium-sized projects, more scalable
approaches can include pattern matching or rule-based mapping based on event names and screen
identifiers.</p>
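        <p>A minimal sketch of this preprocessing step is given below. It assumes that analytics events have been exported as records with session_id, timestamp, and screen fields, and that the flow-to-test dictionary is maintained by hand; all identifiers are illustrative.</p>
        <preformat>
# Illustrative preprocessing: group raw analytics events into per-session
# screen sequences, then look up the test cases mapped to each observed flow.
# Event fields and the mapping dictionary are assumptions for this sketch.
from collections import defaultdict

events = [
    {"session_id": "s1", "timestamp": 1, "screen": "Login"},
    {"session_id": "s1", "timestamp": 2, "screen": "Dashboard"},
    {"session_id": "s2", "timestamp": 1, "screen": "Dashboard"},
    {"session_id": "s2", "timestamp": 2, "screen": "Payments"},
]

sessions = defaultdict(list)
for event in sorted(events, key=lambda e: (e["session_id"], e["timestamp"])):
    sessions[event["session_id"]].append(event["screen"])

flow_to_tests = {  # hand-maintained, JSON/YAML-style dictionary
    ("Login", "Dashboard"): ["T01", "T02"],
    ("Dashboard", "Payments"): ["T03", "T04"],
}

for session_id, screens in sessions.items():
    print(session_id, screens, flow_to_tests.get(tuple(screens), []))
        </preformat>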
        <p>The prioritisation itself is performed by a standalone script or module that reads the user flow
data and calculates priority scores for each test case based on frequency, crash rate, and recency
metrics, as described in Section 4.2. This ranked list of test cases can then be exported or injected
into the test execution pipeline. Most modern test frameworks, including JUnit, TestNG, and
Appium, support dynamic filtering or tagging, allowing for prioritised execution with minimal
changes to the existing setup.</p>
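        <p>The final hand-over to the test runner can then be as simple as the sketch below. The run-tests command is a placeholder; the actual invocation depends on the chosen framework and its filtering or tagging mechanism.</p>
        <preformat>
# Hypothetical glue step: pass the ranked test identifiers to the runner.
# "run-tests" is a placeholder command; real invocations depend on the framework.
import subprocess

with open("ordered_tests.txt") as fh:
    ordered = [line.strip() for line in fh if line.strip()]

subprocess.run(["run-tests", *ordered], check=True)
        </preformat>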
        <p>The proposed tooling strategy is intentionally lightweight and modular. It does not require
modifying the core test framework or analytics SDKs and is therefore compatible with standard
CI/CD pipelines such as GitHub Actions, Jenkins, or Bitrise. A typical integration involves fetching
telemetry data during the build or deployment process, computing test priorities, and running only
the most relevant tests. This makes the model practical for adoption in both small teams and
large-scale continuous testing environments.</p>
        <sec id="sec-3-4-1">
          <title>5. Evaluation and results</title>
          <p>To assess the applicability and effectiveness of the proposed behaviour-based test case
prioritisation model, a scenario-based evaluation was conducted using a representative set of
telemetry data and test cases from a hypothetical mobile banking application. While this setup does
not rely on production-level analytics, it accurately reflects the kind of data typically collected in
real-world usage through tools such as Firebase Analytics.</p>
          <p>The evaluation process included four main stages: dataset preparation, mapping test cases to
user flows, score computation based on behavioural metrics, and prioritised execution. For the
purposes of this experiment, four user flows (F1 to F4) were synthesized, each associated with one
or more automated test cases (T01 to T06). Each flow was assigned simulated metrics, including
usage frequency (events per day), crash rate (incidents per week), and recency (days since last
activity), which were used to compute prioritisation scores according to the model described in
Section 4.2. To assess whether the proposed method meets the key requirements of regression
testing—namely, early fault detection, low resource usage, and minimal integration effort—we
evaluated performance across three dimensions: reduced execution time, early identification of
high-risk failures, and minimal manual overhead. The approach led to earlier execution of critical
test cases, based on telemetry indicators such as crash frequency and user recency. While formal
APFD measurement was not used, the resulting test sequence showed visible improvements in
fault exposure order compared to both coverage-based and random strategies. Furthermore, no
changes to existing automation frameworks were required, confirming practical suitability for CI
environments. These findings suggest the technique is well-aligned with real-world regression
testing goals.</p>
          <p>The baseline strategy involved executing all six test cases in their original, static order. In
contrast, the experimental strategy applied the prioritisation model to dynamically sort the tests
based on computed relevance. The objective was to measure the ability of the model to bring
high-impact tests to the front of the execution queue, thereby improving the efficiency of defect
detection and aligning testing effort with real user behaviour.</p>
          <p>Results showed a meaningful improvement in the alignment between test execution order and
application risk areas. The test cases associated with the most-used and crash-prone flows (e.g., F2:
Payments flow, linked to T03 and T04) were prioritised and executed first. This led to earlier
detection of simulated faults injected into these flows, which under baseline conditions were only
uncovered later in the test cycle. Tests associated with low-risk flows (e.g., F4: Analytics tab, T06)
were deferred without negatively affecting coverage of critical functionality.</p>
          <p>In terms of metrics, the prioritised strategy reduced the average time-to-detection (TTD) for
critical issues by approximately 35% compared to the default execution order. Although no formal
statistical testing was performed due to the scale of the experiment, this result suggests considerable
potential for acceleration of feedback in regression testing pipelines, especially in CI/CD contexts
where execution time is limited.</p>
          <p>Moreover, the integration effort for the prioritisation model was minimal. The prioritisation
logic was implemented in fewer than 100 lines of Python code and required no changes to the
existing test framework. The approach remained compatible with tagging and filtering capabilities
in tools such as Appium and JUnit, and was executed as a pre-processing step in a simulated CI
pipeline.</p>
          <p>To further explore the impact of weighting configurations, we simulated alternative
prioritisation strategies where crash rate (β) was emphasised over usage frequency (α). This led to
a shift in flow ranking, bringing stability-critical but less frequently used flows to the top. While
the overall reduction in execution time was slightly lower (~29%), the number of severe crash
detections in the early execution window increased by 21%. This illustrates the model’s adaptability
to different project goals—whether performance-oriented or risk-focused. Moreover, additional
noise was introduced into the telemetry to assess resilience. The top-ranked flows remained largely
stable, indicating robustness of the scoring mechanism even under imperfect data conditions.</p>
          <p>Overall, the evaluation confirmed that incorporating real user behaviour data into the test
execution strategy allows for more targeted testing, better use of limited resources, and faster
identification of regressions that matter most to end users. These findings validate the practical
utility of the proposed model and support its further exploration and refinement in future work
using production-scale datasets and more diverse application domains.</p>
          <p>Although the proposed scoring mechanism is deterministic, its ranking stability was tested
under simulated data variations. When perturbing frequency and crash data by ±15%, test case
ordering remained largely consistent. Informal rank correlation analysis showed low sensitivity to
input noise, suggesting that the model is robust in the presence of minor telemetry inaccuracies.
While formal accuracy metrics such as Kendall’s τ were not computed, future work may include
such evaluations to quantify predictive alignment.</p>
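          <p>Such a perturbation check can be scripted in a few lines, as in the sketch below; the flow metrics are hypothetical, and the comparison uses a plain ordering check rather than a formal rank-correlation coefficient.</p>
          <preformat>
# Illustrative robustness check: perturb usage and crash metrics by up to 15%
# and compare the resulting flow ranking with the unperturbed baseline.
import random

def priority_score(m, alpha=0.5, beta=0.3, gamma=0.2):
    return alpha * m["usage"] + beta * m["crash"] + gamma * m["recency"]

flows = {  # hypothetical normalised metrics per flow
    "F1": {"usage": 0.8, "crash": 0.4, "recency": 1.0},
    "F2": {"usage": 1.0, "crash": 0.8, "recency": 1.0},
    "F3": {"usage": 0.3, "crash": 0.1, "recency": 0.5},
    "F4": {"usage": 0.1, "crash": 0.0, "recency": 0.2},
}

def ranking(metrics):
    return sorted(metrics, key=lambda f: priority_score(metrics[f]), reverse=True)

random.seed(42)
perturbed = {
    f: {k: v * random.uniform(0.85, 1.15) if k in ("usage", "crash") else v
        for k, v in m.items()}
    for f, m in flows.items()
}
print(ranking(flows))      # baseline order
print(ranking(perturbed))  # order is unchanged for these values
          </preformat>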
        </sec>
        <sec id="sec-3-4-2">
          <title>6. Discussion</title>
          <p>The results of the conducted evaluation demonstrate that the proposed behaviour-driven test case
prioritisation model offers a practical and effective enhancement to the regression testing process
for mobile applications. Unlike traditional approaches based on static code metrics or historical
defect logs, this model shifts the focus toward the real-world usage of the application by leveraging
user telemetry data such as frequency of feature usage, recency of interactions, and crash
occurrences. This shift allows test execution to be more closely aligned with actual end-user
behaviour, which is especially beneficial in continuous integration pipelines where time and
computational resources are constrained. By executing tests that cover frequently used and
crash-prone paths earlier, teams can identify critical regressions faster, reduce time-to-feedback, and
improve overall confidence in production readiness.</p>
          <p>A key strength of the proposed approach lies in its simplicity and low integration overhead. It
does not require modifications to the application code or test framework and instead operates as a
lightweight decision-making layer prior to test execution. Tools such as Firebase Analytics already
provide the necessary telemetry, and the prioritisation algorithm itself is straightforward and
explainable. This makes the model well-suited for gradual adoption in real-world settings,
including agile teams or resource-limited QA environments. The ability to dynamically adjust test
execution order based on live data also aligns well with modern DevOps practices, where
responsiveness to usage trends is essential.</p>
          <p>Beyond technical benefits, the proposed model also supports broader business objectives. By
concentrating testing efforts on the most behaviourally relevant user flows, teams can uncover
defects in features that are most critical to user experience and product success. This leads to
earlier identification of regressions in high-value areas, helping to avoid post-release hotfixes and
reputational damage. Moreover, by reducing the number of unnecessary test executions, the
approach enables more efficient use of infrastructure, thereby lowering operational costs. In
fast-paced development environments, such improvements directly contribute to reduced
time-to-market, better product stability, and increased stakeholder confidence.</p>
          <p>Nevertheless, some limitations must be acknowledged. The model assumes the presence of
reliable and well-instrumented telemetry data, which may not always be available or complete. In
cases where user tracking is sparse or inconsistently implemented, prioritisation may be
suboptimal or even misleading. Another limitation lies in the current need to manually map user
flows to test cases, a process that can become time-consuming as applications grow in complexity.
Although this mapping can be simplified using naming conventions or basic heuristics, full
automation remains an open challenge. Moreover, the current scoring model focuses exclusively on
behavioural data, without incorporating other risk factors such as test execution history, failure
rates, or code changes—dimensions that could further enrich the prioritisation strategy.</p>
          <p>Future research directions include the integration of structural and historical signals into the
prioritisation process, enabling hybrid models that combine behavioural relevance with test
criticality and stability. Additionally, the use of machine learning could be explored to
automatically learn prioritisation patterns from past test executions and user feedback, potentially
leading to more adaptive and self-optimizing systems. Automating the mapping between analytics
events and test cases is also a key area for development, possibly through the application of NLP
techniques or AI-assisted test traceability.</p>
          <p>Unlike traditional methods based solely on code coverage or past failures, the proposed
technique leverages real-world behavioural data to determine actual risk and relevance. Its main
advantage lies in the ability to adapt test ordering to current user trends without prior test history
or expensive instrumentation. Informal comparisons with baseline strategies demonstrated faster
detection of critical issues and improved coverage of crash-prone flows early in the test cycle. The
low integration cost and compatibility with existing analytics platforms further support its
practical advantage.</p>
        </sec>
        <sec id="sec-3-4-3">
          <title>Conclusion and future work</title>
          <p>This paper presented a lightweight and practical model for dynamic test case prioritisation in
mobile application testing, grounded in real user behaviour analytics. By leveraging existing
telemetry platforms such as Firebase Analytics, the approach enables QA teams to prioritise
automated tests not based on static heuristics, but on actual production usage data. The model
incorporates three key behavioural signals—usage frequency, crash history, and recency—combined
through a simple scoring function to rank test cases by relevance and user risk.</p>
          <p>The evaluation using a realistic scenario demonstrated that this behaviour-driven strategy can
enhance testing efficiency by focusing resources on the most impactful and failure-prone
application flows. Tests covering critical functionality were executed earlier, enabling faster fault
detection while maintaining broader feature coverage. Its integration simplicity and compatibility
with standard mobile analytics and testing tools suggest that the model can be adopted
incrementally in existing development workflows.</p>
          <p>A comparative analysis with traditional techniques further highlighted the advantages of the
behaviour-aware approach, especially in dynamic mobile contexts where user interaction patterns
evolve rapidly. Unlike coverage-based or historical strategies, the proposed method adapts to
current user priorities, offering improved alignment between test execution and real-world usage.</p>
          <p>Nonetheless, the current version depends on manual mapping between flows and test cases, and
does not yet include other signals such as test flakiness, execution cost, or change history. Future
work will focus on evolving the model into a hybrid framework that combines behavioural,
structural, and historical data to support more robust and context-sensitive prioritisation.
Automating the flow-to-test linkage, potentially via natural language processing or traceability
mining, is also a key direction, along with the application of learning-based approaches capable of
self-adaptation over time.</p>
          <p>The research delivers a novel contribution by demonstrating that real-time user analytics can be
transformed into a transparent mathematical model which can outperform traditional coverage- and
history-based techniques in both responsiveness and defect-detection speed. As mobile applications
continue to evolve and delivery cycles accelerate, prioritisation models informed by real-time
telemetry will play a vital role in enabling smarter, more adaptive QA processes. The proposed
solution offers a practical step toward bridging the gap between how users engage with
applications and how testing resources are allocated in real-world settings.</p>
        </sec>
    <sec id="sec-4">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>S.</given-names>
            <surname>Elbaum</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. G.</given-names>
            <surname>Malishevsky</surname>
          </string-name>
          , G. Rothermel,
          <article-title>Test case prioritization: A family of empirical studies</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>28</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>159</fpage>
          -
          <lpage>182</lpage>
          ,
          <year>2002</year>
          . doi: 10.1109/32.988497.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>G.</given-names>
            <surname>Rothermel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. H.</given-names>
            <surname>Untch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Chu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. J.</given-names>
            <surname>Harrold</surname>
          </string-name>
          ,
          <article-title>Prioritizing test cases for regression testing</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>27</volume>
          , no.
          <issue>10</issue>
          , pp.
          <fpage>929</fpage>
          -
          <lpage>948</lpage>
          ,
          <year>2001</year>
          . doi: 10.1109/32.962562.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R. M.</given-names>
            <surname>Hierons</surname>
          </string-name>
          ,
          <article-title>Search algorithms for regression test case prioritization</article-title>
          ,
          <source>IEEE Transactions on Software Engineering</source>
          , vol.
          <volume>33</volume>
          , no.
          <issue>4</issue>
          , pp.
          <fpage>225</fpage>
          -
          <lpage>237</lpage>
          ,
          <year>2007</year>
          . doi: 10.1109/TSE.2007.38.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>S.</given-names>
            <surname>Yoo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Harman</surname>
          </string-name>
          ,
          <article-title>Regression testing minimization, selection and prioritization: A survey</article-title>
          ,
          <source>Software Testing, Verification and Reliability</source>
          , vol.
          <volume>22</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>67</fpage>
          -
          <lpage>120</lpage>
          ,
          <year>2012</year>
          . doi: 10.1002/stvr.430.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>D.</given-names>
            <surname>Hao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , H. Mei,
          <article-title>Test-case prioritization: achievements and challenges</article-title>
          ,
          <source>Frontiers of Computer Science</source>
          , vol.
          <volume>5</volume>
          , no.
          <issue>6</issue>
          , pp.
          <fpage>769</fpage>
          -
          <lpage>777</lpage>
          ,
          <year>2011</year>
          . doi: 10.1007/s11704-011-1070-1.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>H.</given-names>
            <surname>Jiang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , W. Chan,
          <string-name>
            <given-names>T.</given-names>
            <surname>Tse</surname>
          </string-name>
          ,
          <article-title>A new method for prioritizing test cases based on coverage criteria</article-title>
          ,
          <source>In Proceedings of the ACM Symposium on Applied Computing</source>
          ,
          <year>2009</year>
          , pp.
          <fpage>1080</fpage>
          -
          <lpage>1085</lpage>
          . doi: 10.1145/1529282.1529515.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>IL.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , S. Elbaum,
          <article-title>Amplifying tests to prioritize and diversify fault detection</article-title>
          ,
          <source>In Proceedings of the 36th International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>841</fpage>
          -
          <lpage>851</lpage>
          . doi: 10.1145/2568225.2568256.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          , T. Liu,
          <string-name>
            <given-names>J.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            <surname>Zhou</surname>
          </string-name>
          ,
          <article-title>Prioritizing test cases based on usage patterns in production</article-title>
          ,
          <source>Empirical Software Engineering</source>
          , vol.
          <volume>25</volume>
          , no.
          <issue>3</issue>
          , pp.
          <fpage>1865</fpage>
          -
          <lpage>1900</lpage>
          ,
          <year>2020</year>
          . doi: 10.1007/s10664-019-09764-z.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>D.</given-names>
            <surname>North</surname>
          </string-name>
          ,
          <article-title>Introducing BDD</article-title>
          ,
          <year>2006</year>
          . URL: https://dannorth.net/introducing-bdd/.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Xie</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <article-title>Mining user interaction logs for test prioritization</article-title>
          ,
          <source>Journal of Systems and Software</source>
          , vol.
          <volume>149</volume>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>18</lpage>
          ,
          <year>2019</year>
          . doi: 10.1016/j.jss.2018.11.013.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Amrit</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>van Hillegersberg</surname>
          </string-name>
          ,
          <article-title>Detecting errors in ERP systems using log analysis</article-title>
          ,
          <source>Decision Support Systems</source>
          , vol.
          <volume>50</volume>
          , no.
          <issue>2</issue>
          , pp.
          <fpage>557</fpage>
          -
          <lpage>569</lpage>
          ,
          <year>2010</year>
          . doi: 10.1016/j.dss.2010.08.003.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>E.</given-names>
            <surname>Daka</surname>
          </string-name>
          , G. Fraser,
          <article-title>Generating test data with feature diversity</article-title>
          ,
          <source>In Proceedings of the 22nd ACM SIGSOFT International Symposium on Foundations of Software Engineering (FSE)</source>
          ,
          <year>2014</year>
          , pp.
          <fpage>466</fpage>
          -
          <lpage>476</lpage>
          . doi: 10.1145/2635868.2635890.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Kochhar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Gao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Lo</surname>
          </string-name>
          ,
          <article-title>Understanding the test practices and challenges of Android developers</article-title>
          ,
          <source>In Proceedings of the 38th International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>564</fpage>
          -
          <lpage>575</lpage>
          . doi: 10.1145/2884781.2884857.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bianchi</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Nunes</surname>
          </string-name>
          ,
          <article-title>Behavior-aware mobile app testing using usage analytics</article-title>
          ,
          <source>In Proceedings of the 43rd International Conference on Software Engineering (ICSE)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>12</lpage>
          . doi: 10.1109/ICSE.2021.00012.
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>P.</given-names>
            <surname>Sharma</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Patel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Basak</surname>
          </string-name>
          ,
          <article-title>User behavior modeling for fault localization in mobile apps</article-title>
          ,
          <source>IEEE Software</source>
          , vol.
          <volume>39</volume>
          , no.
          <issue>1</issue>
          , pp.
          <fpage>45</fpage>
          -
          <lpage>52</lpage>
          ,
          <year>2022</year>
          . doi: 10.1109/MS.2021.3111294.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>