
Correction to Operationalize Movement Trajectory Classification

Bowen Xi

bowenxi@asu.edu 0

Kevin Scaria

kscaria@asu.edu 0

Divyagna Bavikadi

Paulo Shakarian

0 Arizona State University, Tempe, Arizona, USA
1 Syracuse University, Syracuse, New York

2025

Classification of movement trajectories has many applications in transportation and is a key component for large-scale movement trajectory generation and anomaly detection, which has key safety applications in environments with unseen movement types. However, the current state-of-the-art (SOTA) approaches are based on supervised deep learning, which leads to challenges when they encounter novel unseen classes. We provide a neuro-symbolic rule-based framework to conduct error correction and detection of these models to integrate into our movement trajectory platform. We provide a suite of experiments on several recent SOTA models where we show highly accurate error detection, the ability to improve accuracy on test data that includes novel movement types not seen in the training set, and accuracy improvement for the base use case, in addition to a suite of theoretical properties that informed algorithm development. Specifically, we show F1 scores for predicting errors of up to 0.984, a significant performance increase for unseen movement accuracy (8.51% improvement over SOTA for zero-shot accuracy), and accuracy improvement over the SOTA model.

1. Introduction

ISSN1613-0073

In what follows, we provide further background on our domain problem and our current trajectory analysis platform (some of which is a review of [6]), introduce the algorithmic framework for EDCR including its theoretical properties, and provide our suite of experimental results before concluding with our findings and future work.

2. Background

Overall concept and deployed system. Movement types not typically included in the ground truth data emerge with certain target environments (e.g., paid scooters in certain urban areas, auto-rickshaws in South Asia, or boats in Venice). As a result, IARPA (Intelligence Advanced Research Projects Activity) has identified problems relating to the characterization and generation of normal movement as a key problem of study in the HAYSTAC program. Here, the goal is to establish models of normal human movement at a fine-grain level and operationalize those models and techniques in a system deployed to a government environment for evaluation. As a performer on the program, [6] examine the problem of generating realistic movement trajectories.

Initial government tests for trajectory generation involved movement trajectories consisting of only a single mode of transportation. However, in preparation for the transition to operational use, the government has set requirements to analyze trajectories from various movement types - where the mode of transportation is not known. As such, we look to operationalize a movement trajectory classification module, which we have depicted in the context of our deployed cloud-based architecture shown in Figure 1.

This pipeline interfaces with the government system to access the raw geospatial data with related knowledge for various geolocations as well as historical agent trajectories and their corresponding objective files. Our initial ingest and containerized processes are held in a directed acyclic graph (DAG) as nodes. Our ingest mechanism first parses for the historical trajectories associated with a given agent to stage them in the S3 bucket. Then, geospatial data stored in Neo4j is consolidated into a knowledge graph and staged into the S3 bucket. We instantiate pods on the Amazon Elastic Kubernetes Service (EKS) cluster for all agents with a Docker image to analyze the staging folders and create the respective string commands specific to each agent. The trajectory classification module identifies and tags the modes of transportation in the corresponding trajectory, which is further used to learn rules while considering different types of movements. These rules along with the knowledge graph are used to compute the heuristic value for an informed search method (A* search) to generate movement trajectories [6]. As the container runs, generated movement instruction files are pushed to the appropriate output directory as seen in Figure 1. Additionally, the generated movement abides by predefined spatiotemporal constraints (objectives).

Movement Trajectory Classification Problem. The problem of classifying movement trajectories has been studied in the literature [7, 8, 9, 5, 4] and we shall refer to it as the movement trajectory classification problem (MTCP). We also note that this line of work differs from and is complementary to trajectory generation [6, 10, 11, 12], which does not seek to identify the mode of transportation. An MTCP instance is defined as follows: given a sequence of GPS points, $\omega$, assign one movement class from $\mathcal{C}$, which is often defined [4, 5] as $\mathcal{C}$ = {walk, bike, bus, drive, train}.

The current paradigm for the MTCP problem is to create a neural model $f_\theta$ that maps sequences to movement classes using a set of weights, $\theta$. In this approach traditional methods (i.e., gradient descent) are used to find a set of parameters such that a loss function is minimized based on some training set $T$ (where each sample $\omega \in T$ is associated with a ground truth class $gt(\omega)$). Formally: $\arg\min_{\theta} \sum_{\omega \in T} Loss(f_\theta(\omega), gt(\omega))$. Within this paradigm, several approaches have been proposed. Most notable are a CNN-based architecture [5] and the current state-of-the-art approach known as Long-term Recurrent Convolutional Network (LRCN) [4], which combines lower CNN layers and upper LSTM layers. We use both as baselines, in addition to an extension of LRCN that uses an additional attention head (LRCNa). In the remainder of the paper we write $f$ for the trained model.

Limitations of Current SOTA. However, there are several limitations to these approaches that are problematic in the context of the IARPA HAYSTAC use case.

• Not designed for unseen movements. Any supervised MTCP model requires a data set whose movement classes match the target environment. To address the more dynamic needs of our government customer, we require approaches that can identify when they are likely to give incorrect results to adapt to novel environments.
• All classes known a-priori. In the prior work, the set $\mathcal{C}$ is treated as static and complete, meaning that novel movement types not in the training set will not be properly classified and will not be identified as being different from a movement type in $\mathcal{C}$.
• Previous results evaluated on overlapping training and testing sets. As noted in [13], the standard evaluation of MTCP approaches has been on datasets that experience leakage between train and test. Due to the operational nature of this work, we must examine other splits.

Deploying movement trajectory classification models to certain environments can lead to movements not seen in training (e.g., paid scooters are not seen in training but are prevalent in certain urban areas). Hence, these "novel movements" will inherently be classified incorrectly. The common element in all of these limitations is an understanding of when such classifiers are likely wrong. However, this goes beyond retraining or selecting from different training data, as the government customer envisions use-cases with unseen movement types - hence training data would be limited. This generally precludes meta-learning and domain generalization [14, 15, 16, 17], which attempt to account for changes in the distribution of data and/or selection of a model that was trained on data similar to the current problem. This work also differs from approaches like One-Class Support Vector Machine [18] because of the inherent rule-based method in EDCR, which can be leveraged for explainability and can further be built upon machine learning models.

Additionally, these problems must also be addressed in the context of our existing system (Figure 1), which employs symbolic reasoning to generate movement trajectories - ensuring they attain a degree of normalcy [6]. As a result, we examined approaches for characterizing failures in machine learning models such as introspection [19, 20]; however, these approaches only predict model failure and do not attempt to explain or correct it. Another area of related work is machine learning verification [21, 22, 23], which looks to ensure the output of an ML model meets a logical specification - however, to date this work has not been applied to correct the output of a machine learning model and generally depends on the logical specifications being known a-priori (not an assumption we could make for our use case). Recent studies on abductive learning [24, 25] and neural symbolic reasoning [26] incorporate error correction mechanisms rooted in inconsistency with domain knowledge as logical rules - but as with verification, we do not have this symbolic knowledge a-priori.

3. Error Detection and Correction Rules

To address the issues of the previous section, we are employing a rule-based approach to correcting MTCP model $f$. The intuition is that using limited data, we will learn a set of rules (denoted $\Pi$) that will be able to detect and correct errors of $f$ by logical reasoning [27]. Then, upon deployment for some new sequence $\omega$, we would first compute the class $f(\omega)$ and then use the rules in set $\Pi$ to conclude if the result of $f$ should be accepted and, if not, provide an alternate class in an attempt to correct the mistake. In this section, we formalize the error correcting framework with a simple first order logic (FOL) and provide analytical results relating aspects of learned rules that inform our analytical approach to learning such error detecting and correcting rules. We complete the section with a discussion on how various potential "failure conditions" are extracted to create the rules to correct errors.

In this paper, we shall assume a set of operational sequences $\mathcal{O}$ for which there is ground truth available after model training. This set can be the set of training data, a subset, or a superset. We notate the set of training data with $T$. Later, in our experiments, we look at cases where $\mathcal{O} = T$ and $T \subseteq \mathcal{O}$ - however these are not requirements, as our results are based on model performance on $\mathcal{O}$ - and we envision use-cases where $\mathcal{O}$ is significantly different from $T$. On these $N$ samples, for each class $i$, the model ($f$) returns class $i$ for $N_i$ of the samples, and for each class we have the number of true positives $TP_i$, false positives $FP_i$, true negatives $TN_i$, and false negatives $FN_i$. We have precision $P_i = TP_i / N_i = TP_i / (TP_i + FP_i)$, recall $R_i = TP_i / (TP_i + FN_i)$, and prior of predicting class $i$: $\mathcal{P}_i = N_i / N$.
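As a concrete illustration of the per-class quantities defined above, the following minimal sketch computes $N_i$, $TP_i$, $FP_i$, $FN_i$, precision, recall, and prior from a list of predictions and ground-truth labels. The function name `class_metrics` and the toy data are illustrative assumptions and are not taken from the authors' code.

```python
def class_metrics(preds, truths, cls):
    """Per-class counts and ratios over an operational set (illustrative helper)."""
    n = len(preds)
    pairs = list(zip(preds, truths))
    tp = sum(p == cls and t == cls for p, t in pairs)  # predicted cls, truly cls
    fp = sum(p == cls and t != cls for p, t in pairs)  # predicted cls, truly other
    fn = sum(p != cls and t == cls for p, t in pairs)  # predicted other, truly cls
    n_i = tp + fp                                      # samples the model labels cls
    return {
        "N_i": n_i, "TP": tp, "FP": fp, "FN": fn,
        "precision": tp / n_i if n_i else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "prior": n_i / n if n else 0.0,
    }

# Toy data, made up for this example only.
preds = ["walk", "walk", "bus", "walk", "bike", "bus"]
truths = ["walk", "bike", "bus", "walk", "bike", "walk"]
m = class_metrics(preds, truths, "walk")
```

Here the model predicts walk three times, two of which are correct, and misses one true walk sample, so precision and recall are both 2/3 and the prior is 0.5.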

Language. We use a simple first-order language where samples are represented by constant symbols ($\omega$). We define a set $C$ of "condition" unary predicates $cond_1, \ldots, cond_m$ associated with each sample that can be either true or false - these are conditions that can be thought of as potentially leading to failure (but our learning algorithms will identify which ones lead to failure for a given prediction). These predicates can also be features related to a sample in the dataset. We also define unary predicates for each class $i$: $pred_i$, $corr_i$, and $error_i$, defined below.

• $pred_i$: True if and only if the model predicts class $i$, i.e., $pred_i(\omega)$ is true if $f(\omega) = i$.
• $corr_i$: This predicate is true if and only if the correct movement class for $\omega$ is $i$, i.e., $corr_i(\omega)$ is true if $gt(\omega) = i$.
• $error_i$: This predicate is true if and only if an EDCR rule concludes there is an error in the model's prediction.

Rules. The set of rules $\Pi$ will consist of two rules for each class: one "error detecting" and one "error correcting." Error detecting rules determine if a prediction by $f$ is not valid. In essence, we can think of such a rule as changing the movement class assigned by $f$ to some sample $\omega$ from $i$ to "unknown." For a given class $i$, we will have an associated set of detection conditions $DC_i$ that is a subset of the conditions $C$, the disjunction of which is used to determine if $f$ gave an incorrect classification:

$$error_i(\omega) \leftarrow pred_i(\omega) \wedge \bigvee_{j \in DC_i} cond_j(\omega) \quad (1)$$

Error correcting rules, with $corr_i(\omega)$ in the head and based on a subset of condition-class pairs $CC_i \subseteq C \times \mathcal{C}$, are used to correct the class of a given sample. The body is the disjunction of such condition-class pairs:

$$corr_i(\omega) \leftarrow \bigvee_{(cond_j, r) \in CC_i} \big( cond_j(\omega) \wedge pred_r(\omega) \big) \quad (2)$$
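To make the two rule types concrete, the sketch below applies a learned rule set to a single sample. It is only one possible reading: the function `apply_edcr`, the dictionary layout, the condition name `too_fast`, and the choice to try correction rules (in sorted class order) before falling back to detection are all assumptions made for illustration, not details given in the paper.

```python
def apply_edcr(pred, conds, detect, correct):
    """Return the final label for one sample (illustrative sketch).

    pred    -- class predicted by the base model
    conds   -- set of condition names that are true for the sample
    detect  -- {class i: set of condition names DC_i}
    correct -- {class i: set of (condition name, predicted class) pairs CC_i}
    """
    # A correction rule for class i fires if some pair (cond, r) in CC_i has
    # cond true for the sample and r equal to the model's prediction.
    for cls in sorted(correct):  # assumption: first matching class wins
        if any(cond in conds and r == pred for cond, r in correct[cls]):
            return cls
    # The detection rule for the predicted class fires if any condition in
    # DC_pred is true; the sample is then re-labelled "unknown".
    if conds & detect.get(pred, set()):
        return "unknown"
    return pred

# Hypothetical rule set and samples.
detect = {"bus": {"too_fast"}, "walk": {"too_fast"}}
correct = {"train": {("too_fast", "bus")}}
labels = [
    apply_edcr("bus", {"too_fast"}, detect, correct),
    apply_edcr("walk", {"too_fast"}, detect, correct),
    apply_edcr("bus", set(), detect, correct),
]
```

The first sample is corrected to train, the second is only flagged as unknown because no correction pair mentions walk, and the third is left unchanged.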

Support w.r.t. class $i$ ($s$): given the subset of samples where the model predicts class $i$, the fraction of those samples where the body is true (note the denominator is $N_i$).

Confidence ($c$): the number of times the body and head are true together divided by the number of times the body is true.

Now we present some analytical results that inform our learning algorithms. Our strategy for learning involves first learning detection rules (which establish conditions for which a given classification decision by $f$ is deemed incorrect) and then learning correction rules (which then correct the detected errors by assigning a new movement class to the sample). We formalize these two tasks as follows.

Improvement by error detecting rule. For a given class $i$, find a set of conditions $DC_i$ such that precision is maximized and recall decreases by, at most, $\epsilon$.

Improvement by error correcting rule. For a given class $i$, find a subset $CC_i$ of $C \times \mathcal{C}$ such that both precision and recall are maximized.

Properties of Detection Rules. First, we examine the effect on precision and recall when an error detecting rule is used. Our first result shows a bound on precision improvement. If class support ($s$) is less than $1 - P_i$, which we would expect (as the rule would be designed to detect the $1 - P_i$ portion of results that failed), then we can also show that the quantity $c \cdot s$ gives us an upper bound on the improvement in precision.2

Theorem 1. Consider an error detecting rule with support $s$ and confidence $c$, and initial precision $P_i$ of model $f$ for class $i$. Then, under the condition $s \le 1 - P_i$, the precision of model $f$ for class $i$ after applying the error detecting rule increases by a function of both $s$ and $c$. The increase is no greater than $c \cdot s$, and this quantity is a normalized polymatroid submodular function with respect to the set of conditions in the rule $DC_i$.

The error detecting rules can cause the recall to stay the same or decrease. Our next result tells us precisely how much recall will decrease.

Theorem 2. After applying the rule to detect errors, the recall will decrease by $(1 - c)\, s\, \frac{R_i}{P_i}$, and this quantity is a normalized polymatroid submodular function with respect to the set of conditions in the rule $DC_i$.
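The two results can be checked numerically on a small example. The sketch below assumes the closed forms $\frac{s}{1-s}(c + P_i - 1)$ for the precision gain and $(1-c)\,s\,R_i/P_i$ for the recall loss, computes class support and confidence directly from data, applies a detection rule, and returns the precision and recall before and after. The function name `detection_effect` and the toy data are illustrative assumptions; the example also assumes the rule fires on at least one, but not all, of the samples predicted as the class.

```python
def detection_effect(preds, truths, fires, cls):
    """Measure an error detecting rule for class `cls` (illustrative sketch).

    fires[k] is True when the rule's condition disjunction holds for sample k.
    Returns (s, c, P_before, R_before, P_after, R_after).
    """
    idx = [k for k, p in enumerate(preds) if p == cls]   # samples predicted cls
    n_i = len(idx)
    tp = sum(truths[k] == cls for k in idx)
    fn = sum(p != cls and t == cls for p, t in zip(preds, truths))
    body = [k for k in idx if fires[k]]                  # rule body true
    pos = sum(truths[k] != cls for k in body)            # body true and a real error
    s = len(body) / n_i                                  # class support
    c = pos / len(body)                                  # confidence
    kept = [k for k in idx if not fires[k]]              # predictions left untouched
    tp_new = sum(truths[k] == cls for k in kept)
    return s, c, tp / n_i, tp / (tp + fn), tp_new / len(kept), tp_new / (tp + fn)

# Toy data: 8 samples predicted "bus" (5 correct), plus one missed bus sample.
preds = ["bus"] * 8 + ["walk"] * 2
truths = ["bus"] * 5 + ["walk"] * 3 + ["bus", "walk"]
fires = [True, False, False, False, False, True, True, False, False, False]

s, c, p0, r0, p1, r1 = detection_effect(preds, truths, fires, "bus")
gain_formula = s / (1 - s) * (c + p0 - 1)   # assumed closed form for precision gain
loss_formula = (1 - c) * s * r0 / p0        # assumed closed form for recall loss
```

On this data the measured changes agree with the closed forms, and the precision gain stays below $c \cdot s$ because $s \le 1 - P_i$ holds.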

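Algorithm 1 DetRuleLearn

Require: Class $i$, recall reduction threshold $\epsilon$, condition set $C$
Ensure: Subset of conditions $DC_i$

$DC_i := \emptyset$
$DC^* := \{cond \in C \text{ s.t. } NEG_{\{cond\}} \le \epsilon \cdot \frac{N_i P_i}{R_i}\}$
while $DC^* \ne \emptyset$ do
    $cond_{best} = \arg\max_{cond \in DC^*} POS_{DC_i \cup \{cond\}}$
    Add $cond_{best}$ to $DC_i$
    $DC^* := \{cond \in C \setminus DC_i \text{ s.t. } NEG_{DC_i \cup \{cond\}} \le \epsilon \cdot \frac{N_i P_i}{R_i}\}$
end while
return $DC_i$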
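A greedy procedure of this kind can be sketched in a few lines. The version below is a hedged illustration rather than the authors' implementation: the name `det_rule_learn`, the representation of each condition as the set of sample ids it covers, the alphabetical tie-breaking, and the budget $\epsilon \cdot N_i P_i / R_i$ on the number of non-errors (obtained by requiring the recall reduction $(1-c)\,s\,R_i/P_i$ to stay at or below $\epsilon$) are all assumptions.

```python
def det_rule_learn(cond_hits, is_error, n_i, precision, recall, eps):
    """Greedy selection of detection conditions for one class (illustrative sketch).

    cond_hits -- {condition name: set of sample ids, among those predicted as the
                  class, for which the condition is true}
    is_error  -- set of sample ids predicted as the class whose ground truth differs
    """
    budget = eps * n_i * precision / recall  # assumed cap on flagged non-errors

    def covered(conds):
        return set().union(*(cond_hits[c] for c in conds))

    def pos(conds):  # flagged samples that are true errors
        return len(covered(conds) & is_error)

    def neg(conds):  # flagged samples that were actually correct
        return len(covered(conds) - is_error)

    selected = set()
    candidates = {c for c in cond_hits if neg({c}) <= budget}
    while candidates:
        # Pick the condition that maximizes the number of detected errors.
        best = max(sorted(candidates), key=lambda c: pos(selected | {c}))
        selected.add(best)
        candidates = {c for c in cond_hits
                      if c not in selected and neg(selected | {c}) <= budget}
    return selected

# Toy example: 10 samples predicted as the class, ids 0-3 are errors.
hits = {"a": {0, 1}, "b": {2, 5}, "c": {1, 3}}
errors = {0, 1, 2, 3}
tight = det_rule_learn(hits, errors, n_i=10, precision=0.6, recall=0.75, eps=0.1)
loose = det_rule_learn(hits, errors, n_i=10, precision=0.6, recall=0.75, eps=0.2)
```

With the tighter threshold, condition `b` is excluded because it would flag one correct prediction; relaxing the threshold admits it and detects one more error.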

As the quantities identified in Theorems 1 and 2 are submodular and monotonic, we can see that the selection of a set of rules to maximize $c \cdot s$ subject to the constraint that $(1 - c)\, s\, \frac{R_i}{P_i} \le \epsilon$ is a special case of the "Submodular Cost Submodular Knapsack" (SCSK) problem and can be approximated with a simple greedy algorithm [28] with an approximation guarantee and polynomial run time (Theorem 4.7 of [28]). Our algorithm DetRuleLearn is an instantiation of such an approach to creating an error detecting rule for a given class that maximizes precision while not reducing recall by more than $\epsilon$. Here, $\epsilon$ is treated as a hyperparameter. Also, $POS_{DC}$ and $NEG_{DC}$ for some set $DC$ are simply the number of samples that satisfy the conditions in $DC$ and are true errors (for $POS_{DC}$) and non-errors (for $NEG_{DC}$). In other words, given a set of condition-class pairs and the rule of interest, $BOD$ here is the number of examples that satisfy the body (class-condition pair) of the error detection rules, and $POS$ here is the number of examples that satisfy the body (class-condition pair) and the head of the error detection rules. $P_i$, $R_i$ are precision and recall for class $i$, while $N_i$ is the number of samples that the model classifies as class $i$.

2 Complete proofs for all formal results can be found at

Properties of Corrective Rules. In what follows, we shall examine the results for corrective rules. Here, the error correcting rule with predicate $corr_i$ in the head will have a disjunction of elements of set $CC_i \subseteq C \times \mathcal{C}$. Also, note that here the support ($s$) is used instead of class support. Here we find that both precision and recall increase with rule confidence (Theorem 3).

Theorem 3. For the application of error correcting rules, both precision and recall increase if and only if rule confidence ($c$) increases.

This result suggests that optimizing confidence will optimize both precision and recall. However, this is not a monotonic function over $CC_i$, so we adopt a fast, heuristic approach for non-monotonic optimization based on [29], presented as CorrRuleLearn in this paper. Here, we will consider an initial set of condition-class pairs $CC_{all}$ that is a subset of $C \times \mathcal{C}$. For a given class $i$ for which we create an error correcting rule, we select $CC_i$ from this larger set using our approach. Note here that $POS_{CC}$ is the number of samples that satisfy the rule body and head ($corr_i(\omega)$ in this case) given a set of condition-class pairs $CC$, while $BOD_{CC}$ is the number of samples that satisfy the body formed with set $CC$.

Algorithm 2 CorrRuleLearn

Require: Class $i$, set of condition-class pairs $CC_{all}$
Ensure: Subset of condition-class pairs $CC_i$

$CC_i := \emptyset$
$CC'_i := CC_{all}$
Sort each $(cond, j) \in CC_{all}$ from greatest to least by $POS_{\{(cond,j)\}} / BOD_{\{(cond,j)\}}$ and remove $\{(cond,j)\}$ where $POS_{\{(cond,j)\}} / BOD_{\{(cond,j)\}} \le P_i$
for $(cond, j) \in CC_{all}$ selected in order of the sorted list do
    $a := POS_{CC_i \cup \{(cond,j)\}} / BOD_{CC_i \cup \{(cond,j)\}} - POS_{CC_i} / BOD_{CC_i}$
    $b := POS_{CC'_i \setminus \{(cond,j)\}} / BOD_{CC'_i \setminus \{(cond,j)\}} - POS_{CC'_i} / BOD_{CC'_i}$
    if $a \ge b$ then $CC_i := CC_i \cup \{(cond, j)\}$
    else $CC'_i := CC'_i \setminus \{(cond, j)\}$
end for
return $CC_i$

Learning Detection and Correction Rules Together. Error correcting rules created using CorrRuleLearn will provide optimal improvement to precision and recall for the rule in the target class, but in the case of multi-class problems, it will cause recall to drop for some other classes. However, we can combine error detecting and correcting rules to overcome this difficulty. The intuition is first to create error detecting rules for each class, which effectively re-assigns any sample into an "unknown" class. Then, we create a set $CC_{all}$ (used as input for CorrRuleLearn) based on the conditions selected by the error detecting rules. In this way, we will not decrease recall beyond what occurs in the application of error detecting rules.

Algorithmic Efficiency. We note that these algorithms are quite efficient. For example, DetRuleLearn is quadratic in the number of conditions and linear in the number of samples. However, in practice it actually performs better, as the outer loop iterates significantly less than the total number of conditions and the number of selected conditions is reduced with each iteration. Likewise, the algorithm CorrRuleLearn is linear in the number of samples and linear in the number of condition-class pairs.
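For CorrRuleLearn, a minimal sketch of a double-greedy selection over rule confidence, in the spirit of the approach of [29], is given below. It should be read as an assumption-laden illustration: the function name `corr_rule_learn`, the representation of each condition-class pair by the set of sample ids it covers, the convention that an empty rule has confidence 0, and the pre-filter that drops pairs whose individual confidence does not exceed the class precision are choices made here, not details confirmed by the paper.

```python
def corr_rule_learn(pair_hits, is_cls, precision):
    """Double-greedy selection of condition-class pairs (illustrative sketch).

    pair_hits -- {(condition, predicted class): set of sample ids where the
                  condition is true and the model predicted that class}
    is_cls    -- set of sample ids whose ground truth is the target class
    precision -- current precision of the model for the target class
    """
    def conf(pairs):
        body = set().union(*(pair_hits[p] for p in pairs))
        return len(body & is_cls) / len(body) if body else 0.0  # assumption: conf(empty) = 0

    # Keep only pairs that could raise precision, ranked by individual confidence.
    ranked = sorted((p for p in pair_hits if conf({p}) > precision),
                    key=lambda p: conf({p}), reverse=True)
    lower, upper = set(), set(ranked)
    for p in ranked:
        gain_add = conf(lower | {p}) - conf(lower)    # benefit of adding p
        gain_drop = conf(upper - {p}) - conf(upper)   # benefit of discarding p
        if gain_add >= gain_drop:
            lower.add(p)
        else:
            upper.discard(p)
    return lower

# Toy example: ids 1, 2, 3, 4, 6 truly belong to the target class.
pairs = {
    ("fast", "walk"): {1, 2, 3},
    ("fast", "bike"): {4, 5},
    ("slow", "bus"): {6, 7, 8, 9},
}
chosen = corr_rule_learn(pairs, is_cls={1, 2, 3, 4, 6}, precision=0.2)
```

Only the pair with confidence 1.0 survives: adding either of the other two would dilute the confidence of the rule body.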

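Algorithm 3 DetCorrRuleLearn

Require: Recall reduction threshold $\epsilon$, condition set $C$
Ensure: Set of rules $\Pi$

$\Pi := \emptyset$
$CC_{all} := \emptyset$
for each class $i$ do
    $DC_i := \text{DetRuleLearn}(i, \epsilon, C)$
    if $DC_i \ne \emptyset$ then
        $\Pi := \Pi \cup \{ error_i(\omega) \leftarrow pred_i(\omega) \wedge \bigvee_{j \in DC_i} cond_j(\omega) \}$
        $CC_{all} := CC_{all} \cup \{ (cond_j, i) : j \in DC_i \}$
    end if
end for
for each class $i$ do
    $CC_i := \text{CorrRuleLearn}(i, CC_{all})$
    if $CC_i \ne \emptyset$ then
        $\Pi := \Pi \cup \{ corr_i(\omega) \leftarrow \bigvee_{(cond_j, r) \in CC_i} ( cond_j(\omega) \wedge pred_r(\omega) ) \}$
    end if
end for
return $\Pi$

We take two approaches to creating the conditions. First, we use a binary version of the classifier: for a given class $i$, we have a binary classifier $g_i$ which returns "true" for sample $\omega$ if it assigns it as $i$ and "false" otherwise. In this way, for each sample we have a $g_i(\omega)$ condition for each of the classes. The second way we create conditions is based on outlier behavior in the velocity of the vehicle in the sample. Here, if the velocity of a given sample is above a threshold (based on the maximum value for ground truth in the training data), this velocity condition is true, and it is false otherwise.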
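The two kinds of conditions can be illustrated as follows. This is a sketch under assumptions: the helper names, the `bin_<class>` and `velocity_outlier` condition labels, and the choice of one velocity threshold per class (the maximum training velocity among ground-truth samples of that class, compared against the class the base model predicted) are not specified in the text and are introduced here only for illustration.

```python
def velocity_thresholds(train_speeds, train_labels):
    """Maximum observed training velocity per ground-truth class."""
    thresholds = {}
    for speed, label in zip(train_speeds, train_labels):
        thresholds[label] = max(thresholds.get(label, float("-inf")), speed)
    return thresholds

def conditions_for(speed, predicted, binary_votes, thresholds):
    """Build the set of true conditions for one sample.

    binary_votes -- {class: bool}, output of the per-class binary classifiers
    """
    # One condition per class whose binary classifier says "true".
    conds = {f"bin_{cls}" for cls, vote in binary_votes.items() if vote}
    # Velocity outlier relative to the class predicted by the base model.
    if speed > thresholds.get(predicted, float("inf")):
        conds.add("velocity_outlier")
    return conds

# Hypothetical training velocities (m/s) and labels.
th = velocity_thresholds([1.2, 1.8, 6.0, 14.0], ["walk", "walk", "bike", "bus"])
conds = conditions_for(9.5, "walk", {"walk": False, "bike": True, "bus": False}, th)
```

A sample predicted as walk but moving at 9.5 m/s exceeds the walk threshold of 1.8 m/s, so both the velocity condition and the bike binary-classifier condition are true for it.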

4. Experimental Evaluation

Experimental Setup. Previous work such as [ 4 ] is known to have data leakage based on the split between training and test primarily due to segments of a movement sequence existing in both training and test sets [ 13 ]. In this paper, we examine a training-test split with no overlap between the two avoiding this error and more closely resembling our target use-case. The assessments in this paper used GPS trajectories obtained from the GeoLife project [ 7 ] which include ground truth (note that ground truth data for our target application was unavailable at the time of this writing). All experiments were conducted on an NVIDIA A100 GPU using Python 3.10, with an 80/20 train–test split. Source code is available via https://github.com/lab-v2/Error-Detection-and-Correction.

Error Detection Experiments. First we examined the ability of learned error detection rules to detect errors in the underlying model. Here we examined three base model architectures: CNN [5], LRCN [4], and our version of LRCN with an additional attention head (LRCNa). In this experiment, error detection rules were trained from the same training data as the model. Similar to previous work on examining the ability to detect errors in a machine learning model [19], we evaluated precision, recall, and F1 of the ability of rules to identify errors. These can be thought of as the fraction of results where our learned error detection rules correctly return an error (error precision), the fraction of errors identified by the rules (error recall), and the harmonic mean of the two (error F1).

Table: error detection results by evaluated model (LRCNa, LRCN, CNN).

The learned rules (EDCR) gave consistently high precision and recall for detecting errors across all model types - specifically obtaining a 0.875 F1 for errors in the SOTA model (LRCN) and a top F1 of 0.984 (for CNN).
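The error-detection metrics described above treat "the base model was wrong" as the positive class. A minimal sketch of how they could be computed is shown below; the function name `error_detection_scores` and the toy inputs are illustrative assumptions.

```python
def error_detection_scores(preds, truths, flagged):
    """Precision, recall, and F1 of flagging base-model errors (illustrative sketch).

    flagged[k] is True when a detection rule fired for sample k.
    """
    actual = [p != t for p, t in zip(preds, truths)]   # True where the model erred
    tp = sum(a and f for a, f in zip(actual, flagged))
    fp = sum((not a) and f for a, f in zip(actual, flagged))
    fn = sum(a and (not f) for a, f in zip(actual, flagged))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Toy data: the model errs on samples 1 and 3; the rules flag samples 1 and 4.
preds = ["a", "a", "b", "b", "a"]
truths = ["a", "b", "b", "a", "a"]
flagged = [False, True, False, False, True]
scores = error_detection_scores(preds, truths, flagged)
```

Here one of the two flags is a real error and one of the two real errors is flagged, so error precision, recall, and F1 are all 0.5.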

Test Data with Additional Classes. A key set of concerns for our use-case was the ability to deploy movement trajectory classification in an environment where the data differs from the training data - specifically containing previously unidentified classes. To examine this, we trained CNN, LRCN, and LRCNa models without incorporating the walk and drive classes (Figure 2). We note here both detection and correction are used. We initially learned the EDCR rules with the same training data as the model - which results in no sample being corrected to a class unseen in training data and effectively is zero-shot tuning of the base model by EDCR. However, due to detection, this still resulted in accuracy improvements of 6.41%, 8.51%, and 7.76% for LRCNa, LRCN, and CNN respectively. We then added few-shot samples from the unseen data (the x-axis of Figure 2), giving us few-shot tuning of the base model. Here, with only 20% of the samples with the unseen classes, we obtained an overall accuracy of 0.65 on all three models, representing a 17-18% improvement. We note these results are obtained without direct access to the underlying model, which may indicate that EDCR has the potential for adaptation of arbitrary models to novel scenarios - a key use case for our government customer.

Precision-Recall Trade-off. A key intuition in our algorithmic design is the ability of the hyperparameter $\epsilon$ to trade off precision and recall. Hence, we examined the effect of varying $\epsilon$ on test data that resembled training data (results for LRCN are shown in Figure 3). Recall that $\epsilon$ is interpreted as the maximum decrease in recall. We observed and validated the theoretical reduction (TR) in recall empirically, and the experiments show us that in all cases, recall was no lower than the threshold specified by the hyperparameter $\epsilon$, though recall decreases as $\epsilon$ increases. In many cases, the experimental evaluation reduced recall significantly less than expected. We also see a clear relationship between $\epsilon$, precision, and recall: increasing $\epsilon$ leads to increased precision and decreased recall - which also aligns with our analytical results. We also note that while DetCorrRuleLearn calls for a single $\epsilon$ hyperparameter, it is possible to set it differently for each class (e.g., lower values for classes where recall is important, higher values for classes where false positives are expensive). This may be beneficial as F1 for different classes seemed to peak for different values of $\epsilon$. We leave the study of heterogeneous $\epsilon$ settings to future work.

Table 3: Overall accuracy by evaluated model (LRCNa, LRCN, CNN), with no EDCR (baseline) and with EDCR.

Accuracy Improvement via EDCR. We also investigated EDCR's ability to provide overall accuracy improvement to the base model. Here we trained each of the three models (LRCNa, LRCN, CNN) and associated EDCR rules (on the same training data as the model) and evaluated the overall accuracy on the test set both with and without applying rules (see Table 3). We found that EDCR provided a noticeable improvement in both LRCN and LRCNa models - effectively establishing a new SOTA when evaluated with no overlap between training and testing. We also examined other splits between training and testing (not depicted) and obtained comparable results.

5. Conclusion

We propose a rule-based framework for the error detection and correction of supervised neural models for classification of movement trajectories. Our framework uses the training data to learn rules to be employed in the testing phase. First, we use the detection rules to identify the movement trajectories that are misclassified by the supervised model, and then we use the correction rules to re-classify the movement. Further, we formally prove the relation of confidence and support of the learned rules to the changes in classification metrics like precision and recall. As empirical validation of EDCR, we first report the framework's ability to identify errors, with F1 scores going up to 0.984. We also show overall accuracy improvement over the SOTA model by employing the EDCR framework. Our framework is specifically useful in cases of encountering novel classes not seen in training data, as shown by an 8.51% improvement of unseen movement accuracy over SOTA for zero-shot tuning. Additionally, we discuss operationalizing our trajectory classification method in our deployed system.

There are several directions for future work. First, we look to explore other methods to create the conditions, in particular leveraging ideas from conformal prediction [30]. Another direction is to look at alternative solutions to learn the rules, allowing for more complicated rule structures. Human validation of the rules responsible for a corrected label can be conducted for further evaluation. Finally, the use of rules for error detection and correction of machine learning models presented here may be useful in domains such as vision. To reliably incorporate vision models in real-world applications for tasks like object detection, image classification, and motion tracking, the EDCR framework can be leveraged to improve the overall system's accuracy and robustness by identifying and correcting its misclassifications.

Ethical Statement

There are no ethical issues.

6. Acknowledgments

This research is supported by the Intelligence Advanced Research Projects Activity (IARPA) via the Department of Interior/Interior Business Center (DOI/IBC) contract number 140D0423C0032. The U.S. Government is authorized to reproduce and distribute reprints for Governmental purposes notwithstanding any copyright annotation thereon. Disclaimer: The views and conclusions contained herein are those of the authors and should not be interpreted as necessarily representing the official policies or endorsements, either expressed or implied, of IARPA, DOI/IBC, or the U.S. Government. Additionally, some of the authors are supported by ONR grant N00014-23-1-2580 and ARO grant W911NF-24-1-0007.

7. Declaration on Generative AI

During the preparation of this work, the author(s) used Grammarly in order to: Grammar and spelling check. After using this tool/service, the author(s) reviewed and edited the content as needed and take(s) full responsibility for the publication’s content.

… arXiv:1607.08665 [cs].

[18] H. J. Shin, D.-H. Eom, S.-S. Kim, One-class support vector machines - an application in machine fault detection and classification, Computers & Industrial Engineering 48 (2005) 395–408. URL: https://www.sciencedirect.com/science/article/pii/S0360835205000100. doi:https://doi.org/10.

… //arxiv.org/abs/1707.00051. doi:10.1109/LRA.2018.2857402. arXiv:1707.00051 [cs].

… neural network controllers using taylor model preconditioning, in: Computer Aided Verification: 33rd International Conference, CAV 2021, Virtual Event, July 20–23, 2021, Proceedings, Part I, Springer-Verlag, 2021, pp. 249–262. URL: https://doi.org/10.1007/978-3-030-81685-8_11.

[22] K. Jothimurugan, S. Bansal, O. Bastani, R. Alur, Compositional reinforcement learning from logical specifications, in: Advances in Neural Information Processing Systems, 2021.

[23] M. Ma, J. Gao, L. Feng, J. Stankovic, Stlnet: Signal temporal logic enforced multivariate recurrent neural networks, Advances in Neural Information Processing Systems 33 (2020) 14604–14614.

[24] Y.-X. Huang, W.-Z. Dai, Y. Jiang, Z.-H. Zhou, Enabling knowledge refinement upon new concepts in abductive learning (2023).

[25] W.-Z. Dai, Q. Xu, Y. Yu, Z.-H. Zhou, Bridging machine learning and logical reasoning by abductive learning, NeurIPS 32 (2019).

[26] C. Cornelio, J. Stuehmer, S. X. Hu, T. Hospedales, Learning where and when to reason in neurosymbolic inference, in: The Eleventh International Conference on Learning Representations, 2022.

[27] D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, PyReason: Software for open world temporal logic, in: AAAI Spring Symposium, 2023.

[28] R. Iyer, J. Bilmes, Submodular optimization with submodular cover and submodular knapsack constraints, in: Proceedings of the 26th International Conference on Neural Information Processing Systems - Volume 2, NIPS'13, Curran Associates Inc., Red Hook, NY, USA, 2013, p. 2436–2444.

[29] N. Buchbinder, M. Feldman, J. Naor, R. Schwartz, A tight linear time (1/2)-approximation for unconstrained submodular maximization, in: 2012 IEEE 53rd Annual Symposium on Foundations of Computer Science, 2012, pp. 649–658. doi:10.1109/FOCS.2012.73.

… Conformal prediction for uncertainty-aware planning with diffusion dynamics model, in: NeurIPS, volume 36, 2023, pp. 80324–80337. URL: https://proceedings.neurips.cc/paper_files/paper/2023/file/

A. Appendix

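A.1. Proof of Theorem 1

Under the condition $s \le 1 - P_i$, the precision of model $f$ for class $i$, with initial precision $P_i$, after applying an error detecting rule with support $s$ and confidence $c$ increases by a function of $s$ and $c$ and is no greater than $c \cdot s$, and this quantity is a normalized polymatroid submodular function with respect to the set of conditions in the rule $DC_i$.

Proof. CLAIM 1: The precision of model $f$ for class $i$, with initial precision $P_i$, after applying an error detecting rule with support $s$ and confidence $c$ increases by:

$$\frac{s}{1 - s}\,(c + P_i - 1) \quad (3)$$

The total number of items that $f$ will attempt to classify as $i$ before error detection is $N_i = TP_i + FP_i$. Out of those, $s \cdot N_i$ will be detected by the rule. However, a fraction of $(1 - c)$ will be samples that would have been true positives if not detected. Hence, the new precision can be written as follows:

$$\frac{TP_i - (1 - c) \cdot s \cdot N_i}{N_i - s \cdot N_i}$$

As $P_i \cdot N_i = TP_i$, we have:

$$\frac{P_i \cdot N_i - (1 - c) \cdot s \cdot N_i}{N_i (1 - s)} = \frac{P_i - (1 - c)\,s}{1 - s}$$

Now we subtract from that quantity the initial precision.

$$\frac{P_i - (1 - c)\,s}{1 - s} - P_i = \frac{P_i - (1 - c)\,s - P_i (1 - s)}{1 - s} = \frac{s\,P_i - (1 - c)\,s}{1 - s} = \frac{s}{1 - s}\,(c + P_i - 1)$$

CLAIM 2: If $s \le 1 - P_i$ then $c \cdot s$ is an upper bound on the improvement in precision.

BWOC, then by Claim 1 we have (for $s > 0$; the case $s = 0$ is immediate):

$$\frac{s}{1 - s}\,(c + P_i - 1) > c \cdot s$$
$$c + P_i - 1 > c\,(1 - s)$$
$$c \cdot s > 1 - P_i$$

Since $s \le 1 - P_i$, this requires $c\,(1 - P_i) > 1 - P_i$, i.e., $c > 1$. However, as $c \le 1$ this is a contradiction.

CLAIM 3: $c \cdot s = POS / N_i$, where $POS$ is the number of samples where both the rule body and head are satisfied.

Let $BOD$ be the number of samples for which the body of the rule is true. This gives us $c \cdot s = \frac{POS}{BOD} \cdot \frac{BOD}{N_i}$, which is equivalent to the statement of the claim.

CLAIM 4: The quantity $c \cdot s$ is submodular w.r.t. set $DC_i$.

We show this by the submodularity of $POS$, which suffices as $N_i$ is a constant, together with the result of Claim 3. BWOC, $POS$ is not submodular. We use the symbol $POS(DC)$ to denote the value of $POS$ for a set of conditions $DC$ and assume the existence of two sets of conditions $DC_1, DC_2$ that violate submodularity. Then, the following must be true:

$$POS(DC_1) + POS(DC_2) < POS(DC_1 \cup DC_2) + POS(DC_1 \cap DC_2)$$

As the rule body is a disjunction, a sample satisfies the body for $DC_1 \cup DC_2$ exactly when it satisfies the body for $DC_1$ or for $DC_2$. Writing $POS(DC_1 \wedge DC_2)$ for the number of true errors that satisfy both bodies, we can re-write $POS(DC_1 \cup DC_2)$ as:

$$POS(DC_1 \cup DC_2) = POS(DC_1) + POS(DC_2) - POS(DC_1 \wedge DC_2)$$

Any sample satisfying a condition in $DC_1 \cap DC_2$ satisfies both bodies, so $POS(DC_1 \cap DC_2) \le POS(DC_1 \wedge DC_2)$. Substituting this back into the inequality above, we can bound the right-hand side as:

$$POS(DC_1 \cup DC_2) + POS(DC_1 \cap DC_2) \le POS(DC_1) + POS(DC_2)$$

which gives us our contradiction.

CLAIM 5: $c \cdot s$ monotonically increases with $DC_i$.

By Claim 3, as the quantity equals $POS / N_i$ and $N_i$ is a constant, we just need to show monotonicity of $POS$. Clearly $POS$ increases monotonically, as additional elements in $DC_i$ can only make it increase.

CLAIM 6: When $DC_i = \emptyset$, $c \cdot s = 0$.

Follows directly from the fact that we define $POS$ as zero if no conditions are used.

Proof of theorem. Follows directly from Claims 1-6.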

A.2. Proof of Theorem 2 After applying the rule to detect errors, the recall will decrease by (1 − ) and this quantity is a normalized polymatroid submodular function with respect to the set of conditions in the rule . Proof. CLAIM 1: After applying the rule to detect errors, the recall will decrease by (1 − ) . The number of corrections made by the rule is ( + ) with (1 − ) fraction of these being incorrect (so the false negatives increases by ( + )1 − )). Note that the sum + does not change after error detection, as any true positive “detected” as being incorrect becomes a false negative, and false negatives do not otherwise change from error detection. Therefore, the new recall is: When this quantity is subtracted from the original recall ( ), we obtain: − (1 − )( + )

+ = (1 − ) ( =

− (1 − )( + ) − + − ( − (1 − )( + ))

+ = (1 − )( + ))

+ + ) + + = (1 − ) ( + ) + We note that = − = − ⋅ which gives us: (1 − ) ( +

− ⋅ ( + ) ( + )

) = (1 − ) ( + − )

= (1 − )

CLAIM 2: (1 − ) is a normalized polymatroid submodular function with respect to the set of conditions in the rule .

Note that is the number of samples that satisfy the body, while is the number of samples that satisfy the body and head, = − = (1 − = = ) 1 for

As 1 is a constant, we need to show the submodularity of , which follows the same argument as per Claim 4 of Theorem 1. Likewise, is monotonic (mirroring the argument of Claim 5 of Theorem 1) and normalized by the definition of in the case where there are no conditions. The statement of the theorem follows.

A.3. Proof of Theorem 3

For the application of error correcting rules, both precision and recall increase if and only if rule confidence ( ) increases.

Proof. CLAIM 1: Precision increases by − .

The new precision is equal to the following:

+ The improvement of the precision can be derived as follows.

+ + + + − = = = = = + − + − − + + + − + − −

CLAIM 2: If the count of samples satisfying both rule body and head (the numerator of confidence) increases, then precision increases.

Suppose BWOC the claim is not true. Then for some value of for which the improvement in precision is greater than ′ = + 1. Note that, in this case, the number of samples satisfying the body also increases by 1. First, we know that we can re-write the result of Claim 1 as follows. Therefore, using the result from Claim 1, the following relationship must hold. > > + 1 − This gives us a contradiction, as (1 − ) ≥ 0 and ≤

CLAIM 3: If the difference in precision increases, the number of samples satisfying both rule body and head must increase.

By definition, the only way for this to occur is if increases and does not, as they can both increase or only without . If neither increase there is no change, and it is not possible for to increase . Therefore the following must be true. < However, this is clearly a contradiction, as the expression on the right is clearly smaller (the numerator is smaller as is positive, and the denominator is larger).

CLAIM 4: Precision increases if and only if increases.

Follows directly from claims 1-3.

CLAIM 5: When adding more samples that satisfy the body of the rule, confidence increases if and only if increases.

but not true.

+ + < < < 2 >
1 > 2 > 1 1 2

to increase alone. Therefore, BWOC, the following must hold This is a contradiction as ≥

Going the other way, suppose BWOC confidence increases but POS does not. We get: However, by the statement, as we add more samples that satisfy the body of the rule, we must have 1 ≤ 2. Hence a contradiction.

CLAIM 6: Recall increases if and only if increases.

As we can write the new recall in this case simply as the following, the claim immediately follows.

CLAIM 7: Recall increases if and only if increases.

Follows directly from claims 5-6.

Proof of theorem. Follows directly from claims 4 and 7.
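As a numerical check on the bookkeeping argument in A.2 (Claim 1 of Theorem 2), the sketch below withdraws the predictions that a detection rule flags and recounts true positives, false positives, and false negatives. It is only an illustration: the labels, predictions, flags, and the helper name `confusion_counts` are hypothetical, and treating a flagged prediction as "no longer assigned to the class" is an assumption about how detection is applied.

```python
def confusion_counts(y_true, y_pred, flagged, cls):
    """Count TP, FP, FN for `cls`, treating flagged predictions as withdrawn."""
    tp = fp = fn = 0
    for truth, pred, flag in zip(y_true, y_pred, flagged):
        assigned = (pred == cls) and not flag
        if assigned and truth == cls:
            tp += 1
        elif assigned:
            fp += 1
        elif truth == cls:
            fn += 1
    return tp, fp, fn


cls = "bus"
# Hypothetical toy labels and model predictions.
y_true = ["bus", "bus", "bus", "car", "car", "bus", "walk", "bus"]
y_pred = ["bus", "bus", "car", "bus", "bus", "bus", "walk", "walk"]
# Hypothetical rule output: which class-`cls` predictions are flagged as errors.
flagged = [False, True, False, True, True, False, False, False]
no_flags = [False] * len(y_true)

tp0, fp0, fn0 = confusion_counts(y_true, y_pred, no_flags, cls)
tp1, fp1, fn1 = confusion_counts(y_true, y_pred, flagged, cls)

recall_before = tp0 / (tp0 + fn0)
recall_after = tp1 / (tp1 + fn1)
wrongly_flagged = tp0 - tp1  # true positives that the rule flagged

# The denominator of recall is unchanged: flagged true positives
# simply move into the false-negative count.
print(tp0 + fn0, tp1 + fn1)
print(recall_before - recall_after, wrongly_flagged / (tp0 + fn0))
```

In this toy case the rule flags one correct prediction and two incorrect ones, so the recall drop equals the wrongly flagged true positives divided by the unchanged sum of true positives and false negatives.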

A.4. Overall Accuracy Results for Other Data Splits

Previous work such as [ 4 ] is known to have data leakage based on the split between training and test, primarily due to segments of a movement sequence existing in both training and test sets as a result of random assignment to each. To address this data leakage issue, we examine our algorithms under various conditions based on ordering and overlap. For ordering, we examine random (which can allow previous behavior of the same agent in the training set, as in previous work) and sequential (which

Column headers: No Overlap; Segment Overlap; Sequential (least leakage); Random (known leakage, prev. studies); Random; Sequential.

LRCNa (ours)         0.747          0.751          0.971
LRCNa+EDCR (ours)    0.759 (+1.6%)  0.763 (+1.6%)  0.971 (± 0%)
LRCN (prev. SOTA)    0.749          0.747          0.952
LRCN+EDCR (ours)     0.761 (+1.6%)  0.760 (+1.7%)  0.952 (± 0%)
CNN                  0.742          0.755          0.851
CNN+EDCR (ours)      0.743 (+0.1%)  0.755 (± 0%)   0.866 (+1.8%)
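The leakage mechanism described above (segments of one movement sequence landing in both training and test sets) can be illustrated with a small sketch. This is not the authors' data pipeline: the window length, stride, test fraction, and the helper names `make_segments`, `split`, and `shared_points` are all hypothetical, and a single trajectory of 100 point indices stands in for real GPS data.

```python
import random


def make_segments(n_points, length, stride):
    """Cut a trajectory of point indices into windows; stride < length gives overlap."""
    return [
        tuple(range(start, start + length))
        for start in range(0, n_points - length + 1, stride)
    ]


def split(segments, test_fraction, ordering, seed=0):
    """Split segments into train/test, in sequence order or after shuffling."""
    segs = list(segments)
    if ordering == "random":
        random.Random(seed).shuffle(segs)
    cut = round(len(segs) * (1 - test_fraction))
    return segs[:cut], segs[cut:]


def shared_points(train, test):
    """Number of trajectory points that appear in both train and test segments."""
    train_pts = {p for seg in train for p in seg}
    test_pts = {p for seg in test for p in seg}
    return len(train_pts & test_pts)


# No overlap, sequential ordering: windows are disjoint and test follows train.
no_overlap_sequential = shared_points(
    *split(make_segments(100, 10, 10), 0.2, "sequential")
)

# Segment overlap, random ordering: neighbouring windows share points,
# so random assignment places some of the same points on both sides.
overlap_random = shared_points(
    *split(make_segments(100, 10, 5), 0.2, "random")
)

print(no_overlap_sequential, overlap_random)
```

The count of shared points is a crude proxy for leakage; it does not capture the subtler case noted above, where a random split places earlier behavior of the same agent in the training set even when no points are shared.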

[1] Huang, Y. Cheng, R. Weibel, Transport mode detection based on mobile phone network data: A systematic review, Transportation Research Part C: Emerging Technologies 101 (2019) 297-312.

[2] Lin, W.-J. Hsu, Mining gps data for mobility patterns: A survey, Pervasive and mobile computing 12 (2014) 1-16.

[3] Fikioris, Patroumpas, Artikis, Pitsikalis, G. Paliouras, Optimizing vessel trajectory compression for maritime situational awareness, GeoInformatica 27 (2023) 565-591.

[4] Kim, J. H. Kim, Lee, Gps data-based mobility mode inference model using long-term recurrent convolutional networks, Transportation Research Part C: Emerging Technologies 135 (2022) 103523.

[5] Dabiri, Heaslip, Inferring transportation modes from gps trajectories using a convolutional neural network, Transportation research part C: emerging technologies 86 (2018) 360-371.

[6] Bavikadi, Aditya, Parkar, Shakarian, G. Mueller, Parvis, G. I. Simari, Geospatial trajectory generation via efficient abduction: Deployment for independent testing, in: Proceedings of the 40th International Conference on Logic Programming (ICLP 2024), 2024.

[7] Zheng, Li, Chen, Xie, W.-Y. Ma, Understanding mobility based on gps data, in: Proceedings of the 10th international conference on Ubiquitous computing, 2008, pp. 312-321.

[8] Wang, G. Liu, Duan, L. Zhang, Detecting transportation modes using deep neural network, IEICE TRANSACTIONS on Information and Systems 100 (2017) 1132-1135.

[9] Simoncini, Taccari, Sambo, Bravi, Salti, Lori, Vehicle classification from low-frequency gps data with recurrent neural networks, Transportation Research Part C: Emerging Technologies 91 (2018) 176-191.

[10] Janner, Li, Levine, Offline reinforcement learning as one big sequence modeling problem, in: Advances in Neural Information Processing Systems, 2021.

[11] Chen, Lu, Rajeswaran, Lee, Grover, Laskin, Abbeel, Srinivas, I. Mordatch, Decision transformer: Reinforcement learning via sequence modeling, CoRR abs/2106.01345 (2021). URL: https://arxiv.org/abs/2106.01345. arXiv:2106.01345.

[12] Itkina, M. J. Kochenderfer, Interpretable self-aware neural networks for robust trajectory prediction, 2022. arXiv:2211.08701.

[13] Zeng, Yu, Chen, Yang, Zhang, Wang, Trajectory-as-a-sequence: A novel travel mode identification framework 146 (2023) 103957. URL: https://www.sciencedirect.com/science/article/pii/S0968090X22003709. doi:https://doi.org/10.1016/j.trc.2022.103957.

[14] Hospedales, Antoniou, Micaelli, A. Storkey, Meta-learning in neural networks: A survey, IEEE transactions on pattern analysis and machine intelligence 44 (2021) 5149-5169.

[15] Zhou, Liu, Qiao, Xiang, C. C. Loy, Domain generalization: A survey, IEEE Transactions on Pattern Analysis and Machine Intelligence (2022).

[16] Vanschoren, Meta-learning: A survey, arXiv preprint arXiv:1810.03548 (2018).

[17] Maes, D. Nardi, Meta-level architectures and reflection (1988).

[19] Daftry, Zeng, J. A. Bagnell, M. Hebert, Introspective perception: Learning to predict failures in vision systems, 2016. URL: http://arxiv.org/abs/1607.08665. doi:10.48550/arXiv.1607.08665.

[20] M. S. Ramanagopal, Anderson, Vasudevan, M. Johnson-Roberson, Failing to learn: Autonomously identifying perception failures for self-driving cars 3 (2018) 3860-3867. URL: http:

[21] Ivanov, Carpenter, Weimer, Alur, G. Pappas, I. Lee, Verisig 2.0: Verification of neu

[30] Sun, Jiang, Qiu, Nobel, M. J. Kochenderfer, M. Schwager,