<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Towards New Data Quality Rules for Modeling Data Change</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author role="corresp">
							<persName><forename type="first">Nishttha</forename><surname>Sharma</surname></persName>
							<email>sharmn99@mcmaster.ca</email>
							<affiliation key="aff0">
								<orgName type="institution">McMaster University</orgName>
								<address>
									<settlement>Hamilton</settlement>
									<region>ON</region>
									<country key="CA">Canada</country>
								</address>
							</affiliation>
							<affiliation key="aff1">
								<address>
									<settlement>Barcelona</settlement>
									<country key="ES">Spain</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Towards New Data Quality Rules for Modeling Data Change</title>
					</analytic>
					<monogr>
						<idno type="ISSN">1613-0073</idno>
					</monogr>
					<idno type="MD5">0485A1239A1B7DB8A8B794C589535EDB</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2025-04-23T18:37+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Data Dependencies</term>
					<term>Dynamic Data Dependencies</term>
					<term>Change Exploration</term>
					<term>Change Dependency</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Data is not static, and attribute value changes often trigger changes in another set of attributes. Traditional methods for analyzing data changes often treat these changes in isolation, failing to consider the broader context in which they occur. This lack of contextual awareness limits the ability to capture relationships between attributes or interpret their significance, especially when distinguishing between normal variations and potential anomalies. In this paper, we discuss the importance of context-awareness and the need to identify normal change behaviour. To achieve this, we introduce a new data quality rule, called change rule, capable of capturing changes in both antecedent and consequent attributes within ordered tuples of a relational instance.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>In real-world datasets, values rarely remain static as data continuously changes over time. These changes often carry critical information, revealing patterns, trends, and triggers that are essential for understanding environmental conditions, system behaviour, and user behaviour. Existing database systems offer limited functionality for managing changes and identifying abnormal ones, often relying on triggers to recognize out-of-bound values. In this work, we consider changes to relational attributes for an entity. To simplify our setting, attribute changes are modeled as a sequence of ordered tuples, implicitly with respect to time. Hence, a tuple represents the value each attribute holds for an entity at a specific point in time.</p><p>Data changes occur in numeric and non-numeric attributes. Changes to numeric attributes are often measured using absolute difference, percentage change, rate of change, and rolling average <ref type="bibr" target="#b0">[1]</ref>. While these metrics are easy to compute, they fail to capture the broader context of the change, such as the influence of related attributes or the significance of the change.</p><p>For non-numeric attributes, changes are often measured using edit distances (Levenshtein, Jaro-Winkler, Hamming) <ref type="bibr" target="#b1">[2]</ref> or set-based coefficients (Overlap, Jaccard, Dice) <ref type="bibr" target="#b2">[3]</ref>. However, these metrics are insufficient because they ignore the semantic meaning of the changes and the context in which they occur. Context is critical because it provides the necessary information to interpret the significance of a change. Without context, changes are reduced to isolated events, which can lead to misleading interpretations of the data change.</p><p>Example 1.
Table <ref type="table" target="#tab_0">1</ref> shows two employees (Emp) E1 and E2 and their Position, Salary and number of employees managed (EmpMng) as of a specific Year. Consider the following changes and the need for greater context: Numeric attribute value changes: As observed in tuples 𝑡1 − 𝑡3 of Table <ref type="table" target="#tab_0">1</ref>, after only two years as a Software Developer, E1 was promoted to the position of Senior Software Developer accompanied by a significant increase in salary ($68,400 to $82,000). In contrast, tuples 𝑡9 − 𝑡13 show that E2 spent four years as a Software Developer before being promoted to Senior Software Developer with a similar salary increase as E1's (from $71,500 to $80,000). Changes in salary are typically quantified using percentage change (+19.9% for E1 and +11.9% for E2). While this provides a numerical summary of the change, it fails to account for the broader context. For instance, E1 received a larger raise after a shorter tenure and took on the responsibility of managing four employees, whereas E2 had to wait twice as long for a similar promotion and gained the responsibility of managing two fewer employees compared to E1.</p><p>Non-numeric changes within and between classes: Traditional edit distance metrics such as Levenshtein distance (LD) quantify changes based on character modifications. The transition from Software Developer to Senior Software Developer has an LD = 7, whereas for Senior Software Developer to Lead Developer, LD = 13. These values suggest that the latter change is almost twice as significant as the former despite both changes being promotions to the next position within the same class (development roles), as shown in Figure <ref type="figure" target="#fig_0">1</ref>.</p><p>The implications of a change can be much greater between different classes.
For instance, the LD between Lead Developer and Manager is 11, which suggests that this transition is smaller than the transition from Senior Software Developer to Lead Developer (LD = 13). However, this interpretation is misleading. The change from Lead Developer to Manager represents a more significant career shift as compared to the change from Senior Software Developer to Lead Developer (where both positions are in the same class), as it involves moving from a development role to a managerial position (2 levels up as per Figure <ref type="figure" target="#fig_0">1</ref>) which is accompanied by a significant increase in the number of people managed. Existing distance measures fail to capture semantic interpretations of the data.</p></div>
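The edit distances cited in Example 1 can be reproduced with a standard dynamic-programming Levenshtein implementation (a self-contained sketch; the paper does not prescribe any particular implementation):

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert/delete/substitute, unit cost)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1,                  # delete ca
                            curr[j - 1] + 1,              # insert cb
                            prev[j - 1] + (ca != cb)))    # match or substitute
        prev = curr
    return prev[-1]

# The three transitions discussed in Example 1:
print(levenshtein("Software Developer", "Senior Software Developer"))  # 7
print(levenshtein("Senior Software Developer", "Lead Developer"))      # 13
print(levenshtein("Lead Developer", "Manager"))                        # 11
```

The computed distances (7, 13, 11) make the point above concrete: the purely syntactic metric ranks the within-class promotion to Lead Developer as the largest change, even though the cross-class move to Manager is semantically the bigger shift.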
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Problem 1:</head><p>The need for context. The example highlights that not all changes are equally significant. Context is often needed to interpret data change, and there is a need to augment existing distance metrics with context.</p><p>While identifying (significant) changes is important, it is equally critical to differentiate between normal and abnormal changes. Traditional approaches have used declarative methods such as data dependencies of the form 𝑋 → 𝑌 , where 𝑋, 𝑌 are attribute sets, representing antecedent and consequent attributes. Order Dependencies (OD) <ref type="bibr" target="#b3">[4]</ref>, Sequential Dependencies (SD) <ref type="bibr" target="#b4">[5]</ref>, and Differential Dependencies (DD) <ref type="bibr" target="#b5">[6]</ref> specify expected relationships between attribute sets. ODs introduce ordering relationships but do not explicitly quantify changes in attribute values. SDs model consequent attribute changes but do not account for variations in the antecedent attributes. DDs, while addressing changes in both antecedent and consequent attributes, apply to unordered data.</p><p>Example 2. Consider a sequential dependency (SD) stating that when ordered by Position, the change in Salary between consecutive tuples should be between 5% and 20%. For E1, the SD is violated between 𝑡3 and 𝑡4 with a salary change of 2.6%, falling below the range. It is also violated between 𝑡7 and 𝑡8, where the salary change (23.8%) exceeds the upper bound. These violations help in identifying abnormal changes. However, we also want to identify patterns where different changes in the antecedent attributes, such as changes within Position, will elicit different changes in the consequent (Salary). For instance, with no change in position, salary still changes annually by 2% to 10%. Whenever there is a promotion (change in position &gt; 0), the salary always changes by 10% to 25%.
The existing dependencies do not capture relationships of this form.</p><p>To address this, we define a data quality rule called change rule. The change rule captures relationships between changes in attribute values of an ordered relational instance. Exhaustive enumeration of all attribute sets and their values is not feasible, and efficient methods to evaluate the large space of attribute sets are needed.</p><p>• Filtering spurious changes. Rule mining is known to produce spurious rules. Determining which changes are most relevant and defining (support) measures that filter less meaningful changes is necessary.</p></div>
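The sequential-dependency check of Example 2 can be sketched directly over E1's salaries from Table 1 (the function name and output format are illustrative, not from the paper). Note that a flat [5%, 20%] band also flags the ordinary 3.1% raise between 𝑡4 and 𝑡5 alongside the two violations discussed above; conditioning on the change in the antecedent, as change rules do, avoids exactly this kind of false alarm:

```python
# E1's salary sequence from Table 1 (t1..t8), already ordered by Year.
e1_salaries = [65000, 68400, 82000, 84100, 86700, 96700, 105000, 130000]

def sd_violations(values, low=0.05, high=0.20):
    """Flag consecutive-pair relative changes that fall outside [low, high]."""
    out = []
    for i in range(len(values) - 1):
        change = (values[i + 1] - values[i]) / values[i]
        if not (low <= change <= high):
            out.append((i + 1, i + 2, round(change, 3)))  # 1-based tuple ids
    return out

for t_a, t_b, change in sd_violations(e1_salaries):
    print(f"t{t_a} -> t{t_b}: {change:+.1%}")
```

This flags (𝑡3, 𝑡4) at +2.6% and (𝑡7, 𝑡8) at +23.8% as in Example 2, plus (𝑡4, 𝑡5) at +3.1%, a normal no-promotion raise that the flat band cannot distinguish from an anomaly.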
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.2.">Contributions</head><p>We expect to make the following contributions.</p><p>• Context-aware change metric: A metric for quantifying changes in both numeric and non-numeric attributes, augmenting them with contextual information from related attributes.</p><p>• Change rules: A new rule that captures the relationship of changes from one attribute set 𝑋 to another attribute set 𝑌 across ordered tuples.</p><p>• Change rule discovery algorithm: An efficient discovery algorithm for change rules over ordered datasets. The algorithm adapts the FastDD method to handle ordered data <ref type="bibr" target="#b6">[7]</ref>, and identifies changes in sequential attribute values using context-aware metrics.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Related Work</head><p>We discuss the relationship of our work to existing metrics, data dependencies, association rules and statistical/ML approaches.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.1.">Similarity, Distance Metrics</head><p>Traditional numeric metrics analyze individual attributes in isolation, missing contextual relationships between changes in different attributes. Measures of central tendency (mean, median, mode) summarize values but can be skewed by outliers. Dispersion metrics (variance, standard deviation, IQR) capture data spread, but are also prone to outlier sensitivity. Shape distribution measures (skewness, kurtosis, CV) describe asymmetry and variability but can be biased when data is highly skewed or sparse <ref type="bibr" target="#b0">[1]</ref>.</p><p>For non-numeric (categorical, text) data, cosine similarity is commonly used. Cosine similarity measures the cosine of the angle between vectors <ref type="bibr" target="#b7">[8]</ref> and is often used with embeddings to capture semantic similarity. Overlap, Jaccard, and Dice coefficients <ref type="bibr" target="#b2">[3]</ref> are used to quantify the similarity and diversity of sets. Edit distances such as Levenshtein, Jaro-Winkler, and Hamming quantify the number of operations needed to transform one string into another <ref type="bibr" target="#b1">[2]</ref>.</p><p>While these metrics are widely used, they do not capture semantic distances. For example, "Software Developer" and "Senior Software Developer" have a high edit distance despite being closely related in meaning. Embedding-based approaches (e.g., BERT) address this by capturing contextual meaning but require pre-trained models and domain-specific tuning. An effective approach for measuring semantic similarity between non-numeric values is to compute cosine similarity on BERT embeddings, which allows for a context-aware representation of the data.</p></div>
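The set-based coefficients named above are easily computed over tokenized job titles; the sketch below (tokenization choice is ours, not the paper's) also illustrates their blind spot: the overlap coefficient is 1.0 for two titles that denote different positions.

```python
def jaccard(a: set, b: set) -> float:
    """|A ∩ B| / |A ∪ B|"""
    return len(a & b) / len(a | b)

def dice(a: set, b: set) -> float:
    """2|A ∩ B| / (|A| + |B|)"""
    return 2 * len(a & b) / (len(a) + len(b))

def overlap(a: set, b: set) -> float:
    """|A ∩ B| / min(|A|, |B|)"""
    return len(a & b) / min(len(a), len(b))

t1 = set("software developer".split())
t2 = set("senior software developer".split())
print(jaccard(t1, t2), dice(t1, t2), overlap(t1, t2))  # 0.666..., 0.8, 1.0
```

Because {software, developer} is a subset of {senior, software, developer}, overlap saturates at 1.0, hiding the seniority change entirely.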
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.2.">Data Dependencies</head><p>Order Dependencies (ODs) extend functional dependencies by enforcing ordering relationships <ref type="bibr" target="#b3">[4]</ref>. They ensure that a positive change in the antecedent corresponds to a positive change in the consequent. However, the semantics of ODs do not declaratively capture the change in any attribute values.</p><p>Sequential Dependencies (SDs) declaratively specify the change in consequent attributes <ref type="bibr" target="#b4">[5]</ref>. They enforce constraints on how the consequent changes in response to an instance ordered on the antecedent, i.e., when the instance is ordered on 𝑋, the changes in the consecutive 𝑌 -values will be within a range 𝑔. However, they fail to capture the change in the antecedent. Conditional SDs (CSDs) focus on identifying intervals within ordered data that satisfy a given SD. They prefer larger, contiguous intervals that capture a substantial portion of the data satisfying the embedded SD. However, the continuity of these intervals requires a trade-off with the specificity of the bound 𝑔, which is not addressed in that work.</p><p>Differential Dependencies (DDs) model differences between any two tuples in a relation independent of the tuple ordering, i.e., if the antecedent attribute differences lie within a range 𝑔𝑥, then the consequent attribute value differences must lie within a range 𝑔𝑦 <ref type="bibr" target="#b5">[6]</ref>. By not capturing order, DDs miss critical contextual information like trends or patterns across consecutive tuples.</p><p>TSDDs <ref type="bibr" target="#b8">[9]</ref>, designed for time-series data, capture temporal relationships by treating data within a given time window as an ordered set and supporting real-valued function operations. However, similar to SDs, they do not account for changes in the antecedent attributes over time.
Additionally, selecting an optimal time window remains a challenge, as an overly narrow window may overlook significant trends, while a broader one risks diluting the relevance of dependencies.</p></div>
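The structural difference between the DD view (all unordered tuple pairs) and the ordered, consecutive-pair view used in this work can be sketched as follows (function names are illustrative):

```python
from itertools import combinations

def dd_comparisons(n: int):
    """DD view: every unordered pair of the n tuples is compared (quadratic)."""
    return list(combinations(range(n), 2))

def ordered_comparisons(n: int):
    """Ordered view: only the n-1 consecutive (implicitly time-ordered) pairs."""
    return [(i, i + 1) for i in range(n - 1)]

# For 5 tuples: 10 unordered pairs vs 4 adjacent pairs.
print(len(dd_comparisons(5)), len(ordered_comparisons(5)))
```

Beyond the quadratic-versus-linear cost, the DD view compares tuples that are far apart in time, which is why DDs cannot express trends across consecutive tuples.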
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.3.">Association Rules</head><p>Association rules identify co-occurrences of items within a dataset, typically expressed in the form of {𝐴, 𝐵} → 𝐶, stating that if items 𝐴 and 𝐵 appear together, then 𝐶 is likely to appear as well <ref type="bibr" target="#b9">[10]</ref>. Unlike data dependencies, which enforce constraints that all instances must satisfy, association rules identify probabilistic relationships without guaranteeing consistency. Dependencies ensure structural integrity, while association rules uncover patterns that may not hold universally.</p></div>
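The probabilistic nature of association rules can be made concrete with the standard support and confidence measures <ref type="bibr" target="#b9">[10]</ref>; the transaction database below is invented for illustration:

```python
# Toy transaction database (invented for illustration).
transactions = [
    {"A", "B", "C"}, {"A", "B"}, {"A", "C"}, {"B", "C"}, {"A", "B", "C"},
]

def support(itemset: set, txns) -> float:
    """Fraction of transactions containing every item of the itemset."""
    return sum(itemset <= t for t in txns) / len(txns)

def confidence(antecedent: set, consequent: set, txns) -> float:
    """Estimated P(consequent | antecedent) over the transactions."""
    return support(antecedent | consequent, txns) / support(antecedent, txns)

# The rule {A, B} -> {C} holds often, but not universally.
print(support({"A", "B", "C"}, transactions))       # 0.4
print(confidence({"A", "B"}, {"C"}, transactions))  # 0.666...
```

A confidence of 2/3 means the rule fails on one of the three transactions containing {A, B}; a dependency, by contrast, must hold on every applicable instance.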
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.4.">Statistical and Machine Learning Approaches</head><p>Statistical and machine learning approaches leverage patterns in historical data to identify deviations that fall outside expected behavior. Statistical methods rely on predefined thresholds and assumptions about data distribution, while machine learning approaches adapt to complex, high-dimensional datasets.</p><p>These approaches offer complementary techniques for identifying and differentiating normal and abnormal changes in data. Statistical techniques include rule-based thresholds and hypothesis testing. For example, Z-scores and modified Z-scores are commonly used to detect anomalies by measuring how far a data point deviates from the mean, relative to the standard deviation <ref type="bibr" target="#b0">[1]</ref>. For instance, if a data point's z-score exceeds a certain threshold (e.g., 3), it may be flagged as abnormal. Similarly, control charts and statistical process control (SPC) methods monitor data streams over time, flagging points that fall outside control limits as potential anomalies <ref type="bibr" target="#b10">[11]</ref>.</p><p>Machine learning provides various techniques for distinguishing normal from abnormal changes in data, particularly through anomaly detection algorithms. Isolation Forest <ref type="bibr" target="#b11">[12]</ref> isolates anomalies by partitioning the dataset into smaller subsets. Points that require fewer partitions to be isolated are identified as anomalies. This method works well in high-dimensional data but may struggle with datasets containing overlapping clusters or anomalies that are close to the decision boundary.</p><p>While numerous anomaly detection methods exist, our approach specifically targets anomalies in the change of attribute values.
We achieve this by defining a change rule that not only identifies abnormal behavior but also captures the relationships between changes across multiple attributes.</p></div>
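The z-score thresholding described above can be sketched in a few lines (the data is invented; the function name is ours):

```python
import statistics

def zscore_outliers(values, threshold=3.0):
    """Flag points whose |z-score| exceeds the threshold."""
    mu = statistics.mean(values)
    sigma = statistics.stdev(values)
    return [(i, v) for i, v in enumerate(values) if abs((v - mu) / sigma) > threshold]

readings = [10, 11, 9, 10, 12, 10, 11, 60]  # one obvious spike at index 7
print(zscore_outliers(readings, threshold=2.0))  # [(7, 60)]
```

Notably, with the conventional threshold of 3 nothing is flagged on this sample: the spike inflates the standard deviation enough to partially mask itself, which is one reason robust variants such as the median-based modified z-score are preferred for small samples.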
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Preliminaries</head><p>Let 𝑅 be a relational schema on attributes 𝐴1, 𝐴2, ..., 𝐴𝑀 , and 𝑋 and 𝑌 be sets of attributes such that 𝑋 ⊆ 𝑅 and 𝑌 ⊆ 𝑅. Let 𝐼 = {𝑡1, 𝑡2, ..., 𝑡𝑁 } be a relational instance of 𝑅 with 𝑁 tuples, ordered on 𝑋 (implicitly ordered on time). The distance between consecutive tuples in 𝐼 for an attribute 𝐴 is given via a context-aware distance measure: 𝑑𝑖𝑠𝑡(𝑡𝑖[𝐴], 𝑡𝑖+1[𝐴]). We define a permissible range for 𝑑𝑖𝑠𝑡 as 𝑔𝐴 = (𝑝, 𝑞), where 𝑝, 𝑞 are real values, i.e., if</p><formula xml:id="formula_0">𝑑𝑖𝑠𝑡(𝑡𝑖[𝐴], 𝑡𝑖+1[𝐴]) ∈ 𝑔𝐴, then 𝑝 ≤ 𝑑𝑖𝑠𝑡(𝑡𝑖[𝐴], 𝑡𝑖+1[𝐴]) ≤ 𝑞.</formula><p>We define a support function 𝑠𝑢𝑝𝑝𝑜𝑟𝑡(𝜎, 𝐼) that measures the relative strength of a change rule 𝜎 in 𝐼. Naturally, we seek high-support rules to ensure that they have sufficient evidence in the instance. We introduce change rules in the next section, and focus on their discovery (as part of Problem 2). Problem Definition: Given a minimum support threshold 𝜃, find all change rules Σ such that 𝐼 satisfies Σ (𝐼 |= Σ) and, for all 𝜎 ∈ Σ, 𝑠𝑢𝑝𝑝𝑜𝑟𝑡(𝜎, 𝐼) ≥ 𝜃.</p></div>
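The preliminaries can be mirrored in code: per-attribute dist functions over consecutive tuples, a permissible range 𝑔 = (𝑝, 𝑞), and a support measure. The paper leaves the support measure open; the sketch below uses one simple choice (fraction of applicable consecutive pairs that satisfy the consequent range), and all names are illustrative:

```python
from typing import Callable, Sequence, Tuple

Range = Tuple[float, float]

def support(instance: Sequence[dict], X: str, Y: str,
            dist_x: Callable, dist_y: Callable,
            gx: Range, gy: Range) -> float:
    """One simple support choice: among consecutive pairs whose X-change
    lies in gx, the fraction whose Y-change also lies in gy."""
    pairs = list(zip(instance, instance[1:]))
    applicable = [(s, t) for s, t in pairs
                  if gx[0] <= dist_x(s[X], t[X]) <= gx[1]]
    if not applicable:
        return 0.0
    ok = sum(gy[0] <= dist_y(s[Y], t[Y]) <= gy[1] for s, t in applicable)
    return ok / len(applicable)

# Toy instance: three consecutive pairs, two applicable, one satisfied.
rows = [{"X": 0, "Y": 0}, {"X": 1, "Y": 5}, {"X": 2, "Y": 50}, {"X": 2, "Y": 60}]
diff = lambda a, b: abs(b - a)
print(support(rows, "X", "Y", diff, diff, (1, 2), (0, 10)))  # 0.5
```

The last pair (no X-change) is not applicable, so it counts neither for nor against the rule under this measure.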
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Current Work: Change Rules</head><p>A change rule is a novel data quality rule which describes a relationship between the changes in attributes within 𝐼. It states that when the change in the antecedent is within some range 𝑔𝑥 = (𝑔𝑥𝑙, 𝑔𝑥𝑢), then the corresponding change in the consequent will also be within a defined range 𝑔𝑦 = (𝑔𝑦𝑙, 𝑔𝑦𝑢). DEFINITION 1. Let 𝜋 be the permutation of tuples of 𝐼 increasing on 𝑋 (that is, 𝑡 𝜋(1) [𝑋] &lt; 𝑡 𝜋(2) [𝑋] &lt; . . . &lt; 𝑡 𝜋(𝑁 ) [𝑋]).</p><p>Change rule 𝜎 :</p><formula xml:id="formula_1">𝑋𝑔 𝑥 → 𝑌𝑔 𝑦 holds over 𝐼 if for all 𝑖 such that 1 ≤ 𝑖 ≤ 𝑁 − 1, when 𝑑𝑖𝑠𝑡(𝑡 𝜋(𝑖) [𝑋], 𝑡 𝜋(𝑖+1) [𝑋]) ∈ 𝑔𝑥 then 𝑑𝑖𝑠𝑡(𝑡 𝜋(𝑖) [𝑌 ], 𝑡 𝜋(𝑖+1) [𝑌 ]) ∈ 𝑔𝑦.</formula><p>When ordered on 𝑋, if the 𝑑𝑖𝑠𝑡 between any two consecutive 𝑋-values is within the range 𝑔𝑥 then the 𝑑𝑖𝑠𝑡 between the corresponding 𝑌 -values must be within 𝑔𝑦. A change rule with a minimum support threshold 𝜃 holds when at least 𝜃% of consecutive tuple pairs in the instance satisfy the conditions of the change rule. Consider the change rule 𝜎 : 𝑃 𝑜𝑠𝑖𝑡𝑖𝑜𝑛 (5,15) → 𝑆𝑎𝑙𝑎𝑟𝑦 (0.1,0.25) of Example 3: if the change in Position is between 5 and 15, then the change in Salary will be between 10% to 25%. This holds true for most of the table except when E2 is promoted from Senior Software Developer to Lead Developer. In this case, the salary increase is only 8.7%, which is below the expected 10% to 25% increase. This deviation from the rule highlights that the employee received a smaller-than-normal raise with their promotion.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>Example 3. Consider the change rule 𝜎 : 𝑃 𝑜𝑠𝑖𝑡𝑖𝑜𝑛 (5,15) → 𝑆𝑎𝑙𝑎𝑟𝑦 (0.1,0.25) over Table 1</head></div>
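Example 3 can be replayed over Table 1. As a stand-in for the (eventually context-aware) dist, the sketch below uses Levenshtein distance on the full job titles for Position and relative change for Salary; with those choices it reproduces the single violation discussed: E2's promotion to Lead Developer with only an 8.7% raise.

```python
# Table 1: E2's (Position, Salary) sequence, ordered by Year (t9..t17).
e2 = [
    ("Software Developer", 64500), ("Software Developer", 67000),
    ("Software Developer", 69200), ("Software Developer", 71500),
    ("Senior Software Developer", 80000), ("Senior Software Developer", 82100),
    ("Senior Software Developer", 84000), ("Senior Software Developer", 88400),
    ("Lead Developer", 96100),
]

def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        curr = [i]
        for j, cb in enumerate(b, 1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (ca != cb)))
        prev = curr
    return prev[-1]

def violations(rows, gx=(5, 15), gy=(0.10, 0.25)):
    """Consecutive pairs whose Position-change lies in gx but whose
    Salary-change falls outside gy, i.e., change-rule violations."""
    bad = []
    for (p1, s1), (p2, s2) in zip(rows, rows[1:]):
        if gx[0] <= levenshtein(p1, p2) <= gx[1]:
            dy = (s2 - s1) / s1
            if not (gy[0] <= dy <= gy[1]):
                bad.append((p1, p2, round(dy, 3)))
    return bad

print(violations(e2))  # flags only E2's promotion to Lead Developer (+8.7%)
```

Pairs with no Position change (dist 0, outside 𝑔𝑥) are simply not applicable, so the ordinary annual raises do not trigger false alarms the way the flat SD band of Example 2 does.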
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.1.">Discovery of Change Rules</head><p>We build upon FastDD <ref type="bibr" target="#b6">[7]</ref>, a Differential Dependency discovery algorithm over unordered data.</p><p>• Diff-Set Construction: Encodes pairwise differences between all tuples into a diff-set, where each element represents a differential constraint violation (e.g., 𝑡𝑖[𝐴] − 𝑡𝑗[𝐴] &gt; 𝜑, where 𝑡𝑖 and 𝑡𝑗 are any two tuples in a relational instance and 𝜑 is a numerical value). For change rules, we modify this step by using a sorted instance 𝐼 on the antecedent attributes 𝑋 to compute the 𝑑𝑖𝑠𝑡 between consecutive pairs of tuples. This eliminates redundant comparisons by restricting diff-set construction to adjacent tuple pairs in the sorted instance 𝐼.</p><p>• Set Cover Enumeration: Finds minimal subsets of differential functions (antecedent) that cover all violations of the consequent. For change rules, instead of fixed thresholds, we use intervals 𝑔𝑥 and 𝑔𝑦 for antecedent and consequent gaps. That is, we find the minimal subsets of 𝑑𝑖𝑠𝑡(𝑡 𝜋(𝑖) [𝑋], 𝑡 𝜋(𝑖+1) [𝑋]) ∈ 𝑔𝑥 that cover all violations of 𝑑𝑖𝑠𝑡(𝑡 𝜋(𝑖) [𝑌 ], 𝑡 𝜋(𝑖+1) [𝑌 ]) ∈ 𝑔𝑦.</p></div>
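The modified diff-set step can be sketched as follows: sort the instance on 𝑋, then record one difference record per adjacent pair rather than per unordered pair. This is a simplified stand-in for FastDD's actual predicate encoding <ref type="bibr" target="#b6">[7]</ref>; the dist function and record layout are placeholders:

```python
def build_diffset(instance, X, attrs, dist):
    """Sort on X, then record per-attribute distances for each adjacent
    pair (a stand-in for FastDD's differential predicates, restricted to
    consecutive tuples in the sorted instance)."""
    ordered = sorted(instance, key=lambda t: t[X])
    return [{A: dist(s[A], t[A]) for A in attrs}
            for s, t in zip(ordered, ordered[1:])]

rows = [{"Year": 2013, "Salary": 68400}, {"Year": 2012, "Salary": 65000},
        {"Year": 2014, "Salary": 82000}]
ds = build_diffset(rows, "Year", ["Year", "Salary"], lambda a, b: b - a)
print(ds)  # two adjacent-pair records instead of three unordered-pair records
```

For 𝑁 tuples this yields 𝑁 − 1 records instead of 𝑁(𝑁 − 1)/2, which is the source of the eliminated redundancy noted above.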
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusion and Next Steps</head><p>Data changes over time, and we want to capture relationships between these changes. In this paper, we discussed the importance of context-awareness when capturing these changes and the relevance of identifying normal change behaviour. We introduced a new data quality rule, called change rule, that captures the relationship between the changes in the antecedent and the changes in the consequent.</p><p>As next steps, we plan to address the aforementioned problems and challenges:</p><p>• Explore transformer-based embeddings (e.g., BERT) to quantify and accurately capture context-aware changes in numeric and non-numeric data without compromising semantic information.</p><p>• Optimize Set Cover Enumeration by developing an efficient method to minimize the search space when identifying minimal subsets of antecedent changes that explain consequent violations.</p><p>• Consider the lagged effects of earlier changes on subsequent changes, i.e., the change in an attribute at one time step influences changes at a later time step.</p></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: Hierarchy of the company in Table 1</figDesc><graphic coords="2,82.69,65.60,192.30,149.48" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_1"><head>Problem 2 :</head><label>2</label><figDesc>Differentiating normal vs. abnormal data change. Existing dependencies do not capture the dependence between changes from antecedent attributes to consequent attributes on ordered tuples. A declarative specification is needed that models the expected range of value change between attribute sets. We propose change rules to address this problem. 1.1. Challenges • Context representation: Context helps to interpret the significance of data changes. Which attributes, and which subset of values are used to provide this context? Is this context time-dependent? How are existing distance measures augmented to consider this context? • Efficient rule mining: Manual specification of change rules is not practically feasible, and automated solutions are needed. Determining dependent sets of attributes is important towards identifying meaningful data change.</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Example employee changes in position and salary.</figDesc><table><row><cell>𝑡 𝐼𝐷</cell><cell>Year</cell><cell>Emp</cell><cell>Position</cell><cell>Salary</cell><cell>EmpMng</cell></row><row><cell>𝑡 1</cell><cell>2012</cell><cell>E1</cell><cell>Software Dev</cell><cell>65,000</cell><cell>0</cell></row><row><cell>𝑡 2</cell><cell>2013</cell><cell>E1</cell><cell>Software Dev</cell><cell>68,400</cell><cell>0</cell></row><row><cell>𝑡 3</cell><cell>2014</cell><cell>E1</cell><cell>Sr. Software Dev</cell><cell>82,000</cell><cell>4</cell></row><row><cell>𝑡 4</cell><cell>2015</cell><cell>E1</cell><cell>Sr. Software Dev</cell><cell>84,100</cell><cell>5</cell></row><row><cell>𝑡 5</cell><cell>2016</cell><cell>E1</cell><cell>Sr. Software Dev</cell><cell>86,700</cell><cell>5</cell></row><row><cell>𝑡 6</cell><cell>2017</cell><cell>E1</cell><cell>Lead Dev</cell><cell>96,700</cell><cell>25</cell></row><row><cell>𝑡 7</cell><cell>2018</cell><cell>E1</cell><cell>Lead Dev</cell><cell>105,000</cell><cell>25</cell></row><row><cell>𝑡 8</cell><cell>2019</cell><cell>E1</cell><cell>Manager</cell><cell>130,000</cell><cell>140</cell></row><row><cell>𝑡 9</cell><cell>2015</cell><cell>E2</cell><cell>Software Dev</cell><cell>64,500</cell><cell>0</cell></row><row><cell>𝑡 10</cell><cell>2016</cell><cell>E2</cell><cell>Software Dev</cell><cell>67,000</cell><cell>0</cell></row><row><cell>𝑡 11</cell><cell>2017</cell><cell>E2</cell><cell>Software Dev</cell><cell>69,200</cell><cell>0</cell></row><row><cell>𝑡 12</cell><cell>2018</cell><cell>E2</cell><cell>Software Dev</cell><cell>71,500</cell><cell>0</cell></row><row><cell>𝑡 13</cell><cell>2019</cell><cell>E2</cell><cell>Sr. Software Dev</cell><cell>80,000</cell><cell>2</cell></row><row><cell>𝑡 14</cell><cell>2020</cell><cell>E2</cell><cell>Sr. Software Dev</cell><cell>82,100</cell><cell>3</cell></row><row><cell>𝑡 15</cell><cell>2021</cell><cell>E2</cell><cell>Sr. Software Dev</cell><cell>84,000</cell><cell>3</cell></row><row><cell>𝑡 16</cell><cell>2022</cell><cell>E2</cell><cell>Sr. Software Dev</cell><cell>88,400</cell><cell>4</cell></row><row><cell>𝑡 17</cell><cell>2023</cell><cell>E2</cell><cell>Lead Dev</cell><cell>96,100</cell><cell>28</cell></row></table></figure>
		</body>
		<back>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<monogr>
		<author>
			<persName><forename type="first">N</forename><forename type="middle">A</forename><surname>Heckert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">J</forename><surname>Filliben</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">M</forename><surname>Croarkin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hembree</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">F</forename><surname>Guthrie</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Tobias</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Prinz</surname></persName>
		</author>
		<title level="m">NIST/SEMATECH e-Handbook of Statistical Methods</title>
				<imprint>
			<date type="published" when="2002">2002</date>
			<biblScope unit="volume">151</biblScope>
		</imprint>
	</monogr>
	<note>Handbook</note>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">A guided tour to approximate string matching</title>
		<author>
			<persName><forename type="first">G</forename><surname>Navarro</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM computing surveys (CSUR)</title>
		<imprint>
			<biblScope unit="volume">33</biblScope>
			<biblScope unit="page" from="31" to="88" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">S</forename></persName>
		</author>
		<ptr target="https://towardsdatascience.com/similarity-measures-and-graph-adjacency-with-sets" />
		<title level="m">Similarity measures and graph adjacency with sets</title>
				<imprint>
			<date type="published" when="2022-10-28">28-Oct-2022</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<monogr>
		<author>
			<persName><forename type="first">J</forename><surname>Szlichta</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Godfrey</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Golab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Kargar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1608.06169</idno>
		<title level="m">Effective and complete discovery of order dependencies via set-based axiomatization</title>
				<imprint>
			<date type="published" when="2016">2016</date>
		</imprint>
	</monogr>
	<note type="report_type">arXiv preprint</note>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Sequential dependencies</title>
		<author>
			<persName><forename type="first">L</forename><surname>Golab</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Karloff</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Korn</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Saha</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Srivastava</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the VLDB Endowment</title>
				<meeting>the VLDB Endowment</meeting>
		<imprint>
			<date type="published" when="2009">2009</date>
			<biblScope unit="volume">2</biblScope>
			<biblScope unit="page" from="574" to="585" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Differential dependencies: Reasoning and discovery</title>
		<author>
			<persName><forename type="first">S</forename><surname>Song</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Chen</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">ACM Transactions on Database Systems (TODS)</title>
		<imprint>
			<biblScope unit="volume">36</biblScope>
			<biblScope unit="page" from="1" to="41" />
			<date type="published" when="2011">2011</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Efficient differential dependency discovery</title>
		<author>
			<persName><forename type="first">S</forename><surname>Kuang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Yang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Tan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ma</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Proceedings of the VLDB Endowment</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="page" from="1552" to="1564" />
			<date type="published" when="2024">2024</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">A survey of text similarity approaches</title>
		<author>
			<persName><forename type="first">W</forename><forename type="middle">H</forename><surname>Gomaa</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">A</forename><surname>Fahmy</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal of Computer Applications</title>
		<imprint>
			<biblScope unit="volume">68</biblScope>
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Tsddiscover: Discovering data dependency for time series data</title>
		<author>
			<persName><forename type="first">X</forename><surname>Ding</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Li</surname></persName>
		</author>
		<author>
			<persName><forename type="first">H</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Wang</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2024 IEEE 40th International Conference on Data Engineering (ICDE)</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2024">2024</date>
			<biblScope unit="page" from="3668" to="3681" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Fast algorithms for mining association rules</title>
		<author>
			<persName><forename type="first">R</forename><surname>Agrawal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Srikant</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Proc. 20th int. conf. very large data bases, VLDB</title>
				<meeting>20th int. conf. very large data bases, VLDB<address><addrLine>Santiago</addrLine></address></meeting>
		<imprint>
			<date type="published" when="1994">1994</date>
			<biblScope unit="volume">1215</biblScope>
			<biblScope unit="page" from="487" to="499" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Statistical process control charts as a tool for analyzing big data</title>
		<author>
			<persName><forename type="first">P</forename><surname>Qiu</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">Big and Complex Data Analysis: Methodologies and Applications</title>
				<imprint>
			<date type="published" when="2017">2017</date>
			<biblScope unit="page" from="123" to="138" />
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Isolation forest</title>
		<author>
			<persName><forename type="first">F</forename><forename type="middle">T</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">M</forename><surname>Ting</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z.-H</forename><surname>Zhou</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="m">2008 Eighth IEEE International Conference on Data Mining</title>
				<imprint>
			<publisher>IEEE</publisher>
			<date type="published" when="2008">2008</date>
			<biblScope unit="page" from="413" to="422" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
