Association Rule Mining to Study Process-Related Cause-Effect-Relationships in Pig Farming Tobias Zimpel1 , Andrea Wild2 , Hansjörg Schrade2 and Stefan Kirn1 1 University of Hohenheim, Schloß Hohenheim 1, Stuttgart, 70599, Germany 2 Boxberg Teaching and Research Centre, Seehöfer Str. 50, Boxberg, 97944, Germany Abstract Association rule mining is a technique for discovering relationships in large data sets and thus can obtain insights into process-related phenomena described by the data. In pig farming, necrosis (dead tissue) in the rearing period is an important phenomenon because it negatively affects animal welfare and reduces the share of usable young pigs. Pig rearing is a long-lasting process involving multiple stakeholders and management activities. Causes of necrosis are often unknown and their identification requires considerable time and effort. Pig rearing is a long-lasting process involving multiple stakeholders and management activities. Causes of necrosis are often unknown and their identification requires considerable time and effort. The objectives of this research are to (1) develop an association rule mining approach for generating plausible suggestions for cause-effect relationships of necrosis in pig rearing and (2) empirically evaluate the predictive power of the discovered rule set. We propose a procedure for generating and comparing association rules for different aggregation intervals. We used data from 672 pigs, collected over a ten-month period (Oct. 2018 - Apr. 2019 & Oct. 2019 - Dec. 2019). Association rules were created on the training set and tested on the test set. Association rules were evaluated using the metrics support, confidence, and lift, and expert knowledge. The association rules focused on temperature-related attributes and achieved confidence values between 0.65 and 0.99. Association rules suggest temperature and underlying processes as (contributing) causes of necrosis. Process experts provided knowledge that supports these suggestions and indicates their plausibility. Keywords association rule mining, pig farming, cause-effect-relationships, process-related data analysis 1. Introduction Association rule mining is a popular technique for creating relationships between attributes in data sets. Therefore, association rule analysis can analyze process-related from different sources to obtain insights into relevant process phenomena [1, 2]. In pig farming, tail necrosis (dead tissue) is such an phenomena in the rearing period [3]. Pig rearing is often a seven-week process for increasing pigs’ weight (e.g., from 5 to 25 kg), that involves multiple sub-processes, such as providing food or controlling the ventilation and heating system in a pen (pigs’ environment). These sub-processes affect pigs and their environment (e.g., air temperature) and can trigger conditions for necrosis development (e.g., tail biting due to stress) [4, 5, 6]. While sensors and corresponding data processing are present PMAI@IJCAI22: International IJCAI Workshop on Process Management in the AI era, July 23, 2022, Vienna, Austria $ tobias.zimpel@uni-hohenheim.de (T. Zimpel); andrea.wild@lsz.bwl.de (A. Wild); hansjoerg.schrade@lsz.bwl.de (H. Schrade); stefan.kirn@uni-hohenheim.de (S. Kirn) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) (e.g., to detect pigs [7] or predict losses [8]), the development of necrosis and its causes are usually unobserved or unknown (e.g., due to fewer specific sensors concerning the process output or the presence of non-speaking stakeholders). We assume that the causes of necrosis lie in the process. Therefore, association rule mining may suggest cause-effect relationships. Previous work propose suggestions for causes should be based on association rules for a train- ing set and a test set (as commonly used for supervised machine learning tasks), differentially preprocessed process data, and the incorporation of process-related knowledge [2, 9, 10, 11, 12]. However, association rules describe a correlation and not a causality between attributes. Thus, we refer to the concept of interestingness to assess possible suggestions. Interestingness depends on the application and is objective or subjective [13, 14]. Objective interestingness is calculated based on the underlying data (e.g., confidence), while subjective interestingness is given when the association rule is unexpected or actionable for the user [15, 14]. However, metrics and the properties unexpected and actionable do not include process knowledge. We propose the plausibility property to assess suggestions. We call a suggestion plausible if it is justified by process knowledge and based on underlying objective interesting association rules. Justification by process knowledge can be integrated into the analysis of the created association rules [16]. Objective interestingness is measured in this work using the metrics of support, confidence and lift. While lift is calculated on support, support and confidence are already necessary metrics for algorithm configuration with respect to minimum support and minimum confidence [17, 18]. Minimum support and minimum confidence influence the creation of association rules in terms of inclusion or exclusion of attributes and rules, while the algorithm decides how association rules are created [17, 18]. Consequently, association rules and their analysis depend mainly on the data preprocessing, e.g., the discretization of the continuous values. Against this backdrop, we address the following research question: How to design a procedure to create plausible association rules for suggestions of cause-effect-relationships in pig rearing? The main contributions of this paper are as follows: • We investigate association rule mining in terms plausible suggestions of cause-effect relationships using the example of pig farming. • We propose a procedure for creating association rules based on data from three aggregation intervals (hourly, daily, weekly) to serve as the basis for proposing causes. • We evaluate generated association rules by using an independent test set and justifications based on the process knowledge. Our paper is structured as follows: In section 2, we provide background literature. In section 3, we describe the data set and procedure. In section 4, we analyze generated association rules by integrating process-knowledge, and in section 5, we provide a brief conclusion. 2. Background 2.1. Association rule mining Association rule mining describes an approach to describing relationships in data by processing binary coded data to create association rules [19]. Given is a set 𝑆 = {𝑠0 , . . . , 𝑠𝑛 } with 𝑛 ∈ N elements 𝑠 (e.g., 𝑠 is a pig). Each element 𝑠, in turn, is a set with up to 𝑚 ∈ N attributes of the set 𝐴. 𝐴 contains all possible attributes of 𝑠. Thereby, 𝑠 has an attribute 𝑎 ∈ 𝐴, if 𝑎 is true for 𝑠 (e.g., {𝑛𝑒𝑐𝑟𝑜𝑠𝑖𝑠} ∈ 𝑠 if a pig has necrosis). Given is a set 𝑋 ⊂ 𝐴, a set 𝑌 ⊂ 𝐴, and 𝑋 ∩ 𝑌 = ∅ we define an association rule as a relationship between the two sets as follows [19]: 𝑋⇒𝑌 (1) Relationship means, 𝑌 is probably true if 𝑋 is given [20]. To assess this relationship, we use the metrics of support (supp), confidence (conf ), and lift. Supp describes how frequent a subset of attributes is in 𝑆. Conf describes, how often an association rule is true. Lift describes if 𝑋 ∪ 𝑌 occur less or more often than expected. These metrics are calculated as follows [19, 21]: | {𝑠 ∈ 𝑆|𝑋 ⊆ 𝑠} | supp (𝑋) = (2) |𝑆| supp (𝑋) conf (𝑋 ⇒ 𝑌 ) = (3) supp (𝑋 ∪ 𝑌 ) supp (𝑋 ∪ 𝑌 ) lift (𝑋 ⇒ 𝑌 ) = (4) supp (𝑋) · supp (𝑌 ) Lift (𝑋 ⇒ 𝑌 ) = 1 means 𝑋 and 𝑌 are independent, while lift (𝑋 ⇒ 𝑌 ) > 1 indicates a positive correlation and lift (𝑋 ⇒ 𝑌 ) < 1 indicates a negative correlation. The process of association rule generation consists of two steps. First: generation of a candidate set 𝐶𝑆 consisting of several sets 𝑋 and or 𝑌 with minimal support (supp𝑚𝑖𝑛 ). Second, deriving association rules based on 𝐶𝑆 with minimal confidence (conf𝑚𝑖𝑛 ). Supp𝑚𝑖𝑛 affects |𝐶𝑆| and thus the theoretically number of association rules as well. The selection of supp𝑚𝑖𝑛 can be set manually for all attributes or for each individual attribute [22, 23]. In practice, however, there may be difficulties in determining supp𝑚𝑖𝑛 because of the tradeoff between the number of association rules that can be processed by humans and the loss of rules (and thus of suggestions for causes) [2]. 2.2. Association rule mining for process-related cause-effect relationships [2] propose association rule mining for analyzing problems in a drill manufacturing process. The report on the need to include process experts for the subsequent analysis of the association rules in order to improve manufacturing processes. [9] analyzed construction project accidents by comparing association rules for two types of projects. The association rules created were applicable to only one of the two project types with one exception (with varying confidence), indicating the specificity of the association rules with respect to the training data. [11] also analyzed accidents on construction sites. They report different confidence values for the same rules on different training sets, suggesting that training sets with different focus support the analysis of association rules. They also show that process-related knowledge can support the analysis by containing information that is not in the data set. [12] investigated faults of distribution terminals in power supply networks. They report on a reduced downtime by performing maintenance based on association rules. While this suggests that association rule mining could find suggestions for causes, it is unclear whether the association rules describe actual causes or whether the reduced downtime is a result of more intensive (e.g., earlier) maintenance. [10] studied diseases in two broiler farms. They report on specific association rules each applicable to only one of two farms. For more general association rules, discretizing the attributes differently (so that an association rule for one farm applies to another farm) and evaluating on independent data for each farm can increase the explanatory power. In summary, a method for plausible association rules consists of multiple preprocessing approaches (in terms of generalization), training and testing, and the inclusion of processd knowledge. 3. Materials and methods We used process data from pig rearing provided by the Boxberg Teaching and Research Centre in Germany. The Boxberg Teaching and Research Centre is the central educational, experimental, and testing facility of the state of Baden-Wurttemberg in the field of pig farming. This data set describes the rearing process in the winter season between 10/18/2018 and 4/10/2019 and 10/10/2019 and 12/01/2019. A total of 672 pigs were reared in seven groups of 96 pigs each. Our data set consists of eleven pig-related and eleven environmental factors (figure 1). From our point of view, these factors represent events or results of the underlying processes. Pig-related factors were weights, sex, daily gain between birth and start of rearing, mother, father, age at the start of rearing, and the presence of necrosis and bloody tails. Weights, sex, daily gain, age at start of rearing, and identification of mother and father were recorded manually. The presence of pig necrosis and bloody tails was recorded during a weekly assessment according to a uniform grading scale (see [24]). Environmental factors were temperatures, humidity, brightness, and water consumption, recorded at five-second intervals, and daily dispensed food and manipulable material. One part of the pen was covered by a false ceiling about 0.75 cm high, while the larger part had a ceiling height of 2.9 meters. Thus, air and ground temperatures are divided into temperatures under and outside the false ceiling. Our approach consists of analyzing association rules created for three aggregation intervals of environmental factors: hourly, daily, and weekly (figure 1) [25]. Aggregation intervals means that attributes are calculated on the basis of hourly, daily or weekly periods (e.g., mean humidity in an hour). The implementation uses python 3.8, pandas 1.1.5, scikit-learn 0.24.2, and mlextend 0.19. 3.1. Data preprocessing During the preprocessing, we performed the steps: (1) unification of labels and data formats, (2) refilling environmental data, (3) removal of outliers, (4) creation of samples, (5) one hot encoding of categorical features, and (6) creation of training and test sets. The source data of temperatures, water, humidity, and brightness consisted of entries when a change was measured. Missing values were filled in a five-second interval by forward fill. We detected outliers using lower and upper limits set by experts. Outliers (e.g., temperature = 3, 000∘ 𝐶) are errors and have been removed [26]. In step (4) we combined pig with environmental data for each aggregation interval, resulting in the same environmental attributes for each pig in a pen (figure 2). For the aggregation interval hourly, we calculated average values for temperatures, humidity, brightness and water for each hour in a week. We also calculated the difference between the flow and return flow temperatures, and the designated amount for food and manipulable materials on a daily basis. For the daily aggregation interval, we used the hourly data to calculate the Pig data Environmental data Age at the beginning of rearing (days) Air temperature (°C) Birth weight (kg) Air temperature (false ceiling) (°C) Bloody tail at the beginning of rearing (true/false) Brightness (lx) Data collection Bloody tail last assessment (true/false) Dispensed food (kg) Control weight on the 21st day after birth (kg) Dispensed manipulable material (kg) Daily increase between birth and start of rearing (g) Flow temperature (°C) Identifier mother and father Humidity (%) Necrosis at the beginning of rearing (true/false) Minimum ground temperature (°C) Weight at the beginning of rearing (kg) Minimum ground temperature (false ceiling) (°C) Sex (female, male castrated) Return flow temperature (°C) Necoris at the current assessment (true/false) Water consumption (l) Value calculation in the Value calculation in the Value calculation in the Data preprocessing aggregation interval hourly aggregation interval daily aggregation interval weekly Discretization of continuous values Discretization of continuous values Discretization of continuous values Feature selection, random undersampling Training Creation of association rules on the training set with Y={necrosis}, suppmin=0.3, confmin=0.4 Test Testing association rules on the test set Figure 1: Procedure for studying cause-effect relationship Aggregation interval weekly Week Min., max., max. diff between two following hours, or sum per week Aggregation interval daily Day 1 Day 2 Day 3 Day 4 Day 5 Day 6 Day 7 Min., max., max. diff between two Aggregation interval hourly ... following hours, or sum per day 00:00 01:00 23:00 00:00 Sum or mean value per hour Environmental data (interval: 5 s) ... :00:00 :00:05 :59:50 :59:55 Figure 2: Aggregation of environmental data minimum, maximum, and maximum difference (between two hours) for temperatures, brightness (without minimum), humidity, and the difference between flow and return flow temperatures for each day in a week. We also added the sum of water, food and manipulable material for each day. For the weekly aggregation interval, we calculated the minimum, maximum, and maximum difference between two hours for temperatures, brightness (without minimum), humidity, and the difference between flow and return flow temperatures in a week. We also added the weekly sum and maximum daily difference of water, food, and manipulable material. Then we assigned continuous values to the corresponding classes with the value 1 (true) if the continuous value is in the corresponding range, and 0 (false) otherwise (discretization into binary values, table A) [27]. Classes were determined in discussion with domain experts. In step (5), one-hot coding was performed for sex and mother and father identifiers. In step (6), we created our training and independent testing set based on the chronological start of rearing of the groups used (see figure 3). Due to discretization, our training set consists of more attributes than samples, which can lead to overfitting and potentially exponential growth of possible association rules [17, 28]. Therefore, we used standard recursive feature elimination 10/18/2018 12/01/2019 Group #1 Group #2 Group #3 Group #4 Group #5 Group #6 Group #7 Training set Test set Figure 3: Splitting the data into a training set and a test set Table 1 Overview of the training set and the test set Sample size Quota necrosis Quota no necrosis Training set 380 0.50 0.50 Test set 116 0.65 0.35 with random forest as estimator to select 150 features, reducing 50 per round. We also performed random undersampling to balance necrosis in the training set to specify a higher supp𝑚𝑖𝑛 . A brief description of the training and test set is provided in table 1. 3.2. Training We used the FP-growth algorithm to create association rules because FP-growth does not re- quire candidate generation [29]. The FP-growth algorithm was configured with supp𝑚𝑖𝑛 = 0.3 and conf𝑚𝑖𝑛 = 0.4. The original intention was to allow rare attributes in association rules (supp𝑚𝑖𝑛 = 0.1) since rare traits can also be attributes. However, supp𝑚𝑖𝑛 = 0.1 was not com- putable in the main memory, so we increased supp𝑚𝑖𝑛 by supp𝑚𝑖𝑛 = 0.1 until it was computable, resulting in supp𝑚𝑖𝑛 = 0.4. We chose conf𝑚𝑖𝑛 = 0.4 because we expected a lower value than 0.5 due to unclear multifactorial causes and still wanted to maintain sufficient confidence for practice. We removed all associations rules with 𝑌 ̸= {𝑛𝑒𝑐𝑟𝑜𝑠𝑖𝑠}. We also removed association rules with food-related attributes in 𝑋 after discussions suggesting that altered eating behavior may be a consequence of necrosis and thus not a cause. 3.3. Testing For testing, the created association rules were re-identified on the test set using the FP-Growth algorithm and supp𝑚𝑖𝑛 = 0.1 and conf𝑚𝑖𝑛 = 0.0000001. We re-identified these associations rules to discard rare associations rules in terms of support so that an association rule that only applies to one or two pigs is ignored. Association rules with the highest confidence on the training set tested on the test set are shown in tables 2 (aggregation level hourly), 3 (level daily), and 4 (level weekly). 𝑡 − 𝑖 corresponds to the number of days before an assessment. Association rules in the hourly aggregation interval on the training set (table 2) focused on the flow temperature and air temperature based on the confidence level and frequency when the time period is not considered. Each association rule consisted of only one combination of the attributes air temperature 18∘ 𝐶 ≤ 𝑥 < 21∘ 𝐶, flow temperature 20∘ 𝐶 ≤ 𝑥 < 25∘ 𝐶, or diff. between flow and return flow temperature 0∘ 𝐶 ≤ 𝑥 < 3∘ 𝐶. The highest confidence and lift on the test set is achieved by rule no. 1, while rule no. 3., which consists of one more temperature attribute in 𝑋, achieves the highest confidence on the training set. According to rule no. 3, Table 2 Association rules (aggregation interval hourly, 𝑌 = {𝑛𝑒𝑐𝑟𝑜𝑠𝑖𝑠}) on the test set and (training set). 𝑋 𝑋⇒𝑌 # Attribute Period Function Range (in ∘ 𝐶) supp conf lift 1 Air temperature 𝑡-4 [9ℎ, 10ℎ[ Mean 18 ≤ 𝑥 < 21 0.34 0.98 1.51 Diff. flow and 𝑡-3 [9ℎ, 10ℎ[ Mean 0≤𝑥<3 (0.38) (0.79) (1.57) return flow tem- perature 2 Air temperature 𝑡-4 [9ℎ, 10ℎ[ Mean 18 ≤ 𝑥 < 21 0.34 0.65 1.00 (0.39) (0.70) (1.40) 3 Diff. flow and 𝑡-3 [9ℎ, 10ℎ[ Mean 0≤𝑥<3 0.65 0.98 1.51 return flow tem- (0.48) (0.68) (1.36) perature the temperature between flow and return flow temperature was < 3∘ 𝐶, indicating the heating system emitted no or less heat. As a first conclusion, temperature (especially air temperature and flow temperature) and the corresponding heating system are candidates for suggestions on the causes of necrosis. These association rules are assigned to a specific hourly period. To support the first conclusion, we compare this preliminary conclusion with association rules of the daily and weekly aggregation interval. Association rules at the aggregation interval daily on the training set focused on tempera- ture attributes (see table 3). In particular, maximum air temperature 18∘ 𝐶 ≤ 𝑥 < 21∘ 𝐶, and maximum air temperature under the false ceiling 24∘ 𝐶 ≤ 𝑥 < 27∘ 𝐶, and return flow tem- perature 25∘ 𝐶 ≤ 𝑥 < 30∘ 𝐶 were frequent attributes. Rules no. 4 to no. 6 with the highest confidence level in the training set were not applicable to the test set. The attributes maximum air temperature 18∘ 𝐶 ≤ 𝑥 < 21∘ 𝐶, minimum and maximum air temperature under the false ceiling 24∘ 𝐶 ≤ 𝑥 < 27∘ 𝐶 were also included in other rules during training in other periods, but reached a lower conf. No other process environment attributes such as brightness were present in the association rules of the training set. Although the rules in table 3 were not applicable to the test set and the confidence of the association rules in the test set was lower than in the training set, we conclude that the association rules on the aggregation interval daily support the preliminary conclusion that temperature is a possible cause. At the aggregation interval weekly, the association rules (table 4) on the training set focused on the air temperature under the false ceiling, expressed by two association rules describing the minimum 21∘ 𝐶 ≤ 𝑥 < 24∘ 𝐶) and maximum 24∘ 𝐶 ≤ 𝑥 < 27∘ 𝐶 air temperature under the false ceiling. However, the rule no was not applicable on the test set. The air temperature itself occurs only in one association rule, which refers to the maximum difference between two consecutive hours (1∘ 𝐶 ≤ 𝑥 < 2∘ 𝐶), indicating that the air temperature was almost constant during one week. While another association rule consisted of the maximum difference in manipulable material output between two days, there are no other association rules that do not consist of temperatures. Also based on the association rules for this aggregation interval, we find that temperature is a possible suggestion for the cause for necrosis. Table 3 Association rules (aggregation interval daily, 𝑌 = {𝑛𝑒𝑐𝑟𝑜𝑠𝑖𝑠}) on the test set and (training set). 𝑋 𝑋⇒𝑌 # Attribute Period Function Range (in ∘ 𝐶) supp conf lift 4 Return flow 𝑡-3 Max(Mean) 25 ≤ 𝑥 < 30 temperature <0.1 n.a. n.a. Air temperature 𝑡-6 Max(Mean) 24 ≤ 𝑥 < 27 (0.32) (0.79) (1.59) (false ceiling) 5 Air temperature 𝑡-3 Max(Mean) 24 ≤ 𝑥 < 27 (false ceiling) <0.1 n.a. n.a. Air temperature 𝑡-6 Max(Mean) 24 ≤ 𝑥 < 27 (0.32) (0.79) (1.59) (false ceiling) Air temperature 𝑡-7 Max(Mean) 18 ≤ 𝑥 < 21 6 Air temperature 𝑡-0 Min(Mean) 24 ≤ 𝑥 < 27 (false ceiling) <0.1 n.a. n.a. Air temperature 𝑡-6 Max(Mean) 24 ≤ 𝑥 < 27 (0.30) (0.78) (1.55) (false ceiling) Air temperature 𝑡-7 Max(Mean) 18 ≤ 𝑥 < 21 Table 4 Association rules (aggregation interval weekly, 𝑌 = {𝑛𝑒𝑐𝑟𝑜𝑠𝑖𝑠}) on the test set and (training set). 𝑋 𝑋⇒𝑌 # Attribute Function Range (in ∘ 𝐶) supp conf lift 7 Air temperature Min(Mean) 21 ≤ 𝑥 < 24 0.59 0.99 1.52 (false ceiling) (0.32) (0.67) (1.33) 8 Air temperature Max(Mean) 24 ≤ 𝑥 < 27 <0.1 n.a. n.a. (false ceiling) (0.32) (0.64) (1.27) 9 Air temperature Max(Diff. bet- 1≤𝑥<2 0.34 0.98 1.52 ween two hours) (0.36) (0.61) (1.23) 4. Results and discussion Analysis of association rules per aggregation interval in both the training and test propose temperatures as a cause of necrosis in the pig’s environment. According to process experts, an air temperature of 18∘ 𝐶 < 21∘ 𝐶 indicates a low air temperature. During the study period, tests were conducted in the pens with different heating zones. Air temperatures in the areas under the false ceiling were warmer than in the areas without the false ceiling. Therefore, process-related information is available to support temperatures as a suggestion for the causes of necrosis. Process experts said in discussions that the temperatures in the association rules are quite plausible. Plausibility refers to subjective interestingness by including of process- related knowledge [16, 14]. Thus, we conclude that association rule mining is able to create plausible association rules for use cases where there are few or no possibilities to identify causes of undesirable process outputs. Lack of specific sensors (e.g., necrosis-specific sensors), non-speaking process stakeholders (e.g., pigs), or an environment that limits reproducibility (e.g., different numbers of pigs affected by necrosis living in a similar environment) are examples of such reasons that make the root cause analysis difficult. Other possible applications include diseases, quality variations and yield fluctuations in agriculture or animal husbandry. To use our procedure in these use cases, the application-specific class adaptation in table A is necessary. Temperature is associated with underlying processes, like heating or ventilation control. Thus, process-related causes could lie in the underlying processes. Although association rules establish a relationship between temperatures (or underlying processes) and necrosis, they describe no causality. The suggestion that temperature is a cause needs to be investigated in further experiments, e.g., similar to [12]. As in [9, 10, 11], created association rules are partially specific to the training set, resulting in inapplicable association rules on the test set. This can be a result of our discretization scheme and different properties of the test set. Future work relate to the creation of discretization schemes and the transfer of generalization concepts from attributes in hierarchical structures to non-hierarchical structures, as is the case with the environmental attributes [25]. Generalization improves transferability to other use cases. Similar to [9, 11], we report that process-related knowledge improves the downstream analysis of association rules with respect to justification. This work is subject to the following limitations. First, pig farming consists of sub-processes that are carried out in an orderly structure. Association rules use sets of attributes and therefore do not consider the order of characteristics of the underlying processes. To maintain the ordered structure, sequential rule discovery is one approach [30]. Comparing results using association rule mining or sequential rule mining to study cause-effect relationships is future work. Second, class lower and upper bounds affect the support of concerning attributes, may exclude attributes due to low support. In addition, recursive feature elimination excludes attributes once again. Feature elimination was necessary to ensure computability due to available main memory. Third, we did not remove redundant association rules in the downstream analysis. However, this would not change the association rules presented or the suggestions, but would reduce the number of rules. Fourth, we randomly balanced the samples with and without necrosis in the training set to increase supp𝑚𝑖𝑛 . This directly affects our metrics in the training set. Fifth, we re-identified the association rules in the test set to treat rare association rules as inapplicable association rules because these rules would affect too few pigs to represent a cause-effect relationship. Thus, it is not possible to distinguish between association rules with supp = 0 or 0 < supp < 0.1 . 5. Conclusion This research explored association rule mining for plausible suggestions of process-related cause-effect relationships in pig farming. We used real data over a ten-month period and created association rules based on different intervals of data aggregation. Association rules pointing to temperatures and thus the underlying process regarding heat or air control as a suggestion for causes of necrosis. Moreover, there is process-related knowledge (e.g., different heating zones) that supports this suggestion, so we assume that association rule mining is able to provide plausible (justified by process knowledge) suggestions. Acknowledgments We thank the LABEL-FIT project team, especially Eva Gallmann, William Gordillo, Barbara Keßler, and Svenja Opderbeck for data collection and data provision. This work was supported by the project “Landwirtschaft 4.0: Info-System (Phase 2)”, funded by the Ministry for Food, Rural Areas and Consumer Protection of Baden-Wurttemberg, Germany. This work was also supported by the project LABEL-FIT by funds of the Federal Ministry of Food and Agriculture (BMEL) based on a decision of the Parliament of the Federal Republic of Germany via the Federal Office for Agriculture and Food (BLE) under the innovation support programme (2819200415). References [1] S. Schönig, A. Rogge-Solti, C. Cabanillas, S. Jablonski, J. Mendling, Efficient and cus- tomisable declarative process mining with sql, in: S. Nurcan, P. Soffer, M. Bajec, J. Eder (Eds.), Advanced Information Systems Engineering, volume 9694 of Lecture Notes in Computer Science, Springer International Publishing, Cham, 2016, pp. 290–305. doi:10.1007/978-3-319-39696-5_18. [2] S. Nurcan, P. Soffer, M. Bajec, J. Eder (Eds.), Advanced information systems engineering, Lecture Notes in Computer Science, Springer International Publishing, Cham, 2016. doi:10. 1007/978-3-319-39696-5. [3] G. Reiner, J. Kühling, M. Lechner, H. Schrade, J. Saltzmann, C. Muelling, S. Dänicke, F. Loewenstein, Swine inflammation and necrosis syndrome is influenced by husbandry and quality of sow in suckling piglets, weaners and fattening pigs, Porcine health management 6 (2020) 32. doi:10.1186/s40813-020-00170-2. [4] K. I. Fiesjå, I. Solberg, Pathological lesions in swine at slaughter: Iv. pathological lesions in relation to rearing system and herd size, Acta Veterinaria Scandinavica 22 (1981) 272–282. doi:10.1186/bf03547516. [5] F. Kjell I., I. B. Forus, I. Solberg, Pathological lesions in swine at slaughter: V. patho- logical lesions in relation to some environmental factors in the herds, Acta Veterinaria Scandinavica 23 (1982) 169–183. [6] K. I. Fiesjå, H. O. Ulvesæter, Pathological lesions in swine at slaughter: I. baconers, Acta Veterinaria Scandinavica 20 (1979). [7] M. Riekert, A. Klein, F. Adrion, C. Hoffmann, E. Gallmann, Automatically detecting pig position and posture by 2d camera imaging and deep learning, Computers and Electronics in Agriculture 174 (2020) 105391. doi:10.1016/j.compag.2020.105391. [8] T. Zimpel, M. Riekert, A. Klein, C. Hoffmann, Machine learning for predicting animal welfare risks in pig farming, Agricultural Engineering 76 (2021) 24–35. doi:10.15150/lt. 2021.3261. [9] C.-W. Cheng, C.-C. Lin, S.-S. Leu, Use of association rules to explore cause–effect rela- tionships in occupational accidents in the taiwan construction industry, Safety Science 48 (2010) 436–444. doi:10.1016/j.ssci.2009.12.005. [10] S. Maneewongvatana, S. Maneewongvatana, T. Lojitamnuay, M. Juthasong, Using asso- ciation rules to identify root causes of crd in broilers, in: The 2013 10th International Joint Conference on Computer Science and Software Engineering (JCSSE), IEEE, 2013, pp. 206–210. doi:10.1109/JCSSE.2013.6567346. [11] B. U. Ayhan, N. B. Doğan, O. B. Tokdemir, An association rule mining model for the assessment of the correlations between the attributes of severe accidents, Journal of Civil Engineering and Management 26 (2020) 315–330. doi:10.3846/jcem.2020.12316. [12] X. Zhang, Y. Tang, Q. Liu, G. Liu, X. Ning, J. Chen, A fault analysis method based on association rule mining for distribution terminal unit, Applied Sciences 11 (2021) 5221. doi:10.3390/app11115221. [13] R. Agrawal, R. Srikant, Fast algorithms for mining association rules in large databases, in: Proceedings of the 20th International Conference on Very Large Data Bases, VLDB ’94, Morgan Kaufmann Publishers Inc, San Francisco, CA, USA, 1994, pp. 487–499. [14] A. Silberschatz, A. Tuzhilin, On subjective measures of interestingness in knowledge discovery, in: Proceedings of the First International Conference on Knowledge Discovery and Data Mining, KDD’95, AAAI Press, 1995, pp. 275–281. [15] G. Dong, J. Li, Interestingness of discovered association rules in terms of neighborhood- based unexpectedness, in: X. Wu, R. Kotagiri, K. B. Korb (Eds.), Research and Development in Knowledge Discovery and Data Mining, volume 1394 of Lecture Notes in Computer Science, Springer Berlin Heidelberg, Berlin, Heidelberg, 1998, pp. 72–86. doi:10.1007/ 3-540-64383-4_7. [16] G. Piatetsky-Shapiro, C. J. Matheus, The interestingness of deviations, in: Proceedings of the 3rd International Conference on Knowledge Discovery and Data Mining, AAAIWS’94, AAAI Press, 1994, pp. 25–36. [17] Z. Zheng, R. Kohavi, L. Mason, Real world performance of association rule algorithms, in: F. Provost, R. Srikant, M. Schkolnick, D. Lee (Eds.), Proceedings of the seventh ACM SIGKDD international conference on Knowledge discovery and data mining - KDD ’01, ACM Press, New York, New York, USA, 2001, pp. 401–406. doi:10.1145/502512.502572. [18] J. Hipp, U. Güntzer, G. Nakhaeizadeh, Algorithms for association rule mining — a general survey and comparison, ACM SIGKDD Explorations Newsletter 2 (2000) 58–64. doi:10. 1145/360402.360421. [19] R. Agrawal, T. Imieliński, A. Swami, Mining association rules between sets of items in large databases, ACM SIGMOD Record 22 (1993) 207–216. doi:10.1145/170036.170072. [20] P. Hájek, I. Havel, M. Chytil, The guha method of automatic hypotheses determination, Computing 1 (1966) 293–308. doi:10.1007/bf02345483. [21] S. Brin, R. Motwani, J. D. Ullman, S. Tsur, Dynamic itemset counting and implication rules for market basket data, ACM SIGMOD Record 26 (1997) 255–264. doi:10.1145/253262. 253325. [22] B. Liu, W. Hsu, Y. Ma, Mining association rules with multiple minimum supports, in: U. Fayyad, S. Chaudhuri, D. Madigan (Eds.), Proceedings of the fifth ACM SIGKDD inter- national conference on Knowledge discovery and data mining - KDD ’99, ACM Press, New York, New York, USA, 1999, pp. 337–341. doi:10.1145/312129.312274. [23] F. Z. El Mazouri, M. C. Abounaima, K. Zenkouar, Data mining combined to the multicriteria decision analysis for the improvement of road safety: case of france, Journal of Big Data 6 (2019). doi:10.1186/s40537-018-0165-0. [24] K. Bönisch, A. vom Brocke, S. Dippel, A. Grümpel, L. Hagemann, C. Jais, D. Lösel, A. Müller, S. Müller, A. Naya, H. Schrade, C. Späth, C. Velt, A. Wild, M. Lechner, Deutscher schweine-boniturschlüssel (dsbs), 2017. URL: https://www.fli.de/fileadmin/FLI/ ITT/Deutscher_Schweine_Boniturschluessel_2017-06-30_de.pdf. [25] R. Srikant, R. Agrawal, Mining generalized association rules, Future Generation Computer Systems 13 (1997) 161–180. doi:10.1016/S0167-739X(97)00019-8. [26] A. Famili, W. M. Shen, R. Weber, E. Simoudis, Data preprocessing and intelligent data analysis, Intelligent data analysis 1 (1997) 3–23. [27] R. Agrawal, T. Imielinski, A. Swami, Database mining: a performance perspective, IEEE Transactions on Knowledge and Data Engineering 5 (1993) 914–925. doi:10.1109/69. 250074. [28] A. Vabalas, E. Gowen, E. Poliakoff, A. J. Casson, Machine learning algorithm validation with a limited sample size, PloS one 14 (2019) e0224365. doi:10.1371/journal.pone. 0224365. [29] J. Han, J. Pei, Y. Yin, Mining frequent patterns without candidate generation, ACM SIGMOD Record 29 (2000) 1–12. doi:10.1145/335191.335372. [30] P. Fournier-Viger, T. Gueniche, S. Zida, V. S. Tseng, Erminer: sequential rule mining using equivalence classes, in: International Symposium on Intelligent Data Analysis, 2014, pp. 108–119. A. Discretization of continuous values Data Function Unit Ranges for a value 𝑥 Age (start of rearing) – kg 𝑥 = 23; 𝑥 = 24; ...; 32 ≤ 𝑥 Birth weight – kg 𝑥 < 0.5; 0.5 ≤ 𝑥 < 1.0; ...; 4 ≤ 𝑥 Brightness Min., Max., Mean, Diff. lx 𝑥 < 40; 40 ≤ 𝑥 < 80; ...; 520 ≤ 𝑥 Control weight – kg 𝑥 < 4; 4 ≤ 𝑥 < 6; ...; 10 ≤ 𝑥 Daily increase – g 𝑥 < 100; 100 ≤ 𝑥 < 125; ...; 275 ≤ 𝑥 Food Sum (daily) kg 𝑥 < 30; 30 ≤ 𝑥 < 40; ...; 60 ≤ 𝑥 Sum (weekly) kg 𝑥 < 180; 180 ≤ 𝑥 < 210; ...; 360 ≤ 𝑥 Diff. (weekly) kg 𝑥 < 30; 30 ≤ 𝑥 < 40; ...; 60 ≤ 𝑥 Humidity Min., Max., Mean, Diff. % 𝑥 < 30; 30 ≤ 𝑥 < 35; ...; 90 ≤ 𝑥 Manipulable material Sum (daily) kg 𝑥 < 0.75; 0.75 ≤ 𝑥 < 1.5; ...; 6 ≤ 𝑥 Sum (weekly) kg 𝑥 < 6; 6 ≤ 𝑥 < 7; ...; 30 ≤ 𝑥 Diff. (weekly) kg 𝑥 < 0.75; 0.75 ≤ 𝑥 < 1.5; ...; 6 ≤ 𝑥 ∘ Temperatures Min., Max., Mean 𝐶 𝑥 < 15; 15 ≤ 𝑥 < 18; ...; 39 ≤ 𝑥 ∘ (air, ground) Diff. 𝐶 𝑥 < 1; 1 ≤ 𝑥 < 2; ...; 5 ≤ 𝑥 ∘ Temperatures Min., Max., Mean 𝐶 𝑥 < 10; 10 ≤ 𝑥 < 15; ...; 100 ≤ 𝑥 ∘ (flow, return flow) Diff. 𝐶 𝑥 < −21; −21 ≤ 𝑥 < −18; ...; 21 ≤ 𝑥 Water Sum (hourly) L 𝑥 < 5; 5 ≤ 𝑥 < 10; ...; 25 ≤ 𝑥 Sum (daily) L 𝑥 < 10; 10 ≤ 𝑥 < 20; ...; 250 ≤ 𝑥 Sum (weekly) L 𝑥 < 50; 50 ≤ 𝑥 < 100; ...; 500 ≤ 𝑥 Diff. (weekly) L 𝑥 < 10; 10 ≤ 𝑥 < 20; ...; 100 ≤ 𝑥 Weight (start of rearing) – kg 𝑥 < 4; 4 ≤ 𝑥 < 6; ...; 10 ≤ 𝑥