<?xml version="1.0" encoding="UTF-8"?>
<TEI xml:space="preserve" xmlns="http://www.tei-c.org/ns/1.0" 
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" 
xsi:schemaLocation="http://www.tei-c.org/ns/1.0 https://raw.githubusercontent.com/kermitt2/grobid/master/grobid-home/schemas/xsd/Grobid.xsd"
 xmlns:xlink="http://www.w3.org/1999/xlink">
	<teiHeader xml:lang="en">
		<fileDesc>
			<titleStmt>
				<title level="a" type="main">Regression Modeling for Monitoring Organochlorine Pesticide Residues</title>
			</titleStmt>
			<publicationStmt>
				<publisher/>
				<availability status="unknown"><licence/></availability>
			</publicationStmt>
			<sourceDesc>
				<biblStruct>
					<analytic>
						<author>
							<persName><forename type="first">Serge</forename><surname>Olszewski</surname></persName>
							<email>olszewski.serge@gmail.com</email>
							<affiliation key="aff0">
								<orgName type="institution">Taras Shevchenko National University of Kyiv</orgName>
								<address>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Iryna</forename><surname>Lurie</surname></persName>
							<email>lurieira@gmail.com</email>
							<affiliation key="aff1">
								<orgName type="institution">Ben-Gurion University of Negev</orgName>
								<address>
									<settlement>Beer Sheva</settlement>
									<country key="IL">Israel</country>
								</address>
							</affiliation>
							<affiliation key="aff2">
								<orgName type="institution">Kherson National Technical University</orgName>
								<address>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Volodymyr</forename><surname>Lytvynenko</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Kherson National Technical University</orgName>
								<address>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Violetta</forename><surname>Demchenko</surname></persName>
							<affiliation key="aff3">
								<orgName type="department">Kundiiev Institute of Occupational Health</orgName>
								<orgName type="institution">National Academy of Medical Sciences of Ukraine</orgName>
								<address>
									<settlement>Kyiv</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Mariia</forename><surname>Voronenko</surname></persName>
							<email>mary_voronenko@i.ua</email>
							<affiliation key="aff2">
								<orgName type="institution">Kherson National Technical University</orgName>
								<address>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Natalia</forename><surname>Kornilovska</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Kherson National Technical University</orgName>
								<address>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<author>
							<persName><forename type="first">Oleg</forename><surname>Boskin</surname></persName>
							<affiliation key="aff2">
								<orgName type="institution">Kherson National Technical University</orgName>
								<address>
									<settlement>Kherson</settlement>
									<country key="UA">Ukraine</country>
								</address>
							</affiliation>
						</author>
						<title level="a" type="main">Regression Modeling for Monitoring Organochlorine Pesticide Residues</title>
					</analytic>
					<monogr>
						<imprint>
							<date/>
						</imprint>
					</monogr>
					<idno type="MD5">3C0025E7616158F7315B332141B4CB16</idno>
				</biblStruct>
			</sourceDesc>
		</fileDesc>
		<encodingDesc>
			<appInfo>
				<application version="0.7.2" ident="GROBID" when="2023-12-29T06:06+0000">
					<desc>GROBID - A machine learning software for extracting information from scholarly documents</desc>
					<ref target="https://github.com/kermitt2/grobid"/>
				</application>
			</appInfo>
		</encodingDesc>
		<profileDesc>
			<textClass>
				<keywords>
					<term>Mass Spectra</term>
					<term>organochlorine pesticide residues</term>
					<term>Fréchet Distance</term>
					<term>Decomposition</term>
					<term>Machine learning</term>
					<term>eXtreme Gradient Boosting</term>
					<term>Categorical Boosting</term>
					<term>Light Gradient Boosting Machine</term>
					<term>0000-0003-4499-8485 (S. Olszewski)</term>
					<term>0000-0001-8100-1846 (I. Lurie)</term>
					<term>0000-0002-1536-5542 (V. Lytvynenko)</term>
					<term>0000-0001-6239-0882 (V. Demchenko)</term>
					<term>0000-0002-5392-5125 (M. Voronenko)</term>
					<term>0000-0002-8331-8027 (N. Kornilovska)</term>
					<term>0000-0001-7391-0986 (O. Boskin)</term>
				</keywords>
			</textClass>
			<abstract>
<div xmlns="http://www.tei-c.org/ns/1.0"><p>Investigating organochlorine pesticide residues (OCPs) in the environment is vital for understanding their local and global impacts on ecosystems and human health. The primary aim of this study was to identify and assess robust and trustworthy methodologies for creating predictive models based on limited statistical samples from monitoring data. For this purpose, we used experimental data illustrating the spatial and temporal fluctuations of the concentrations of various pesticides across French provinces. For the regression tasks, we implemented eXtreme Gradient Boosting (XGBoost), Categorical Boosting (CatBoost), and Light Gradient Boosting Machine (LightGBM). To evaluate the predictive performance of XGBoost, CatBoost, and LightGBM, we used the root mean square error (RMSE), the coefficient of determination (CD), and the mean absolute error (MAE). XGBoost regression performed best, with a score of 83% to 93% on the examined data. This study proposes regular and rigorous monitoring strategies that include investigations of OCPs and phthalates for the Loskop Dam and similar water systems worldwide.</p></div>
			</abstract>
		</profileDesc>
	</teiHeader>
	<text xml:lang="en">
		<body>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="1.">Introduction</head><p>Polychlorinated biphenyls (PCBs) and organochlorine pesticides (OCPs) are synthetic organic pollutants that have long been recognized as significant contaminants. Both PCBs and OCPs are hydrophobic and lipophilic, which makes them persist in the environment over extended periods. OCPs were primarily used as agricultural pesticides until their usage was curtailed due to severe adverse effects. Ecological risks have also been reported, including biomagnification, a phenomenon in which these pollutants accumulate and magnify within the food chain in marine ecosystems. Because of their long-term persistence and their harmful effects on human health and the environment, OCPs are considered suitable only for restricted use. Nevertheless, in developing nations, these substances are extensively used in agriculture for pest control.</p><p>Even in countries with a highly developed agricultural sector, it is impossible to abandon the use of toxic agricultural products.</p><p>In Ukraine alone, about 268 pesticides are in use, and their tonnage reaches 36 thousand tons, while the need is 40 thousand tons. A number of these pesticides belong to hazard classes 1-2 by the criteria of toxicity, persistence in the environment, migration, bioconcentration, and actual contamination of objects.</p><p>The leading component of methods to control and limit the harmful effects of OCPs on the environment is comprehensive monitoring of the spatial distribution of their residual concentration and of the evolution of this distribution over time. However, the experimental results of such monitoring accumulated to date are fragmentary, non-systematic, and highly discrete. Moreover, the sample sizes are statistically small, and the measurement results contain a tangible stochastic component.</p><p>Data of this nature do not allow reliable estimates of the pollution level in the intervals between the control points at which the OCP concentration is measured. Moreover, it is difficult to judge the mechanisms of OCP migration from monitoring results, since these mechanisms are determined by the gradients and the rate of change of the concentration rather than by its absolute value. Obtaining these characteristics directly from experimental data with a small sample size and high statistical error significantly increases the dispersion of the results and reduces their reliability.</p><p>Thus, the need for adequate and reliable predictive models built on statistically small samples of monitoring data is evident and urgent.</p><p>To address this problem, this study compares the performance of three machine learning regression algorithms, namely CatBoost, LightGBM, and XGBoost, in modeling the spatial and temporal distribution of organochlorine pesticides. One of the main goals of this article was therefore to compare and evaluate different regression models based on model evaluation metrics. The primary contributions of this study are as follows:</p><p>• We have extensively reviewed papers on airborne pesticide migration problems and on the use of machine-learning methods to solve pesticide monitoring problems.</p><p>• We evaluated three regression-based machine-learning methods for estimating pesticide distribution.</p><p>The rest of this paper is organized as follows. The second section presents the related work. The third section details the data employed, the methodologies (CatBoost, LightGBM, and XGBoost), and the metrics used to gauge the quality of the derived models. The fourth section provides the results obtained with the methodologies described in the third section. Finally, the fifth section concludes the paper.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="2.">Review of Literature</head><p>The fastest and least controlled mechanism by which spatial and temporal distributions of OCPs form is their transport by air masses. To track the formation and development of the distribution of organochlorine pesticide residues (OCPs) through air migration, the authors of <ref type="bibr" target="#b0">[1]</ref> examined the results of passive air sampling in urban, suburban, coastal, and agricultural areas of Tamil Nadu, southern India, from April 2009 to January 2010. Compounds such as dichlorodiphenyltrichloroethane, dichlorodiphenyldichloroethylene, heptachlor, and mirex were detected primarily during the monsoon season. The presence of prohibited pesticides such as aldrin, dieldrin, and heptachlor in the air signals their unlawful use. Moreover, mirex, a pesticide not registered in India, was identified in the air for the first time. This information can provide significant insights for the future management of atmospheric OCPs, but without nonlinear regression models, such management appears highly challenging <ref type="bibr" target="#b0">[1]</ref>.</p><p>The investigation of persistent organic pollutants, including OCPs, in low-latitude tropical and subtropical urban areas is crucial for understanding their local and global influence on ecosystems and human health. Although OCP levels in water, soil, and sediments have been studied, the analysis of distribution trends, seasonality, and sources of OCPs in urban regions of Nepal is still limited.</p><p>The conclusions drawn from these rather labor-intensive experimental studies are purely qualitative. For example, the transport distances of OCPs suggest that the high precipitation levels of tropical climates are insufficient to remove OCPs and that Nepal may be an important source region for OCPs <ref type="bibr" target="#b1">[2]</ref>. At the same time, a nonlinear regression model built on these monitoring data would allow a quantitative assessment of the feasibility of additional measures for artificial flushing of the region and of possible remediation of pollution effects.</p><p>Concentrations of banned organochlorine pesticides and of a number of currently used pesticides in samples from the first four years, roughly corresponding to 2005, 2006, 2007, and 2008, show distinct spatial and temporal patterns. Although the wide variety of sampling site types helps characterize the full global variability of pesticide concentrations, it also greatly increases the number of sites required for reliable regional differentiation <ref type="bibr" target="#b2">[3]</ref>. However, in this case, too, the data are provided in raw form, without any attempt to investigate whether reliable approximations can be constructed from them.</p><p>In <ref type="bibr" target="#b3">[4]</ref>, to improve monitoring efficiency, the authors studied the possibility of using butter as a sampling matrix to reflect the regional and global distribution of PCBs and individual organochlorine pesticides and their metabolites in the air. This is possible because these persistent pollutants concentrate in milk fat. Dairy fat concentrations are governed by feed intake, which in turn is controlled mainly by atmospheric deposition. Therefore, butter is sensitive to local, regional, and global spatial and temporal atmospheric trends of many OCPs and can thus serve as a helpful sampling medium for monitoring purposes. However, to improve the quantitative information derived from air concentrations, it is necessary to understand how climatic factors influence the transfer of OCPs from air to milk <ref type="bibr" target="#b3">[4]</ref>, which is also problematic without nonlinear regression models.</p><p>Water resources are the most potent factor in OCP migration. In this regard, the authors of <ref type="bibr" target="#b4">[5]</ref> investigated the level of OCP contamination in the dense urbanized river network of Shanghai. Assessing the environmental and health risk of OCPs in river networks is complicated by the pressure of high population density. The main objective of the research was to establish the relationship between OCP residues and to determine their environmental and human health impacts. Without reliable predictive models of the spatial and temporal distribution of OCPs, the solution to this problem is incomplete. However, methods for constructing such models from an array of experimental data were not considered in that work.</p><p>The aim of the study in <ref type="bibr" target="#b5">[6]</ref> was to evaluate the degree of pesticide contamination on the coast of Karachi, Pakistan, with a focus on nine OCPs considered to be highly toxic. Spatial analysis revealed that the Creek Avenue and Channa Creek sites were the areas most severely affected by pesticide pollution. It is therefore of utmost importance to strictly supervise and tightly regulate the reckless and illegal use of OCPs to prevent seawater contamination and thereby protect the marine ecosystem. However, implementing such controls through automated systems requires formalized data models, an aspect that has not yet received much attention. In another study <ref type="bibr" target="#b6">[7]</ref>, the residual concentrations of 11 organochlorine pesticides (OCPs) were determined at nine sampling points in the surface waters of the Juxi Valley during spring and autumn in order to evaluate their pollution levels and potential risks. The results show that current sampling guidelines are not oriented toward constructing nonlinear regression models of the spatial and temporal distributions of OCPs. This is indicated by the rather small sample size, even though surface water sampling is considerably less labour-intensive than air sampling.</p><p>More promising for the development of nonlinear regression models is the study <ref type="bibr" target="#b7">[8]</ref>, which investigated the status and changes of organochlorine pesticide (OCP) pollution in Honghu Lake, situated in the Jianghan Plain of central China. To understand and evaluate the risks posed by OCPs to the ecosystem of Honghu Lake, 30 surface water samples, 15 surface sediment samples, and a sediment core were collected in January and July 2005. However, although the work aimed to obtain predictive estimates, the size of the surface water sample array is insufficient to build adequate mathematical models for different time slices of the spatial distributions of OCPs.</p><p>The study in <ref type="bibr" target="#b8">[9]</ref> presents data on the levels of organochlorine pesticides found in precipitation samples gathered between 1997 and 2003 at seven sites of the Integrated Atmospheric Deposition Network in the Great Lakes region. Notably, the 28-day volume-weighted average concentrations of several pesticides, such as hexachlorocyclohexane (HCH), endosulfan, hexachlorobenzene, and chlordane, displayed noteworthy seasonal variations. However, the authors did not propose mathematical models for these trends.</p><p>Organochlorine pesticides (OCPs) and phthalates are among the most significant anthropogenic environmental pollutants because of their prevalence, persistence, and potential to cause adverse effects in organisms. Studies such as <ref type="bibr" target="#b9">[10]</ref>, aimed at monitoring pollution levels of OCPs and phthalates in South Africa, especially in the Olifants catchment area, are few and limited to short-term monitoring. Based on their results, the authors of that paper propose regular and rigorous monitoring strategies that include investigations of OCPs and phthalates for the Loskop Dam and similar water systems around the world. However, the proposed approach mainly involves intensifying sampling procedures and extending monitoring time intervals, while regulating the data structure and sample size with a view to constructing adequate predictive models is not foreseen.</p><p>The study outlined in <ref type="bibr" target="#b10">[11]</ref> investigates the concentrations and distribution of organochlorine pesticides (OCPs) across various tissues of two freshwater fish species, silver carp (Hypophthalmichthys molitrix) and bighead carp (Aristichthys nobilis), collected from Poyang Lake, the largest freshwater lake in China. However, the authors concentrated primarily on the distribution of OCPs within the tissues of the biological subjects themselves. Furthermore, the spatial distributions of OCPs associated with fish migration between different habitats are represented by a very limited number of samples. Thus, an important mechanism of OCP transfer, namely transfer together with biota, was also not covered by the task of building nonlinear regression models.</p><p>The analysis of the approaches used in various areas of environmental monitoring of the spatial and temporal distributions of the residual concentration of OCPs leads to a clear conclusion: the monitoring tasks are focused on obtaining a series of static slices of the already existing state of the environment and are not adapted to constructing nonlinear regression models of the evolution of these distributions.</p><p>Problem statement. This paper proposes a comparative analysis of modern machine learning regression methods with respect to their effectiveness in constructing predictive models of the spatiotemporal evolution of OCPs in the environment.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.">Materials and Methods</head></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.1.">Data structure</head><p>The nonlinear regression models were constructed on experimental data describing the spatial and temporal distribution of the concentrations of various pesticides in the provinces of France, presented in <ref type="bibr" target="#b11">[12]</ref>. The substances considered were Chlorpyrifos, Folpet, Lindane, PBO, Pendimethalin, and Tebuconazole. The experiment consisted of constructing nonlinear regression models for six arrays of [8×8×12] elements, one per substance. Each array described points of a four-dimensional hypersurface reflecting the spatial and temporal distribution of the concentration of the respective substance and contained 768 concentration values (8 × 8 × 12). The input for each pesticide type was thus a set of three independent variables and one dependent variable. The training sample consisted of 537 objects, and the test sample consisted of 231 objects. An example of a 3D directional grid for such a hypersurface is shown in Fig. <ref type="figure" target="#fig_0">1</ref>. </p></div>
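<div xmlns="http://www.tei-c.org/ns/1.0"><p>The data layout described above can be sketched as follows. This is only an illustration and not the authors' pipeline: the concentrations are synthetic placeholders rather than the data from [12], the grid indices stand in for the three independent variables, and the random hold-out with a fixed seed is an assumption, since the text states only the sample sizes (537 and 231). The helper names grid_to_rows and split_rows are hypothetical.</p>

```python
import itertools
import random

# Grid shape taken from the text: 8 x 8 spatial nodes and 12 time steps.
GRID_SHAPE = (8, 8, 12)


def grid_to_rows(values):
    """Flatten a nested [8][8][12] list into (i, j, k, value) rows."""
    rows = []
    for i, j, k in itertools.product(*(range(n) for n in GRID_SHAPE)):
        rows.append((i, j, k, values[i][j][k]))
    return rows


def split_rows(rows, n_test=231, seed=0):
    """Shuffle the rows and hold out n_test of them (assumed random split)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    return shuffled[n_test:], shuffled[:n_test]


# Synthetic placeholder concentrations, NOT the measurements from [12].
rng = random.Random(42)
grid = [[[rng.random() for _ in range(12)] for _ in range(8)] for _ in range(8)]

rows = grid_to_rows(grid)
train, test = split_rows(rows)
print(len(rows), len(train), len(test))
```

<p>Each row holds three independent variables (two spatial indices and one time index) and one dependent variable (the concentration), which matches the input structure described for each pesticide.</p></div>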
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.">Ensemble algorithms</head><p>XGBoost, CatBoost, and LightGBM are well-known gradient-boosting algorithms commonly employed in machine learning tasks. As prominent ensemble gradient-boosting methods, they can be efficient in addressing regression problems. A general description of each of these algorithms for regression tasks follows.</p><p>XGBoost (eXtreme Gradient Boosting): XGBoost offers powerful capabilities for regression, providing high prediction accuracy. It uses gradient boosting with an ensemble of decision trees. XGBoost has flexible parameters that allow for model optimization and control over tree complexity. It also supports regularization to prevent overfitting. XGBoost has built-in features for handling missing values and categorical features.</p><p>CatBoost (Categorical Boosting): CatBoost is a gradient-boosting algorithm that effectively solves regression problems while taking the characteristics of categorical features into account. It automatically handles categorical variables without the need for prior encoding. CatBoost provides high prediction accuracy and offers capabilities for fine-tuning the model. It supports regularization and automatic parameter selection, making the model optimization process easier.</p><p>LightGBM (Light Gradient Boosting Machine): LightGBM is a fast and efficient gradient-boosting algorithm that demonstrates excellent performance in regression tasks. It uses optimized tree-building methods, including leaf-wise growth, which leads to faster training and lower memory usage. LightGBM has built-in support for categorical features and provides parameters for model optimization. It can handle large datasets and achieve high prediction accuracy in regression tasks.</p><p>In general, all three gradient boosting algorithms (XGBoost, CatBoost, and LightGBM) offer powerful capabilities for solving regression problems. They share the following characteristics:</p><p>High prediction accuracy: XGBoost, CatBoost, and LightGBM exhibit high prediction accuracy in regression tasks. They can capture complex dependencies between input features and the target variable, resulting in accurate predictions.</p><p>Gradient boosting: All three algorithms are based on the gradient boosting method, which builds an ensemble of weak models (decision trees) and combines them into a strong model. This improves the predictive power and generalization ability of the model.</p><p>Regularization: XGBoost, CatBoost, and LightGBM offer regularization methods that control the complexity of the trees and thereby mitigate overfitting.</p><p>Handling categorical features: CatBoost and LightGBM have built-in support for categorical features, making it easier to work with such data types. They automatically handle categorical variables without the need for manual encoding.</p><p>High performance: XGBoost, CatBoost, and LightGBM are designed for efficiency and can handle large datasets. They are optimized for fast model training and provide parallelization options, which accelerate the learning process.</p><p>The choice between XGBoost, CatBoost, and LightGBM for solving regression problems may depend on the data characteristics, performance requirements, and the presence of categorical features. It is advisable to conduct a comparative study with parameter tuning for each algorithm on the specific dataset to determine which one demonstrates the best performance and accuracy in the particular case.</p></div>
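<div xmlns="http://www.tei-c.org/ns/1.0"><p>The shared principle of the three algorithms, an ensemble of weak trees in which each new tree corrects the errors of the current ensemble, can be sketched in a few lines. The example below is a deliberately simplified illustration with assumptions of its own: it uses one-split trees (stumps) on a single feature, squared-error loss, a fixed learning rate, and synthetic data. It does not reproduce the internals of XGBoost, CatBoost, or LightGBM, and all function names are hypothetical.</p>

```python
def fit_stump(xs, residuals):
    """Find the single threshold split that minimizes the squared error."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # largest value excluded: right side never empty
        left = [r for x, r in zip(xs, residuals) if threshold >= x]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((r - left_mean) ** 2 for r in left)
               + sum((r - right_mean) ** 2 for r in right))
        if best is None or best[0] > sse:
            best = (sse, threshold, left_mean, right_mean)
    return best[1:]


def predict_stump(stump, x):
    threshold, left_mean, right_mean = stump
    return right_mean if x > threshold else left_mean


def boost(xs, ys, n_rounds=50, learning_rate=0.3):
    """Gradient boosting for squared error: each stump fits the residuals."""
    base = sum(ys) / len(ys)  # initial constant model
    preds = [base] * len(ys)
    stumps = []
    for _ in range(n_rounds):
        # For the loss 0.5 * (y - p)^2 the negative gradient is the residual y - p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * predict_stump(stump, x)
                 for p, x in zip(preds, xs)]
    return base, stumps, preds


def rmse(ys, preds):
    return (sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)) ** 0.5


# Synthetic nonlinear target, used only for illustration.
xs = [float(i) for i in range(20)]
ys = [(x - 10.0) ** 2 for x in xs]
base, stumps, preds = boost(xs, ys)
print("RMSE of constant model:", rmse(ys, [base] * len(ys)))
print("RMSE after boosting:   ", rmse(ys, preds))
```

<p>The learning rate plays the role of a simple regularizer here: smaller values make each stump contribute less, so more rounds are needed but the fit is less prone to following noise.</p></div>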
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.1.">Algorithm eXtreme Gradient Boosting</head><p>XGBoost (eXtreme Gradient Boosting) is an ensemble machine-learning algorithm based on decision trees and gradient boosting. Boosting based on decision trees is a well-known and very effective machine-learning technique. XGBoost is widely used in data processing to achieve highly accurate results for various machine-learning purposes, especially on small to medium-sized structured or tabular data <ref type="bibr" target="#b12">[13]</ref>.</p><p>Boosting is an ensemble method in which new models are added progressively to correct the errors of existing models. The models are added sequentially until no further improvement is achieved <ref type="bibr" target="#b13">[14]</ref>.</p><p>In many practical applications, gradient boosting aims to minimize an objective function. At each iteration, we fit a base learner to the negative gradient of the loss function, multiply its prediction by a constant, and add it to the value from the preceding iteration. In effect, fitting a base learner to the negative gradient at each iteration performs gradient descent on the loss function <ref type="bibr" target="#b12">[13]</ref>. These negative gradients are often called pseudo-residuals, as they indirectly help minimize the objective function. XGBoost works by sequentially training a set of weak models called base learners. Each base learner aims to reduce the error of the previous model using gradient descent. Learners are added to the ensemble with weights corresponding to their effectiveness.</p><formula xml:id="formula_0">F = \{f_1, f_2, f_3, \ldots, f_m\} \quad (1)
\hat{y}_i = \sum_{t=1}^{m} f_t(x_i)<label>(2)</label></formula><p>Here (1) is the set of base learners, and (2) is the final prediction, obtained as the sum of their outputs. Following that, it is necessary to choose a loss function to be minimized. (3)</p><formula xml:id="formula_1">L^{(t)} = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)} + f_t(x_i)\right) + \Omega(f_t)<label>(4)</label></formula><p>At each iteration, we obtain</p><formula xml:id="formula_2">f_t(x_i)</formula><p>by fitting the base learner to the negative gradient of the loss function with respect to the prediction of the previous iteration <ref type="bibr" target="#b12">[13]</ref>. In plain gradient boosting, several candidate base learners are examined, and the one that minimizes the loss is chosen. This approach has two disadvantages:</p><p>1. Many different base learners must be trained.</p><p>2. The value of the loss function must be calculated for every one of these base learners.</p><p>XGBoost avoids calculating the exact loss for each possible base learner: it uses Taylor's theorem to approximate the value of the loss function for the base learner</p><formula xml:id="formula_3">f_t(x_i)</formula><p>Taylor's theorem:</p><formula xml:id="formula_4">f(a + h) = f(a) + f'(a)\,h + \frac{1}{2!} f''(a)\,h^2 + \cdots + \frac{1}{n!} f^{(n)}(a)\,h^n \quad (5)
a = \hat{y}_i^{(t-1)} \quad (6)
h = f_t(x_i) \quad (7)
f(a) = l\left(y_i, \hat{y}_i^{(t-1)}\right) \quad (8)</formula><formula xml:id="formula_5">L^{(t)} \approx \sum_{i=1}^{n} \left[ l\left(y_i, \hat{y}_i^{(t-1)}\right) + \frac{\partial l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \hat{y}_i^{(t-1)}} f_t(x_i) + \frac{1}{2} \frac{\partial^2 l\left(y_i, \hat{y}_i^{(t-1)}\right)}{\partial \left(\hat{y}_i^{(t-1)}\right)^2} f_t(x_i)^2 \right] + \Omega(f_t) \quad (9)
L^{(t)} \approx C + \sum_{i=1}^{n} \left( g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right) + \Omega(f_t) \quad (10)
L^{(t)} = \sum_{i=1}^{n} \left( g_i f_t(x_i) + \frac{1}{2} h_i f_t(x_i)^2 \right) + \Omega(f_t)<label>(11)</label></formula><p>XGBoost uses the second-order Taylor expansion, assuming that an approximation of this order is sufficient.</p><p>$C = \sum_{i=1}^{n} l\left(y_i, \hat{y}_i^{(t-1)}\right)$ is constant regardless of the chosen $f_t$; since it does not affect the minimization, it is dropped in (11).</p><p>$g_i$ is the first-order derivative of the loss of the previous iteration with respect to the predictions of the previous iteration.</p><p>$h_i$ is the second-order derivative of the previous iteration's loss with respect to the previous iteration's predictions <ref type="bibr" target="#b12">[13]</ref> (not to be confused with the increment $h$ in (7)).</p><p>Thus the algorithm can calculate $g_i$ and $h_i$ before it starts fitting the different base learners, after which evaluating a candidate is just a matter of multiplication.</p><p>With the standard XGBoost regularization term, which is the form implied by (13),</p><formula xml:id="formula_6">\Omega(f_t) = \gamma K + \frac{1}{2} \lambda \sum_{j=1}^{K} \omega_j^2</formula><p>the loss can be grouped by the nodes of the tree:</p><formula xml:id="formula_7">L^{(t)} = \sum_{j=1}^{K} \left[ \left( \sum_{i \in I_j} g_i \right) \omega_j + \frac{1}{2} \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^2 \right] + \gamma K<label>(13)</label></formula><p>Let $f_t$ have $K$ leaf nodes in the decision tree. Then $I_j$ is the set of instances belonging to node $j$, and $\omega_j$ is the prediction for node $j$. For each node $j$, the optimal weight $\omega_j^*$ is found by setting the derivative of the loss to zero:</p><formula xml:id="formula_8">\frac{dL^{(t)}}{d\omega_j} = 0 \quad (14)
0 = \sum_{i \in I_j} g_i + \left( \sum_{i \in I_j} h_i + \lambda \right) \omega_j^* \quad (15)</formula><formula xml:id="formula_9">\omega_j^* = \frac{-\sum_{i \in I_j} g_i}{\sum_{i \in I_j} h_i + \lambda}<label>(16)</label></formula><p>Now let us substitute the optimal weights $\omega_j^*$ for</p><formula xml:id="formula_10">f_t(x_i)</formula><p>and consider the resulting loss:</p><formula xml:id="formula_11">L^{(t)} = -\frac{1}{2} \sum_{j=1}^{K} \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} + \gamma K \quad (17)</formula><p>Using Taylor's theorem, it becomes feasible to compute the loss function for a node in a tree. However, when dealing with numerous nodes, exhaustively exploring all potential tree structures becomes impractical. Instead, XGBoost constructs a tree greedily, choosing at each step the split that yields the maximum loss reduction. By applying a partition criterion, the instances of a node $I_j$ are divided into the left ($I_L$) and right ($I_R$) branches (18).</p><p>Consequently, instances are allocated to the respective nodes based on the splitting outcome <ref type="bibr" target="#b12">[13]</ref>. At this stage, the loss reduction can be calculated, and the partition that offers the most significant loss reduction can be chosen. </p><formula xml:id="formula_12">L_{split} = \frac{1}{2} \left[ \frac{\left( \sum_{i \in I_L} g_i \right)^2}{\sum_{i \in I_L} h_i + \lambda} + \frac{\left( \sum_{i \in I_R} g_i \right)^2}{\sum_{i \in I_R} h_i + \lambda} - \frac{\left( \sum_{i \in I_j} g_i \right)^2}{\sum_{i \in I_j} h_i + \lambda} \right] - \gamma \quad (18)</formula><p>The pseudocode depicted in Figure <ref type="figure" target="#fig_3">2</ref> illustrates the XGBoost algorithm for regression problems. In a practical implementation, additional optimizations and handling of special cases can be added. </p></div>
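<div xmlns="http://www.tei-c.org/ns/1.0"><p>The closed-form expressions for the optimal node weight (16) and for the loss reduction of a split (18) can be checked numerically with a small sketch. The example below assumes the squared-error loss l = 0.5 (y - ŷ)², for which the first derivative is g = ŷ - y and the second derivative is h = 1; the data values, the regularization constants, and the function names are illustrative assumptions and are not taken from the paper or from the XGBoost library.</p>

```python
def leaf_weight(g, h, lam):
    """Optimal node weight, equation (16): -sum(g) / (sum(h) + lambda)."""
    return -sum(g) / (sum(h) + lam)


def leaf_score(g, h, lam):
    """The ratio (sum g)^2 / (sum h + lambda) used in (17) and (18)."""
    return sum(g) ** 2 / (sum(h) + lam)


def split_gain(g_left, h_left, g_right, h_right, lam, gamma):
    """Loss reduction obtained by splitting a node, equation (18)."""
    parent = leaf_score(g_left + g_right, h_left + h_right, lam)
    children = leaf_score(g_left, h_left, lam) + leaf_score(g_right, h_right, lam)
    return 0.5 * (children - parent) - gamma


# Toy data (illustrative): targets and the predictions of the previous iteration.
y = [1.0, 1.2, 3.0, 3.4]
y_hat = [2.0, 2.0, 2.0, 2.0]

# Squared-error loss 0.5 * (y - y_hat)^2: g = y_hat - y, h = 1.
g = [p - t for p, t in zip(y_hat, y)]
h = [1.0] * len(y)

# Candidate split: first two instances go left, last two go right.
w_left = leaf_weight(g[:2], h[:2], lam=1.0)
w_right = leaf_weight(g[2:], h[2:], lam=1.0)
gain = split_gain(g[:2], h[:2], g[2:], h[2:], lam=1.0, gamma=0.1)
print(w_left, w_right, gain)
```

<p>A positive gain means that the split reduces the regularized loss by more than the penalty γ for adding a node, so the split is worth making. The left weight is negative and the right weight is positive because the previous prediction overshoots the first two targets and undershoots the last two.</p></div>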
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.2.">CatBoost algorithm</head><p>The main idea of the CatBoost algorithm for processing non-categorical features in regression problems is gradient boosting and apply regularization to obtain more accurate and generalizing models. Although the CatBoost algorithm is designed specifically for with categorical features, it is also effective when using non-categorical features. Here are a few key features <ref type="bibr" target="#b14">[15]</ref>:</p><p>Missing Value Handling: CatBoost has built-in missing value handling, which allows data to be modelled with missing values and makes it suitable for dealing with non-categorical features where missing values can be common.</p><p>Regularization: CatBoost applies various regularization techniques, such as L2 regularization and random weight dropout, to prevent model overfitting and increase its generalization ability. This is especially important when working with non-categorical features, where there may be more noise or redundancy.</p><p>Optimization of deprivation function: CatBoost uses optimized algorithms to find the optimal parameter settings for models based on gradient descent. This allows you to adapt the model to noncategorical manifestations and improve the accuracy of prediction.</p><p>Automatic selection of optimal hyperparameters: CatBoost offers an automatic selection of optimal hyperparameter values using the GridSearch algorithm and other fitting methods. 
This helps to find the best model settings when using non-categorical features.</p><p>In general, the main idea of the CatBoost algorithm in processing non-categorical features in a regression problem is to apply regularization, loss function optimization, and automatic selection of hyperparameters to create accurate models that generalize well.</p><p>Advantages of CatBoost:</p><p>• Reliability - simplified setup of hyperparameters (number of trees, learning rate, regularization, tree depth, etc.), which allows you to create more generalized models.</p><p>• Automatic Feature Processing - CatBoost converts categorical features to numbers using various statistics and feature combinations, allowing you to use CatBoost without any explicit preprocessing.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Ease of use - the algorithm has a very convenient API for Python and R.</p></div>
<div xmlns="http://www.tei-c.org/ns/1.0"><head>•</head><p>Performance - with CatBoost, you can get fast and high-quality results that are not inferior to those of common machine learning algorithms.</p><p>Gradient boosting iteratively builds a sequence of approximations $F^t$, taking into account a loss function</p><formula xml:id="formula_13">L(y_i, F^t)</formula><p>that has two input values: the $i$-th expected output value $y_i$ and the $t$-th function $F^t$ that estimates $y_i$. The estimate of $y_i$ can be improved by finding another function</p><formula xml:id="formula_14">F^t = F^{t-1} + \alpha \cdot h^t</formula><p>, where $\alpha$ is the step size and the function $h^t$ is the base predictor selected from the family of functions $H$ to minimize the expected loss:</p><formula xml:id="formula_15">h^t = \arg\min_{h \in H} E\, L(y, F^{t-1} + h) <label>(19)</label></formula><formula xml:id="formula_17">h^t = \arg\min_{h \in H} E \left( \frac{\partial L(y)}{\partial F^{t-1}} - h \right)^2 \approx \arg\min_{h \in H} \frac{1}{n} \left( \frac{\partial L(y)}{\partial F^{t-1}} - h \right)^2 <label>(20)</label></formula><p>CatBoost approximates functions by means of a Taylor series, with some refinements of the gradient boosting technique. Let there be a data set $D$ of $n$ instances, each of which has $m$ features in vector $x$ and a target value $y$:</p><formula xml:id="formula_18">D = \{(x_k, y_k)\}, \quad k = 1, \ldots, n, \quad |D| = n, \quad x_k \in \mathbb{R}^m, \quad y_k \in \mathbb{R} <label>(21)</label></formula><p>One of the most common feature processing methods in CatBoost is one-hot encoding, but it is effective only for a small number of categories. To solve this problem, features are grouped into categories according to target statistics. 
Mathematically, the target statistic of the $i$-th categorical variable of the $k$-th element of $D$ can be defined as follows:</p><formula xml:id="formula_19">\hat{x}_k^i = \frac{\sum_{x_j \in D_k} 1_{\{x_j^i = x_k^i\}} \cdot y_j + a p}{\sum_{x_j \in D_k} 1_{\{x_j^i = x_k^i\}} + a}, \quad \text{if } D_k = \{x_j : \sigma(j) &lt; \sigma(k)\}; \ (a &gt; 0) <label>(22)</label></formula><p>The indicator function</p><formula xml:id="formula_20">1_{\{x_j^i = x_k^i\}}</formula><p>is equal to 1 when the $i$-th component of the input vector $x_j$ is equal to the $i$-th component of the input vector $x_k$. Here $x_k$ is the $k$-th element according to the order imposed on $D$ by the random permutation $\sigma$, and $j$ ranges over the elements that precede $x_k$ in that order. The parameters $a$ and $p$ are required to prevent overflow in the equation <ref type="bibr" target="#b14">[15]</ref>.</p><p>The condition</p><formula xml:id="formula_21">D_k = \{x_j : \sigma(j) &lt; \sigma(k)\}</formula><p>excludes the value $y_k$ from the computation of the encoded value $\hat{x}_k^i$. The method therefore uses only the past data of a particular example to calculate its target statistic <ref type="bibr" target="#b15">[16]</ref>. When target statistics are used, the distribution of the gradient $\partial L(y) / \partial F^{t-1}$ of the loss function $L$ may be biased, conditional on the encoded value $\hat{x}_k^i$. This conditional bias changes the estimate of $h^t$, which negatively affects the results obtained when $F^{t-1}$ is evaluated on data that was not used in training <ref type="bibr" target="#b15">[16]</ref>. This effect on the ability of $F^{t-1}$ to generalize is known as prediction bias. To address this, CatBoost introduces $n$ auxiliary models and utilizes a random permutation of training instances. However, implementing this approach can be challenging due to data limitations and memory costs. To avoid such problems, CatBoost employs a method where a single decision tree structure is used for all models. 
The algorithm utilizes the same $D_k$, which defines the ordered target statistic, to evaluate whether $h^t$ is the optimal decision tree for minimizing the expected loss on the complete data set $D$. Residual values are calculated using the permutations $\sigma_1, \ldots, \sigma_n$ and are then used to obtain $F^{t-1}$ and $h^t$. This approach helps reduce the variance in gradient estimates and prevent prediction bias. The CatBoostRegressor algorithm from the CatBoost library solves regression problems. It automatically handles categorical features, works with various data types, and offers a number of optimizations for efficient model training. For some problems, using CatBoost for regression may require additional parameter tuning to achieve better model performance and accuracy.</p></div>
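<div xmlns="http://www.tei-c.org/ns/1.0"><p>The ordered target statistic of Eq. (22) can be sketched in a few lines of Python. This is a simplified illustration for a single categorical feature and a single permutation, not the CatBoost library code; the function name ordered_target_statistic, the prior value p = 0.5, the seed, and the toy data are assumptions made for the example.</p>

```python
import random


def ordered_target_statistic(categories, targets, a=1.0, p=0.5, seed=0):
    """Encode one categorical feature with an ordered target statistic.

    For each example k, only the examples that precede k in a random
    permutation contribute, so the target of k itself is never used.
    """
    n = len(categories)
    order = list(range(n))
    random.Random(seed).shuffle(order)  # random permutation sigma (assumed seed)

    sums = {}    # running sum of targets per category value
    counts = {}  # running number of examples per category value
    encoded = [0.0] * n
    for k in order:  # visit the examples in permutation order
        c = categories[k]
        # Eq. (22): (sum of preceding targets + a * p) / (preceding count + a)
        encoded[k] = (sums.get(c, 0.0) + a * p) / (counts.get(c, 0) + a)
        # Add example k only afterwards, so that later examples can use it.
        sums[c] = sums.get(c, 0.0) + targets[k]
        counts[c] = counts.get(c, 0) + 1
    return encoded, order


cats = ["north", "south", "north", "north", "south"]
y = [1.0, 0.0, 1.0, 0.0, 1.0]
enc, order = ordered_target_statistic(cats, y, a=1.0, p=0.5)
print(order)
print(enc)
```

<p>The first example in the permutation has no predecessors, so its encoding equals the prior $p$; this is the case in which the parameters $a$ and $p$ keep the expression defined.</p></div>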
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.2.3.">LightGBM algorithm</head><p>Light Gradient Boosted Machine (LightGBM) is an open source implementation of gradient boosting. LightGBM combines two main ideas: GOSS and EFB. GOSS stands for Gradient-based One-Side Sampling. To preserve the accuracy of the information gain estimate, it keeps the instances with large gradients and randomly drops instances with small gradients. This approach gives a more accurate estimate than uniform sampling <ref type="bibr" target="#b16">[17]</ref>. EFB stands for Exclusive Feature Bundling. Since in practice the feature space is often quite sparse, EFB aims to reduce the number of effective features. In such a feature space, most features almost never take on non-zero values at the same time.</p><p>The main idea of the LightGBM algorithm is to provide efficient and fast gradient boosting for solving regression problems. It is an optimized version of gradient boosting that has a number of key features:</p><p>1. Building trees vertically: Unlike other gradient boosting implementations, which grow trees horizontally (level by level), LightGBM grows trees vertically (leaf by leaf). This speeds up the learning process and helps achieve high performance.</p><p>2. Leaf-wise tree growth: LightGBM uses a leaf-wise tree growth algorithm in which the leaf with the largest gain is selected for each split. This allows the model to capture more complex dependencies and improves the quality of predictions.</p><p>3. Histogram-Based Optimization: LightGBM uses histograms to compute gradients efficiently and to compress continuous feature values into discrete bins. This reduces the amount of memory required to store data and speeds up calculations.</p><p>4. Accounting for categorical features: LightGBM automatically processes categorical features without converting them first. It applies its own algorithms for encoding categorical values and allows you to use them directly in the model.</p><p>5. 
Parallel Processing Support: LightGBM supports parallel data processing and model training. This allows you to use all available processor cores and speeds up the learning process.</p><p>The LightGBM algorithm is based on the idea of using efficient optimization and parallel processing to achieve high performance and accuracy when solving regression problems. It is widely used in various fields where fast and accurate prediction of numerical values is required.</p><p>Benefits of LightGBM:</p><p>1) Can work with large amounts of data with significantly reduced training time.</p><p>2) The possibility of parallel learning.</p><p>3) Uses much less memory. 4) High training speed and efficiency due to histogram algorithms. 5) High accuracy of boosting results, since LightGBM builds more complex trees, following the leaf-wise splitting approach rather than the level-wise one.</p><p>With level-wise growth, all leaves of a layer are split at the same time, which simplifies multithreaded optimization and helps control the complexity of the model. However, leaves have different information gain, which shows the expected decrease in entropy and can be defined as follows <ref type="bibr" target="#b17">[18]</ref>:</p><formula xml:id="formula_22">IG(B, V) = En(B) - \sum_{v \in V} \frac{|B_v|}{|B|} En(B_v) <label>(23)</label></formula><p>where $B_v$ is the subset of $B$ for which the attribute $V$ has the value $v$. The leaf-wise growth method is more efficient, as it only splits the leaf that has the most information gain on the same layer. The GOSS method ranks the training instances by the absolute values of their gradients in descending order. Next, it keeps the top $a \times 100\%$ of instances with the larger gradients, which gives a subset of instances $A$; then, from the remaining set $A^c$, consisting of the $(1-a) \times 100\%$ of instances with smaller gradients, it randomly samples a subset $B$ of size $b \times |A^c|$. Finally, the instances are split according to the estimated variance gain $\tilde{V}_j(d)$ over the subset $A \cup B$:</p><formula xml:id="formula_23">\tilde{V}_j(d) = \frac{1}{n} \left( \frac{\Big( \sum_{x_i \in A_l} g_i + \frac{1-a}{b} \sum_{x_i \in B_l} g_i \Big)^2}{n_l^j(d)} + \frac{\Big( \sum_{x_i \in A_r} g_i + \frac{1-a}{b} \sum_{x_i \in B_r} g_i \Big)^2}{n_r^j(d)} \right) <label>(25)</label></formula><p>where</p><formula xml:id="formula_24">A_l = \{x_i \in A : x_{ij} \le d\}, \quad A_r = \{x_i \in A : x_{ij} &gt; d\}, \quad B_l = \{x_i \in B : x_{ij} \le d\}, \quad B_r = \{x_i \in B : x_{ij} &gt; d\}.</formula><p>The estimate $\tilde{V}_j(d)$ over a smaller subset of instances is used instead of the exact $V_j(d)$ over all instances to determine the split point. Therefore, the computational cost can be significantly reduced. The GOSS approximation error looks like this:   The pseudocode in Figure 4 provides a high-level overview of the LightGBM algorithm for regression. The actual implementation may involve additional optimizations and techniques for efficiency and performance.</p></div>
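<div xmlns="http://www.tei-c.org/ns/1.0"><p>The GOSS sampling step described above can be sketched as follows. This is an illustrative Python sketch under stated assumptions, not the LightGBM implementation: the function name goss_sample, the default values of a and b, the seed, and the toy gradients are hypothetical. The subset $B$ is drawn with size $b \times |A^c|$, and its gradients are scaled by $(1-a)/b$, as in the text.</p>

```python
import random


def goss_sample(gradients, a=0.2, b=0.5, seed=0):
    """Return the index subsets A and B and a per-instance gradient weight."""
    n = len(gradients)
    # Rank instance indices by absolute gradient, largest first.
    ranked = sorted(range(n), key=lambda i: abs(gradients[i]), reverse=True)
    top_count = int(a * n)
    subset_a = ranked[:top_count]   # A: instances with large gradients, all kept
    rest = ranked[top_count:]       # A^c: instances with small gradients
    subset_b = random.Random(seed).sample(rest, int(b * len(rest)))  # B
    # Gradients over B are scaled by (1 - a) / b in the variance gain estimate.
    scale = (1.0 - a) / b
    weights = {i: 1.0 for i in subset_a}
    weights.update({i: scale for i in subset_b})
    return subset_a, subset_b, weights


grads = [0.1, -2.0, 0.3, 1.5, -0.2, 0.05, -0.4, 0.6, -0.7, 0.25]
A, B, w = goss_sample(grads, a=0.2, b=0.5)
print("A:", A)
print("B:", sorted(B))
print("weight of a sampled instance:", w[B[0]])
```

<p>Only the instances in $A \cup B$ are then scanned when candidate split points are evaluated, which is where the reduction in computational cost comes from.</p></div>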
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="3.3.">Measuring error</head><p>The root mean square error (RMSE), coefficient of determination ($R^2$) and mean absolute error (MAE) are used to compare the prediction performance of XGBoost (eXtreme Gradient Boosting), CatBoost (Categorical Boosting), and LightGBM (Light Gradient Boosting Machine). These error measures are expressed as follows, where $y_j$ and $\hat{y}_j$ are the actual response and the predicted response of observation $j$, $n$ is the number of observations, and $\bar{y}$ is the average of all actual responses [23].</p><p>RMSE measures the root-mean-square difference between estimated values and actual values. It is the square root of the mean squared error, the risk function corresponding to the expected value of the squared error loss: $RMSE = \sqrt{\frac{1}{n} \sum_{j=1}^{n} (y_j - \hat{y}_j)^2}$.</p><p>The determination coefficient characterizes the fraction of the variance of the resultant variable $Y$ that is explained by the regression in the overall variance of $Y$: $R^2 = 1 - \frac{\sum_{j=1}^{n} (y_j - \hat{y}_j)^2}{\sum_{j=1}^{n} (y_j - \bar{y})^2}$, where $\bar{y}$ is the average value and $\hat{y}_j$ is the output of the model. Accordingly, the magnitude $1 - R^2$ characterizes the fraction of the variance of $Y$ caused by the influence of other factors not taken into account in the model. MAE is a measure of error between paired observations expressing the same phenomenon and is calculated as $MAE = \frac{1}{n} \sum_{j=1}^{n} |y_j - \hat{y}_j|$.</p></div>
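<div xmlns="http://www.tei-c.org/ns/1.0"><p>For reference, the three error measures can be computed with the short Python sketch below. It uses the standard definitions of RMSE, $R^2$ and MAE; the function names and the toy values are assumptions made for this example, and it is not the R code used in the study.</p>

```python
import math


def rmse(y, y_hat):
    """Root mean square error between actual and predicted responses."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(y, y_hat)) / len(y))


def mae(y, y_hat):
    """Mean absolute error between actual and predicted responses."""
    return sum(abs(a - b) for a, b in zip(y, y_hat)) / len(y)


def r2(y, y_hat):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y) / len(y)
    ss_res = sum((a - b) ** 2 for a, b in zip(y, y_hat))
    ss_tot = sum((a - mean_y) ** 2 for a in y)
    return 1.0 - ss_res / ss_tot


# Toy actual and predicted values (assumed, not data from the study).
actual = [3.0, 5.0, 7.0, 9.0]
predicted = [2.0, 5.0, 8.0, 9.0]
print(rmse(actual, predicted), r2(actual, predicted), mae(actual, predicted))
```

<p>Lower RMSE and MAE and higher $R^2$ indicate a better fit, which is the ordering used when the models are compared.</p></div>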
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="4.">Results and Discussion</head></div><figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_0"><head>Figure 1 :</head><label>1</label><figDesc>Figure 1: 3D directional cross section for the hypersurface of the spatiotemporal distribution of chlorpyrifos concentration in French provinces</figDesc><graphic coords="5,162.00,71.90,270.85,193.30" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_3"><head>2</head><label>2</label><figDesc></figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_4"><head>Figure 2 :</head><label>2</label><figDesc>Figure 2: Pseudocode of the XGBoost algorithm</figDesc><graphic coords="8,142.61,162.91,309.76,158.41" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_5"><head>tF</head><label></label><figDesc>− re also random variables, since we use a random permutation ( ) k σ to select elements of k D to encode of categorical variables that affect the value of 1 t F − In the case of obtaining 1 t</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_6"><head>Figure 3 :</head><label>3</label><figDesc>Figure 3: Pseudocode of the CatBoost algorithm for solving regression problems</figDesc><graphic coords="10,136.01,71.96,323.05,166.05" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_7"><head></head><label></label><figDesc>$En(B)$ is the information entropy of collection $B$, $p_d$ is the proportion of $B$ that belongs to category $d$, $D$ is the number of categories, $v$ is the value of the attribute $V$, and</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_8"><head>coefficient $\frac{1-a}{b}$</head><label>b</label><figDesc>normalizes the sum of the gradients over $B$ to the size of $A^c$. Thus, an estimate of $\tilde{V}_j(d)$</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_10"><head></head><label></label><figDesc>a pseudocode representation of the LightGBM algorithm for regression (Fig.4):</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_11"><head>Figure 4 :</head><label>4</label><figDesc>Figure 4: Pseudocode of the LightGBM algorithm for solving regression problems</figDesc><graphic coords="12,131.81,72.06,331.45,409.20" type="bitmap" /></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" xml:id="fig_12"><head></head><label></label><figDesc>The accuracy of the models obtained was assessed using the formula of the determination coefficient ($R^2$ statistic):</figDesc></figure>
<figure xmlns="http://www.tei-c.org/ns/1.0" type="table" xml:id="tab_0"><head>Table 1</head><label>1</label><figDesc>Metrics for LightGBM</figDesc><table><row><cell>Pesticides</cell><cell>RMSE</cell><cell>$R^2$</cell><cell>MAE</cell></row></table></figure>
			<note xmlns="http://www.tei-c.org/ns/1.0" place="foot" n="2" xml:id="foot_0">, ,</note>
		</body>
		<back>
			<div type="annex">
<div xmlns="http://www.tei-c.org/ns/1.0"><head n="5.">Conclusions and Future Work</head><p>We compared the obtained models based on the RMSE values from Tables 1-3, considering that the lower the RMSE, the higher the accuracy. From these tables, we can conclude that XGBoost has the lowest RMSE value for all pesticide species; hence, it works well. On the other hand, support vector regression had the highest value for all pesticide species, so the model did not work well for the data sets studied.</p><p>For $R^2$, the higher the value, the higher the accuracy. Tables <ref type="table">1-3</ref> show the values for each model. From these tables, we can conclude that XGBoost has a slightly higher $R^2$ value than the CatBoost algorithm, so they both perform well. Support vector regression had the lowest value, so the model did not work well for the examined datasets. For MAE, the lower the value, the higher the accuracy. Tables <ref type="table">1-3</ref> show the MAE values for each model for the six pesticide species. From these tables, we can conclude that the XGBoost regression has the lowest MAE value; hence it works well for this task. On the other hand, CatBoost had the highest value, so the model did not work well for the data set.</p><p>As the results of the experiments have shown, XGBoost is an efficient algorithm with good performance. It offers a wide range of features, supports regularization, and has flexible parameters for model optimization. The LightGBM algorithm demonstrates high efficiency, fast training time, and low memory usage. CatBoost performs well and provides built-in support for categorical features. However, in our research, based on the accuracy results presented in Tables 1-3, we would prefer the XGBoost algorithm for solving our regression task. The comparison of accuracy between LightGBM and XGBoost in solving regression problems may depend on the specific dataset and model parameters. 
Both algorithms generally exhibit high accuracy in regression tasks, but the results can vary depending on the data characteristics.</p><p>The main objective of this paper was to find and evaluate adequate and reliable methods for constructing predictive models built on statistically small samples of monitoring data. Therefore, our task was to compare and evaluate different regression models based on model evaluation indicators. Experimental data describing the spatial and temporal distributions of concentrations of different pesticides across the French provinces were used as a dataset. The pesticides used were Chlorpyrifos, Folpet, Lindane, PBO, Pendimethalin, and Tebuconazole.</p><p>In the study, we explored different regression algorithms, namely LightGBM, CatBoost, and XGBoost regression, and applied these algorithms to the dataset. The study was conducted in the R environment.</p><p>On our dataset, the XGBoost regression showed the best results, with an $R^2$ score of 83% to 93% on the examined data. In contrast, support vector regression showed the lowest results.</p></div>			</div>
			<div type="references">

				<listBibl>

<biblStruct xml:id="b0">
	<analytic>
		<title level="a" type="main">Distribution of organochlorine pesticides in atmospheric air of Tamilnadu, southern India</title>
		<author>
			<persName><forename type="first">S</forename><surname>Srimurali</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Govindaraj</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Kumar</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Babu Rajendran</surname></persName>
		</author>
		<idno type="DOI">10.1007/sl3762-014-055p-3</idno>
	</analytic>
	<monogr>
		<title level="j">Int. J. Environ. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="page" from="1957" to="1964" />
			<date type="published" when="2015">2015</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b1">
	<analytic>
		<title level="a" type="main">Atmospheric organochlorine pesticides and polychlorinated biphenyls in urban areas of Nepal: spatial variation, sources, temporal trends, and long-range transport potential</title>
		<author>
			<persName><forename type="first">B</forename><surname>Pokhrel</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Gong</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Xi</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Nath</forename><surname>Khanal</surname></persName>
		</author>
		<author>
			<persName><forename type="first">J</forename><surname>Ren</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ch</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sh</forename><surname>Gao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Yao</surname></persName>
		</author>
		<idno type="DOI">10.5194/acp-18-1325-2018</idno>
	</analytic>
	<monogr>
		<title level="j">Atmos. Chem. Phys</title>
		<imprint>
			<biblScope unit="volume">18</biblScope>
			<biblScope unit="page" from="1325" to="1336" />
			<date type="published" when="2018">2018</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b2">
	<analytic>
		<title level="a" type="main">Spatial and temporal pattern of pesticides in the global atmosphere</title>
		<author>
			<persName><forename type="first">C</forename><surname>Shunthirasingham</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><forename type="middle">E</forename><surname>Oyiliagu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Cao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Gouin</surname></persName>
		</author>
		<author>
			<persName><forename type="first">F</forename><surname>Wania</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S.-C</forename><surname>Lee</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><forename type="middle">C</forename><surname>Muir</surname></persName>
		</author>
		<idno type="DOI">10.1039/c0em00134a</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Environmental Monitoring</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">9</biblScope>
			<biblScope unit="page">1650</biblScope>
			<date type="published" when="2010">2010</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b3">
	<analytic>
		<title level="a" type="main">The Global Distribution of PCBs and Organochlorine Pesticides in Butter</title>
		<author>
			<persName><forename type="first">O</forename><forename type="middle">I</forename><surname>Kalantzi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">E</forename><surname>Alcock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><forename type="middle">A</forename><surname>Johnston</surname></persName>
		</author>
		<author>
			<persName><forename type="first">D</forename><surname>Santillo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><forename type="middle">L</forename><surname>Stringer</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><forename type="middle">O</forename><surname>Thomas</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><forename type="middle">C</forename><surname>Jones</surname></persName>
		</author>
		<idno type="DOI">10.1021/es0002464</idno>
	</analytic>
	<monogr>
		<title level="j">Environmental Science &amp; Technology</title>
		<imprint>
			<biblScope unit="volume">35</biblScope>
			<biblScope unit="issue">6</biblScope>
			<biblScope unit="page" from="1013" to="1018" />
			<date type="published" when="2001">2001</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b4">
	<analytic>
		<title level="a" type="main">Ecological and health risk assessment of organochlorine pesticides in an urbanized river network of Shanghai, China</title>
		<author>
			<persName><forename type="first">C</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Zou</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Ma</surname></persName>
		</author>
		<idno type="DOI">10.1186/s12302-020-00322-9</idno>
	</analytic>
	<monogr>
		<title level="j">Environmental Sciences Europe</title>
		<imprint>
			<biblScope unit="volume">32</biblScope>
			<biblScope unit="issue">1</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b5">
	<analytic>
		<title level="a" type="main">Occurrence and distribution of organochlorine pesticides in Karachi coastal water</title>
		<author>
			<persName><forename type="first">R</forename><surname>Majeed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">U</forename><surname>Fatima</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">A</forename><surname>Khan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><forename type="middle">Sh</forename><surname>Shaukat</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">International Journal Of Biology And Biotechnology</title>
		<imprint>
			<biblScope unit="volume">17</biblScope>
			<biblScope unit="issue">3</biblScope>
			<biblScope unit="page" from="503" to="512" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b6">
	<analytic>
		<title level="a" type="main">Organochlorine Pesticides in Surface Water of Jiuxi Valley, China: Distribution, Source Analysis, and Risk Evaluation</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Zheng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Z</forename><surname>Liu</surname></persName>
		</author>
		<idno type="DOI">10.1155/2020/5101936</idno>
	</analytic>
	<monogr>
		<title level="j">Journal of Chemistry</title>
		<imprint>
			<biblScope unit="page" from="1" to="8" />
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b7">
	<analytic>
		<title level="a" type="main">Spatial and temporal variations of organochlorine pesticides (OCPs) in water and sediments from Honghu Lake, China</title>
		<author>
			<persName><forename type="first">L</forename><surname>Yuan</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Sh</forename><surname>Qi</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Ch</forename><surname>Wu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Xing</surname></persName>
		</author>
		<author>
			<persName><forename type="first">X</forename><surname>Gong</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.gexplo.2013.07.002</idno>
		<ptr target="http://dx.doi.org/10.1016/j.gexplo.2013.07.002" />
	</analytic>
	<monogr>
		<title level="j">Journal of Geochemical Exploration</title>
		<imprint>
			<biblScope unit="volume">132</biblScope>
			<biblScope unit="page" from="181" to="187" />
			<date type="published" when="2013">2013</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b8">
	<analytic>
		<title level="a" type="main">Temporal and Spatial Trends of Organochlorine Pesticides in Great Lakes Precipitation</title>
		<author>
			<persName><forename type="first">P</forename><surname>Sun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Backus</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Blanchard</surname></persName>
		</author>
		<author>
			<persName><forename type="first">R</forename><surname>Hites</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Environ. Sci. Technol</title>
		<imprint>
			<biblScope unit="volume">40</biblScope>
			<biblScope unit="issue">7</biblScope>
			<biblScope unit="page" from="2135" to="2141" />
			<date type="published" when="2006">2006</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b9">
	<analytic>
		<title level="a" type="main">Pesticides in South African fresh waters</title>
		<author>
			<persName><forename type="first">T</forename><surname>Ansara-Ross</surname></persName>
		</author>
		<author>
			<persName><forename type="first">V</forename><surname>Wepener</surname></persName>
		</author>
		<author>
			<persName><forename type="first">P</forename><surname>Van Den Brink</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><surname>Ross</surname></persName>
		</author>
		<idno type="DOI">10.2989/16085914.2012.666336</idno>
	</analytic>
	<monogr>
		<title level="j">African Journal of Aquatic Science</title>
		<imprint>
			<biblScope unit="volume">37</biblScope>
			<biblScope unit="issue">1</biblScope>
			<biblScope unit="page" from="1" to="16" />
			<date type="published" when="2012">2012</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b10">
	<analytic>
		<title level="a" type="main">Bioaccumulation and tissue distribution of organochlorine pesticides (OCPs) in freshwater fishes: a case study performed in Poyang Lake, China&apos;s largest lake</title>
		<author>
			<persName><forename type="first">Z</forename><surname>Zhao</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">L</forename><surname>Zhang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Cai</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><surname>Chen</surname></persName>
		</author>
		<idno type="DOI">10.1007/s11356-014-2805-z</idno>
	</analytic>
	<monogr>
		<title level="j">Environmental Science and Pollution Research</title>
		<imprint>
			<biblScope unit="volume">21</biblScope>
			<biblScope unit="issue">14</biblScope>
			<biblScope unit="page" from="8740" to="8749" />
			<date type="published" when="2014">2014</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b11">
	<analytic>
		<title level="a" type="main">Spatial and temporal distribution of current-use pesticides in ambient air of Provence-Alpes-Cote-d&apos;Azur Region and Corsica, France</title>
		<author>
			<persName><forename type="first">M</forename><surname>Desert</surname></persName>
		</author>
		<author>
			<persName><forename type="first">S</forename><surname>Ravier</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gille</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Quinapallo</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Armengaud</surname></persName>
		</author>
		<idno type="DOI">10.1016/j.atmosenv.2018.08.054</idno>
		<idno type="HAL">hal-01865350</idno>
	</analytic>
	<monogr>
		<title level="j">Atmospheric Environment</title>
		<imprint>
			<biblScope unit="volume">192</biblScope>
			<biblScope unit="page" from="241" to="256" />
			<date type="published" when="2018">2018</date>
			<publisher>Elsevier</publisher>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b12">
	<monogr>
		<title level="m" type="main">XGBoost: A Scalable Tree Boosting System</title>
		<author>
			<persName><forename type="first">T</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">C</forename><surname>Guestrin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1603.02754v3 [cs.LG]</idno>
		<imprint>
			<date type="published" when="2016-06-10">10 Jun 2016</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b13">
	<analytic>
		<title level="a" type="main">Extreme gradient boosting (Xgboost) model to predict the groundwater levels in Selangor Malaysia</title>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">I</forename><surname>Ahmed Osman</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">N</forename><surname>Ahmed</surname></persName>
		</author>
		<author>
			<persName><forename type="first">M</forename><forename type="middle">F</forename><surname>Chow</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Y</forename><forename type="middle">F</forename><surname>Huang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>El-Shafie</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Ain Shams Engineering Journal</title>
		<imprint>
			<biblScope unit="volume">12</biblScope>
			<biblScope unit="issue">2</biblScope>
			<biblScope unit="page" from="1545" to="1556" />
			<date type="published" when="2021-06">June 2021</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b14">
	<monogr>
		<title level="m" type="main">CatBoost: unbiased boosting with categorical features</title>
		<author>
			<persName><forename type="first">L</forename><surname>Prokhorenkova</surname></persName>
		</author>
		<author>
			<persName><forename type="first">G</forename><surname>Gusev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Vorobev</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><forename type="middle">V</forename><surname>Dorogush</surname></persName>
		</author>
		<author>
			<persName><forename type="first">A</forename><surname>Gulin</surname></persName>
		</author>
		<idno type="arXiv">arXiv:1706.09516 [cs.LG]</idno>
		<imprint>
			<date type="published" when="2019-01-20">20 Jan 2019</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b15">
	<analytic>
		<title level="a" type="main">Research on XGboost academic forecasting and analysis modelling</title>
		<author>
			<persName><forename type="first">T</forename><surname>Hu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Song</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">J. Phys.: Conf. Ser.</title>
		<imprint>
			<biblScope unit="volume">1324</biblScope>
			<biblScope unit="page">012091</biblScope>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b16">
	<analytic>
		<title level="a" type="main">CatBoost for big data: an interdisciplinary review</title>
		<author>
			<persName><forename type="first">J</forename><forename type="middle">T</forename><surname>Hancock</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><forename type="middle">M</forename><surname>Khoshgoftaar</surname></persName>
		</author>
	</analytic>
	<monogr>
		<title level="j">Journal of Big Data</title>
		<imprint>
			<biblScope unit="volume">7</biblScope>
			<biblScope unit="page">94</biblScope>
			<date type="published" when="2020">2020</date>
		</imprint>
	</monogr>
</biblStruct>

<biblStruct xml:id="b17">
	<monogr>
		<title level="m" type="main">LightGBM: A Highly Efficient Gradient Boosting Decision Tree</title>
		<author>
			<persName><forename type="first">G</forename><surname>Ke</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Meng</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Finley</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T</forename><surname>Wang</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Chen</surname></persName>
		</author>
		<author>
			<persName><forename type="first">W</forename><surname>Ma</surname></persName>
		</author>
		<author>
			<persName><forename type="first">Q</forename><surname>Ye</surname></persName>
		</author>
		<author>
			<persName><forename type="first">T.-Y</forename><surname>Liu</surname></persName>
		</author>
		<imprint/>
	</monogr>
</biblStruct>

<biblStruct xml:id="b18">
	<analytic>
		<title level="a" type="main">Performance Comparison of Different Machine Learning Algorithms on the Prediction of Wind Turbine Power Generation</title>
		<author>
			<persName><forename type="first">O</forename><surname>Eyecioglu</surname></persName>
		</author>
		<author>
			<persName><forename type="first">B</forename><surname>Hangun</surname></persName>
		</author>
		<author>
			<persName><forename type="first">K</forename><surname>Kayisli</surname></persName>
		</author>
		<idno type="DOI">10.1109/ICRERA47325.2019.8996541</idno>
	</analytic>
	<monogr>
		<title level="m">Proceedings of the 8th International Conference on Renewable Energy Research and Applications (ICRERA)</title>
		<meeting>the 8th International Conference on Renewable Energy Research and Applications (ICRERA)</meeting>
		<imprint>
			<date type="published" when="2019">2019</date>
			<biblScope unit="page" from="922" to="926" />
		</imprint>
	</monogr>
</biblStruct>

				</listBibl>
			</div>
		</back>
	</text>
</TEI>
