<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <article-meta>
      <title-group>
        <article-title>Comparative analysis of ensemble methods for outlier detection in real estate</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Viktor Khomyshyn</string-name>
          <email>homyshyn@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Oleh Pastukh</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Vasyl Yatsyshyn</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Nataliya Zagorodna</string-name>
          <email>zagorodna.n@gmail.com</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Ternopil Ivan Puluj National Technical University</institution>
          ,
          <addr-line>Ruska str., 56, 46001, Ternopil</addr-line>
          ,
          <country country="UA">Ukraine</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The paper evaluates the effectiveness of ensemble methods for outlier detection on a real dataset. The aim of the paper is to implement improved ML approaches in the real estate industry in order to optimize market data analysis. The study covered the real estate market of the city of Ternopil (Ukraine), in particular the sale of apartments and rooms. The prepared dataset contained 760 real estate objects with 12 features. For each real estate object, an anomaly label was assigned by an expert based on its characteristics. Algorithm testing was carried out using two methods of encoding categorical features: Label Encoder and One-Hot Encoder. Dataset standardization was carried out using RobustScaler, which is robust to outliers. The following ensemble methods were used in the experiments: INNE, LODA, IForest and Feature Bagging. The results were evaluated by three indicators: AUC-ROC, Precision @ rank n and algorithm execution time. These allowed us to assess the accuracy and efficiency of the ensemble algorithms and determine their suitability for real-world anomaly detection problems in real estate data. The results of the algorithms were visualized using the PCA and t-SNE dimensionality reduction methods, showing how well each model detects normal and anomalous objects. This study is a further stage in building a multi-agent system for working with real estate based on modern machine learning algorithms.</p>
      </abstract>
      <kwd-group>
        <kwd>machine learning</kwd>
        <kwd>ensemble methods</kwd>
        <kwd>outlier detection</kwd>
        <kwd>anomaly detection</kwd>
        <kwd>real estate</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Numerous studies in the field of machine learning show that two approaches currently dominate it:
deep learning and ensemble learning. Deep learning is known for its ability to automatically
extract useful patterns from raw data, such as images or audio. Ensemble learning, in turn, has
shown high efficiency in building models on structured data that already contain meaningful
features. It imitates the natural human habit of consulting several opinions before making an
important decision. The basic principle of ensemble learning is to weigh the results of individual
models and combine them to obtain a more accurate answer than any single model
could achieve [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ].
      </p>
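The combination principle described above can be illustrated with a minimal sketch: several detectors produce outlier scores for the same objects, the scores are standardized so no detector dominates by scale, and then combined. The scores below are hypothetical, not from the paper.

```python
import numpy as np

# Scores from three hypothetical detectors for five objects (one row per detector).
scores = np.array([
    [0.1, 0.2, 0.9, 0.3, 0.2],
    [0.2, 0.1, 0.8, 0.2, 0.3],
    [0.0, 0.3, 0.7, 0.1, 0.2],
])

# Standardize each detector's scores (z-score per row) before combining.
z = (scores - scores.mean(axis=1, keepdims=True)) / scores.std(axis=1, keepdims=True)

combined_avg = z.mean(axis=0)   # averaging: reduces variance
combined_max = z.max(axis=0)    # maximization: emphasizes strong alarms

print(combined_avg.argmax())    # index of the most anomalous object
```

Both combination functions agree here; on real data they trade off differently, which is exactly the bias–variance discussion taken up in Section 2.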
      <p>
        Outlier detection is a key technique in building effective machine learning models for real estate
valuation. The accuracy of such models is directly related to the quality of the data [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. It is
important for investors, sellers, and buyers that the price of an object accurately reflects the
current market situation. The non-standard nature of outliers often significantly worsens
forecasting results. Therefore, many algorithms exist and are used that detect outliers based on
various criteria: distance, density, angle, etc. More modern approaches apply ensemble
machine learning methods to such tasks, which increases the accuracy of outlier detection.
      </p>
      <p>The purpose of our study is to analyze the effectiveness of ensemble machine learning methods for
detecting outliers in real estate data, as well as to develop an optimized approach that combines the
advantages of different algorithms to increase the accuracy and stability of detection. This will
create a theoretical and practical basis for automating data cleaning in real estate data processing
systems.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Analysis of recent studies and publications</title>
      <p>Ensemble approaches in anomaly analysis are much less studied than in classical data mining tasks.
This is due to the fact that assessing the quality of individual ensemble components when detecting
anomalies is more difficult. This process is also often affected by the lack of accurate labeled data.
As a result, decisions about combining and selecting ensembles must be made using intermediate
results of the algorithm, rather than specific quality metrics on validation datasets. These
intermediate results, however, can sometimes be inaccurate estimates of the degree of anomaly. When
decisions about selecting and combining ensembles are made in an uncontrolled manner, the
probability of erroneous decisions is much higher. Nevertheless, although
outlier detection is a difficult task for ensemble analysis, these difficulties are not insurmountable.</p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] different ways of classifying outlier analysis problems are proposed, such as independent
or sequential ensembles, and data-driven or model-driven ensembles. The impact of different
combination functions and their relationship to different types of ensembles is also discussed. The
choice of the right combination function is important, although in the general case it may depend
on the structure of the ensemble. In addition, many modern outlier detection methods are
compared with different types of ensembles and the possibility of adapting ensemble methods from
other data mining tasks to outlier detection tasks is considered.
      </p>
      <p>
        The paper [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] is devoted to the study of the theoretical foundations of ensemble outlier analysis.
Despite the significant differences between the classification and outlier analysis problems, the
paper shows that the theoretical foundations of these two problems are actually very similar in
terms of the trade-off between bias and variance. The influence of the combination function is
discussed and specific trade-offs between the averaging and maximization functions are considered.
The authors propose several robust variations of feature bagging methods and two new
combination methods based on variance-bias theory.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] a new outlier detection algorithm is proposed – Ensemble Outlier Detection Method
Based on Information Entropy – Weighted Subspaces for High-Dimensional Data (EOEH). It first
performs a random secondary subsampling of the data, and the detectors are run on different
small-scale subsamplings to obtain a variety of detection results. Then, these results are aggregated
to reduce the global variance and improve the robustness of the algorithm. Next, information
entropy is used to construct a weighting method for dimension spaces, which allows determining
influential factors in different multidimensional spaces. This method generates weighted subspaces
and dimensions for data objects, reducing the influence of noise generated by high-dimensional
data, and improving the detection performance of high-dimensional data. Finally, a new detector –
High-Precision Local Outlier Factor (HPLOF) is proposed, which enhances the differentiation
between normal data and outliers, thereby improving the detection performance of the algorithm.
Experiments using simulated data and UCI datasets have confirmed the feasibility of the proposed
algorithm. Compared to popular outlier detection algorithms, EOEH improves detection quality by
an average of 6% and runs 20% faster.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], an approach is proposed for automatic optimization of outlier detection ensembles using
a limited number of outlier samples (from 1 to 10% of the available outliers). Optimized outlier
detection ensembles consist of outlier detection algorithms that provide an outlier estimate and use
adjustable parameters. Automatic optimization determines parameter values that improve the
discrimination between normal data and outliers. This increases the efficiency of outlier detection.
      </p>
      <p>
        The study [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] describes a universal framework for outlier detection based on bi-sampling –
BiSampling Outlier Detection (BSOD) and provides theoretically justified optimal ratios for row and
column sampling. The BSOD method demonstrates diversity in ensemble construction. As a base,
the LOF (Local Outlier Factor) algorithm was used – a classic method for detecting local outliers,
which calculates the local density by averaging the density of its neighbors. By implementing LOF
within BSOD, a BI-LOF model was built, on which experiments were conducted using 30 synthetic
and 17 real datasets. The BI-LOF model was also tested for outlier detection in images. Overall, the
experimental results showed high quality and stability of the BI-LOF method.
      </p>
      <p>
        In unsupervised ensemble outlier detection problems, the lack of labeled outliers makes it difficult to
combine baseline outlier detectors. In particular, existing parallel ensembles do not have a reliable
mechanism for selecting effective baseline detectors, which negatively affects the accuracy and
stability during model combination. In [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ], an approach called Locally Selective Combination in
Parallel Outlier Ensembles (LSCP) is proposed, which solves this problem by determining a local
region around a test instance through the consensus of its nearest neighbors in randomly selected
feature subspaces. Unlike traditional combination methods, LSCP determines the best baseline
detectors for each test instance relative to its local environment. The most effective baseline
detectors in this local region are selected and combined to obtain the final result. To evaluate the
effectiveness, the proposed approach is tested on 20 real-world datasets and demonstrates
superiority over baseline algorithms. Four variants of the LSCP model are compared with seven
common parallel approaches. The ensemble approach of LSCP AOM shows the best results,
achieving higher scores on 13 out of 20 datasets by the ROC-AUC metric and on 14 out of 20 by the
mAP metric (average accuracy). The paper also provides theoretical justifications in the context of
the trade-off between bias and variance and visualizations that provide a comprehensive
understanding of the LSCP technique. Since the LSCP approach demonstrates the promise of using
local data, the authors of the study propose to extend this technique by using heterogeneous basic
detectors.
      </p>
      <p>
        A similar approach to the previous one, which also works in the absence of labeled data
(i.e., unsupervised), is described in [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]. The authors propose a new improved framework for
combining anomaly detectors – Dynamic Combination of Detector Scores for Outlier Ensembles
(DCSO). Unlike traditional ensemble methods that statically combine detectors, DCSO dynamically
determines the most effective baseline detectors for each test case by evaluating their effectiveness
in a certain local region. The DCSO algorithm first outlines the local region of the test case by its k
nearest neighbors, and then identifies the most effective baseline detectors in this local region.
Given the fact that local data connections are crucial for combining anomaly scores, DCSO ranks
the quality of individual baseline detectors by their similarity to pseudo-reference data in the local
region. To increase the stability of the model and reduce the risks of using a single detector, the
study also proposes different variants of ensembles based on DCSO. The performance of DCSO is
verified through statistical evaluations on ten real-world datasets. The results confirm that this
approach outperforms traditional static combination methods in anomaly detection. In addition to
significantly improving detection quality, DCSO is also computationally robust, being compatible
with any baseline detector (e.g. LOF or k-NN) and transparent in demonstrating how outlier estimates are
generated for each test instance, given the chosen baseline detector.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], three ensemble methods were compared – LSCP (Locally Selective Combination in
Parallel Outlier Ensembles), iForest (Isolation Forest) and FB (Feature Bagging) with eleven widely
used anomaly detection methods, including: ABOD, CBLOF, HBOS, KNN, LOF, MCD, OCSVM,
PCA, SOS, SOD, AveKNN. The initial data set contained 400 observations with an outlier rate of
0.25. At the first stage, 80 instances were randomly selected from the set, which were divided into
approximately equal parts. Two of these parts were used as a training set, and the third as a test set
(cross-validation). The results of the three tests were averaged. The three most effective algorithms
that showed the best ROC curve values were combined to create a more accurate final model,
which was subsequently applied to the full initial data set. The presented approach for forming an
ensemble has shown its effectiveness and reliability.
      </p>
      <p>
        In [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], an ensemble approach called Cumulative Agreement Rates Ensemble (CARE) is
proposed, which aims to achieve low error by reducing variance and bias. This method considers
anomaly detection as a binary classification problem of unlabeled data and uses a two-phase
aggregation of intermediate results at each iteration to obtain the final result. The two main
components of CARE are its parallel and sequential components. The former help to reduce
variance by weighted combination of several baseline detectors, and the latter are designed to
reduce both bias and variance by using Filtered Variable Probability Sampling (FVPS) and
cumulative aggregation. The first stage sequentially eliminates outliers from the original data set to
build a better data model on which to estimate the outlier, and the second stage combines results
from individual baseline detectors and between iterations. The proposed method was tested on 16
real-world datasets, mostly from the UCI machine learning repository, and showed significant
improvement over baseline methods and state-of-the-art ensemble approaches to anomaly
detection, demonstrating either superior or close results.
      </p>
      <p>The results of the presented numerous studies show that ensemble technologies are actively
developing, providing a significant improvement in the quality of outlier detection, and ensemble
analysis is a promising area of research, in particular in anomaly detection problems.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Research methodology</title>
      <sec id="sec-3-1">
        <title>3.1. Dataset description</title>
        <p>
          In this study, the city of Ternopil (Ukraine) was selected for the analysis of real estate information.
To collect the data and keep the dataset up to date, purpose-built software was used: a
Windows application, a Microsoft SQL Server database, and a website [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. Data collection was
carried out by parsing the web pages of popular real estate portals in the region. Characteristics
missing from the advertisements (the information sources) were clarified with the owners.
        </p>
        <p>In general, the study used data that was in the database as of the beginning of 2025. The dataset
contained a description of 760 offers for the sale of apartments and rooms in dormitories in the city
of Ternopil. To form a dataset that was later used for machine learning, the software was
supplemented with a function to export the necessary fields to CSV format. A total of 14 columns
were exported from the program (Table 1). The first column is the object ID (unique identifier),
which we will use for auxiliary purposes only, since it has no information value. The next 12
columns are characteristics (features) that describe the real estate object. The last column is the
anomaly label (1 - anomalous object, 0 - normal object).</p>
        <p>
          The anomaly label of each real estate object in the dataset was assessed by an expert based on
an analysis of the object's location, its characteristics, and cost at the stage of adding the object to
the database or changing its price or condition. For this purpose, a set of rules was created that
allowed detecting anomalous values that differ significantly from the average indicators in the
region. For example, objects with too low or high cost compared to other objects of a similar type
and location were considered potential anomalies. The analysis took into account that the price of
1 sq.m. of housing is usually higher for small apartments, mainly one-room ones. In addition,
atypical combinations of characteristics were taken into account, such as the high cost of
apartments in older buildings, mainly from the 1980s and earlier. The expert had at his disposal
photographs of each object (on average 5 to 20), and in some cases a copy of the
technical passport, against which the information entered into the database was checked. The
dataset prepared in this way contained 10% of the observed anomalous objects. The current state of
the database is available at [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ].
        </p>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Research procedure</title>
        <p>The effectiveness of ensemble outlier detection methods was tested in the Python programming
language using the Spyder IDE tool included in the Anaconda software package. Data processing
was performed using the pandas, numpy, sklearn, and pyod libraries, and visualization was
performed using the matplotlib library. Preprocessing of the dataset included: separating the values
of the "id" and "label" fields into separate one-dimensional arrays; removing duplicate rows; and
removing columns with missing data.</p>
        <table-wrap id="tab1">
          <label>Table 1</label>
          <caption>
            <p>Columns exported from the program</p>
          </caption>
          <table>
            <thead>
              <tr><th>No.</th><th>Field</th><th>Type</th><th>Description</th></tr>
            </thead>
            <tbody>
              <tr><td>0</td><td>id</td><td></td><td>The ID of the object in the database</td></tr>
              <tr><td>1</td><td>realty_type</td><td>text</td><td>Type of real estate</td></tr>
              <tr><td>2</td><td>district</td><td>text</td><td>City district (residential area)</td></tr>
              <tr><td>3</td><td>total_area</td><td>integer</td><td>Total area (m2)</td></tr>
              <tr><td>4</td><td>floor</td><td>integer</td><td>The floor on which the real estate is located</td></tr>
              <tr><td>5</td><td>floors</td><td>integer</td><td>Total number of floors in the building</td></tr>
              <tr><td>6</td><td>repair_state</td><td>text</td><td>State of repair</td></tr>
              <tr><td>7</td><td>wall_material</td><td>text</td><td>Material of external walls</td></tr>
              <tr><td>8</td><td>furniture</td><td>text</td><td>Availability of furniture</td></tr>
              <tr><td>9</td><td>heating</td><td>text</td><td>Type of heating</td></tr>
              <tr><td>10</td><td>build_year</td><td>text</td><td>Construction years</td></tr>
              <tr><td>11</td><td>market</td><td>text</td><td>Real estate market</td></tr>
              <tr><td>12</td><td>price</td><td>integer</td><td>The price of the object ($)</td></tr>
              <tr><td>13</td><td>label</td><td>binary</td><td>Anomaly label of an object</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <sec id="sec-3-2-13">
          <title>Studied algorithms and experimental setup</title>
          <p>
            The list of studied ensemble algorithms and their characteristics is given in Table 2. All of these
algorithms are components of the PyOD (Python Outlier Detection) library. By default, Feature
Bagging (FB) uses the LOF algorithm as the base estimator. We extended the Feature Bagging study
by using the CBLOF, KDE, KNN, OCSVM, and QMCD algorithms as the base detectors instead of
LOF. These algorithms showed the best individual results for outlier detection on the studied
dataset [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
          </p>
          <p>For a more comprehensive evaluation, each algorithm was tested on a dataset that underwent
various methods of encoding categorical features (Label Encoding and One-Hot Encoding) at the
preprocessing stage, as well as with and without data scaling (RobustScaler).</p>
          <p>The size of the dataset depended on the method of encoding categorical features: with Label
Encoding, 760 observations and 12 features; with One-Hot Encoding, 760 observations and 67 features.</p>
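The two encoding variants and the robust scaling used at the preprocessing stage can be sketched as follows (toy column values following Table 1; the real pipeline reads the exported CSV instead):

```python
import pandas as pd
from sklearn.preprocessing import LabelEncoder, RobustScaler

# Toy fragment with one categorical and two numeric columns.
df = pd.DataFrame({
    "district":   ["Center", "Bam", "Center", "Druzhba"],
    "total_area": [45.0, 62.5, 38.0, 120.0],
    "price":      [52000, 61000, 40000, 250000],
})

# Variant 1: Label Encoding keeps one column per categorical feature.
le = df.copy()
le["district"] = LabelEncoder().fit_transform(le["district"])

# Variant 2: One-Hot Encoding expands each category into its own column,
# which is why the paper's feature count grows from 12 to 67.
ohe = pd.get_dummies(df, columns=["district"])

# RobustScaler centers by the median and scales by the IQR, so extreme
# values (like the 250000 price) distort the scaling far less.
X = RobustScaler().fit_transform(le)
print(le.shape, ohe.shape)
```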
          <p>The experiments were conducted on a PC with the Windows 10 Pro x64 operating system, an
Intel(R) Core(TM) i3-10105F 3.70 GHz processor, and 32 GB of RAM.</p>
        </sec>
        <table-wrap id="tab2">
          <label>Table 2</label>
          <caption>
            <p>Studied ensemble algorithms and their characteristics</p>
          </caption>
          <table>
            <thead>
              <tr><th>Algorithm</th><th>Full name</th><th>Type</th><th>Year</th><th>Multicore</th><th>Source</th></tr>
            </thead>
            <tbody>
              <tr><td>INNE</td><td>Isolation-based Anomaly Detection Using Nearest-Neighbor Ensembles</td><td>Unsupervised</td><td>2018</td><td>No</td><td>[15]</td></tr>
              <tr><td>LODA</td><td>Lightweight On-line Detector of Anomalies</td><td>Unsupervised</td><td>2016</td><td>No</td><td>[16]</td></tr>
              <tr><td>IForest</td><td>Isolation Forest</td><td>Unsupervised</td><td>2008</td><td>Yes</td><td>[17]</td></tr>
              <tr><td>FB</td><td>Feature Bagging</td><td>Unsupervised</td><td>2005</td><td>Yes</td><td>[18]</td></tr>
            </tbody>
          </table>
        </table-wrap>
        <p>The results were evaluated using three indicators: AUC-ROC, Precision @ rank n (P@n), and algorithm execution time.</p>
        <p>For each combination of algorithm and data set, 100 experiments were conducted. The metrics
were averaged. The test results are presented in Table 3.</p>
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Preliminary assessment of results</title>
        <p>The data in Table 3 indicate that the way categorical features are encoded and whether the data
are scaled affect the accuracy of each ensemble differently. We have summarized these results and
propose a matrix for selecting the optimal dataset characteristics for each ensemble algorithm (Table
4). To illustrate with the IForest algorithm: the best result is achieved when using the Label Encoder,
while data scaling (yes/no) does not matter.</p>
        <p>The quality assessment of models based on the AUC-ROC indicator is as follows [19]:</p>
        <p>AUC-ROC ≥ 0.9 – excellent quality;
0.8 ≤ AUC-ROC &lt; 0.9 – good quality;
0.7 ≤ AUC-ROC &lt; 0.8 – acceptable (satisfactory) quality;
0.5 &lt; AUC-ROC &lt; 0.7 – low quality;
AUC-ROC = 0.5 – equivalent to random guessing.</p>
        <p>The AUC-ROC metric is a balanced global estimate, but it is not sensitive to class imbalance,
which is a drawback for very rare anomalies.</p>
        <p>If it is important to focus on the top anomalies (a local estimate), the P@n metric is used. It
indicates what proportion of the top-n objects flagged by the system are truly anomalous.</p>
        <p>Among the considered algorithms, the following ensembles showed the best accuracy indicators:
INNE, LODA, FB + CBLOF, and FB + KNN. Although the FB + QMCD ensemble also showed high
results (AUC-ROC=0.880, P@n=0.505), we excluded it from further research because the spread of
its metrics across the experiments was quite significant.</p>
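As a worked illustration of the two accuracy metrics, assuming toy labels and scores (not the paper's data):

```python
import numpy as np
from sklearn.metrics import roc_auc_score

# Ten objects, two of them true anomalies (label 1), with detector scores.
y_true = np.array([0, 0, 0, 0, 0, 0, 0, 0, 1, 1])
scores = np.array([.1, .2, .1, .3, .2, .1, .4, .9, .8, .7])

auc = roc_auc_score(y_true, scores)

# Precision @ rank n: among the top-n highest-scoring objects
# (n = number of true anomalies), the share that are truly anomalous.
n = int(y_true.sum())
top_n = np.argsort(scores)[-n:]
p_at_n = y_true[top_n].mean()

print(round(auc, 3), p_at_n)  # 0.875 0.5
```

Here one false alarm (score 0.9) occupies a top-2 slot, so P@n drops to 0.5 even though the global AUC-ROC remains fairly high, which is exactly the global-versus-local distinction discussed above.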
      </sec>
      <sec id="sec-3-4">
        <title>3.4. Optimization of algorithms</title>
        <p>The next step of the study was to optimize the hyperparameters of the selected ensembles in order
to obtain the best result that each ensemble algorithm can provide. Popular approaches to this
problem include grid search, random search, and Bayesian optimization.
Several Python libraries exist for finding optimal hyperparameter settings, including Optuna,
Ray Tune and Hyperopt. These libraries simplify and automate the search process and can also
scale across several computing environments to speed up the result [20].</p>
        <p>In our study, the Optuna framework was used. This is an open-source library that provides
efficient optimization by applying modern hyperparameter selection algorithms and effective
pruning of unpromising trials. Optuna also integrates with MLflow (an open-source platform for
managing the machine learning lifecycle) for tracking and monitoring models and trials [21].</p>
        <p>The AUC-ROC metric was chosen as the objective function for optimization. The goal of the
optimization was to maximize the value of the objective function. The hyperparameter search
space was specified individually for each ensemble (INNE, LODA) or the base algorithm (CBLOF,
KNN). The results of hyperparameter optimization for each ensemble are presented in Tables 5-8.
Figures 1-4 present visualizations of the results of the optimized outlier detection algorithms using
PCA [22] and t-SNE [23] dimensionality reduction methods. Visual evaluation demonstrates high
accuracy of anomaly detection by each algorithm.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusions</title>
      <p>The conducted comparative study showed high accuracy of ensemble machine learning algorithms
in outlier detection tasks in real estate data. The best among all considered options is the
combination of the Feature Bagging method and the optimized basic detector CBLOF. At the same
time, high metrics (AUC-ROC=0.886, P@n=0.660) and optimal running time (Time=83 ms) are
provided. Slightly lower, but quite close indicators are given by the combination of the Feature
Bagging method in a pair with the optimized basic detector KNN (AUC-ROC=0.889, P@n=0.632),
but this pair works faster than its predecessor (Time=54 ms). The optimized INNE algorithm is also
not much inferior in metrics (AUC-ROC=0.883, P@n=0.615), but is the slowest (Time=142 ms).
As for the LODA algorithm, for which hyperparameter optimization was also performed, despite the
highest AUC-ROC=0.907, its P@n=0.564 is the lowest. This is visually confirmed
by its graphs, which show that the separation of objects into normal and abnormal ones is less clearly
expressed in comparison with the other described algorithms.</p>
      <p>Further research on the topic of this work can be aimed at analyzing the effectiveness of other
ensemble techniques and algorithms for outlier detection, including their combinations and
hyperparameter optimization.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The authors have not employed any Generative AI tools.</p>
      <sec id="sec-5-1">
        <p>[15] T. Bandaragoda, K. Ting, D. Albrecht, F. Liu, Y. Zhu, J. Wells, Isolation-based anomaly detection using nearest-neighbor ensembles, Computational Intelligence 34(3) (2018) 968–998. doi:10.1111/coin.12156.</p>
        <p>[16] T. Pevný, Loda: Lightweight on-line detector of anomalies, Machine Learning 102 (2016) 275–304. doi:10.1007/s10994-015-5521-0.</p>
        <p>[17] F. Liu, K. Ting, Z. Zhou, Isolation Forest, in: Eighth IEEE International Conference on Data Mining (ICDM '08), 2008, pp. 413–422. doi:10.1109/ICDM.2008.17.</p>
        <p>[18] A. Lazarevic, V. Kumar, Feature Bagging for Outlier Detection, in: International Conference on Knowledge Discovery in Data Mining (KDD '05), 2005, pp. 157–166. doi:10.1145/1081870.1081891.</p>
        <p>[19] D. Hosmer, S. Lemeshow, R. Sturdivant, Applied Logistic Regression, 2nd ed., Chapter 5 (2013) 173–182.</p>
        <p>[20] L. Aleksina, A. Bondarchuk, Hyperparameters optimization for the machine learning, Connectivity 2 (2024) 18–22. doi:10.31673/2412-9070.2024.021822.</p>
        <p>[21] Optuna – A hyperparameter optimization framework, Optuna, 2025. URL: https://optuna.org.</p>
        <p>[22] K. Pearson, LIII. On lines and planes of closest fit to systems of points in space, The London, Edinburgh and Dublin Philosophical Magazine and Journal of Science 2(11) (1901) 559–572. doi:10.1080/14786440109462720.</p>
        <p>[23] L. van der Maaten, G. Hinton, Visualizing Data using t-SNE, Journal of Machine Learning Research 9(86) (2008) 2579–2605.</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>L.</given-names>
            <surname>Rokach</surname>
          </string-name>
          ,
          <article-title>Ensemble Learning: Pattern Classification Using Ensemble Methods</article-title>
          , 2nd. ed., World Scientific (
          <year>2019</year>
          ). doi:10.1142/11325.
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khomyshyn</surname>
          </string-name>
          ,
          <article-title>Using ensemble methods of machine learning to predict real estate prices</article-title>
          ,
          <source>in: ITTAP'2024: 4th International Workshop on Information Technologies: Theoretical and Applied Problems</source>
          ,
          <year>2024</year>
          , pp.
          <fpage>438</fpage>
          -
          <lpage>447</lpage>
          . URL: https://ceur-ws.org/Vol-3896/paper26.pdf.
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          , Outlier Ensembles,
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>14</volume>
          (
          <issue>2</issue>
          ) (
          <year>2012</year>
          )
          <fpage>49</fpage>
          -
          <lpage>58</lpage>
          . doi:10.1145/2481244.2481252.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Aggarwal</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Sathe</surname>
          </string-name>
          ,
          <article-title>Theoretical Foundations and Algorithms for Outlier Ensembles</article-title>
          ,
          <source>ACM SIGKDD Explorations Newsletter</source>
          <volume>17</volume>
          (
          <issue>1</issue>
          ) (
          <year>2015</year>
          )
          <fpage>24</fpage>
          -
          <lpage>47</lpage>
          . doi:10.1145/2830544.2830549.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>An Ensemble Outlier Detection Method Based on Information Entropy-Weighted Subspaces for High-Dimensional Data</article-title>
          ,
          <source>Entropy</source>
          <volume>25</volume>
          (
          <issue>8</issue>
          ) (
          <year>2023</year>
          ). doi:10.3390/e25081185.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>N.</given-names>
            <surname>Reunanen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Räty</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Lintonen</surname>
          </string-name>
          ,
          <article-title>Automatic optimization of outlier detection ensembles using a limited number of outlier examples</article-title>
          ,
          <source>International Journal of Data Science and Analytics</source>
          <volume>10</volume>
          (
          <year>2020</year>
          )
          <fpage>377</fpage>
          -
          <lpage>394</lpage>
          . doi:10.1007/s41060-020-00222-4.
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>H.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Deng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Fu</surname>
          </string-name>
          ,
          <article-title>Outlier Detection via Sampling Ensemble</article-title>
          ,
          <source>in: International Conference on Big Data (Big Data)</source>
          ,
          <year>2016</year>
          . doi:10.1109/BigData.2016.7840665.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Nasrullah</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hryniewicki</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <article-title>LSCP: Locally Selective Combination in Parallel Outlier Ensembles</article-title>
          ,
          <source>in: Proceedings of the 2019 SIAM International Conference on Data Mining</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>585</fpage>
          -
          <lpage>593</lpage>
          . doi:10.48550/arXiv.1812.01528.
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Hryniewicki</surname>
          </string-name>
          ,
          <article-title>DCSO: Dynamic Combination of Detector Scores for Outlier Ensembles</article-title>
          ,
          <source>in: ACM SIGKDD Conference on Knowledge Discovery and Data Mining (KDD), Outlier Detection De-constructed Workshop</source>
          ,
          <year>2018</year>
          . doi:10.48550/arXiv.1911.10418.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>S.</given-names>
            <surname>Alexandropoulos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Kotsiantis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Piperigou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Vrahatis</surname>
          </string-name>
          ,
          <article-title>A new ensemble method for outlier identification</article-title>
          ,
          <source>in: 10th International Conference on Cloud Computing, Data Science &amp; Engineering</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>786</fpage>
          -
          <lpage>791</lpage>
          . doi:10.1109/Confluence47617.2020.9058219.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>S.</given-names>
            <surname>Rayana</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Zhong</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Akoglu</surname>
          </string-name>
          ,
          <article-title>Sequential Ensemble Learning for Outlier Detection: A Bias-Variance Perspective</article-title>
          ,
          <source>in: IEEE 16th International Conference on Data Mining (ICDM-2016)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>1167</fpage>
          -
          <lpage>1172</lpage>
          . doi:10.1109/ICDM.2016.0154.
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <article-title>Software for real estate agencies "MyHome"</article-title>
          ,
          <source>Real estate agency "Leader" Ternopil</source>
          ,
          <year>2015</year>
          . URL: https://lider.org.ua/myhome.aspx.
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <article-title>Sale of apartments and rooms in Ternopil and Ternopil district</article-title>
          ,
          <source>Real estate agency "Leader" Ternopil</source>
          ,
          <year>2025</year>
          . URL: https://lider.org.ua/base.aspx?t=kva.
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>O.</given-names>
            <surname>Pastukh</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Khomyshyn</surname>
          </string-name>
          ,
          <article-title>Efficiency research of cluster analysis methods for detecting outliers in real estate market</article-title>
          ,
          <source>Herald of Khmelnytskyi National University, Technical sciences 3(1)</source>
          (
          <year>2025</year>
          )
          <fpage>362</fpage>
          -
          <lpage>381</lpage>
          . doi:10.31891/2307-5732-2025-351-45.
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>