=Paper=
{{Paper
|id=Vol-3934/paper1
|storemode=property
|title=Exploiting Outlier Explanation to Unveil Key-aspects of High Green Comparative Advantage Nations
|pdfUrl=https://ceur-ws.org/Vol-3934/paper1.pdf
|volume=Vol-3934
|authors=Fabrizio Angiulli,Fabio Fassetti,Simona Nisticò,Luigi Palopoli
|dblpUrl=https://dblp.org/rec/conf/greenai/AngiulliFN024
}}
==Exploiting Outlier Explanation to Unveil Key-aspects of High Green Comparative Advantage Nations==
Fabrizio Angiulli¹, Fabio Fassetti¹, Simona Nisticò¹,∗ and Luigi Palopoli¹
¹ DIMES, University of Calabria, via Pietro Bucci, Rende (CS), 87036, Italy
Abstract
Climate change is observable in the drastic modification of world ecosystems and weather patterns. The potential effects of this phenomenon make the search for successful strategies to limit the problem an absolute priority. Goals 7, 12 and 13 of the United Nations’ Agenda 2030 [1] are only some examples of the importance of this problem on a worldwide scale. The development and diffusion of low-carbon technologies are among the key points of policies against climate change, due to the massive impact human activities have on carbon emissions.
Deep Learning techniques, currently widely used in many aspects of everyday life, can also help in this field. This work, in particular, aims to demonstrate the effectiveness of M²OE, a transformation-based outlier explanation technique, in extracting actionable explanations in the green economy context. Specifically, we analyze the Low Carbon Technologies Comparative Advantage, an index measuring the relative economic advantage in developing low-carbon technologies, by looking at the nations exhibiting a high comparative advantage to qualitatively evaluate the insights the method provides to the user.
To this aim, we have gathered data concerning 7 indicators related to the comparative advantage of low-carbon technologies in the 2019-2021 time period. This data extraction work has resulted in the Green Comparative Advantage (GreenCA) tabular data set, in which the information retrieved for the reference time horizon is organized and summarized. Through a set of experiments exploiting this data collection together with the M²OE method, we gain a first glimpse of which policies are successful in promoting a change in favour of green energies.
Keywords
Outlier Explanation, Green Economy, Low Carbon Technologies Comparative Advantage
1. Introduction
The worrying frequency of extraordinary natural events is making the climate change problem evident: it is dramatically altering the planet’s equilibrium and our ecosystems and is affecting human life [2]. Timely actions and a change in lifestyle and environmental policies are required to mitigate the effects of a problem primarily caused by human activities. Thanks to its power to push technology and economics towards more sustainable models, politics has a leading role in attenuating the climate change issue [3]. Fortunately, the undeniable evidence of this problem has induced countries worldwide to commit to discussing effective strategies to promote policies, technologies and behaviours tailored to reduce the CO2 emissions that are causing this phenomenon. The annual United Nations Climate Change Conference and goals 7, 12 and 13 of the Agenda 2030, focused respectively on affordable green energy, responsible consumption and production, and climate change [1], testify to the growing interest of world politics in environmental subjects and in the green economy, which considers both the economic and the sustainability aspects.
1st Workshop on Green-Aware Artificial Intelligence, 23rd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2024), November 25–28, 2024, Bolzano, Italy
∗ Corresponding author.
Authors' contributions: F.A., F.F.: Conceptualization, Investigation, Methodology, Validation, Writing – Review & Editing; S.N.: Conceptualization, Investigation, Methodology, Software, Validation, Data curation, Writing – Original Draft Preparation; L.P.: Validation, Writing – Review & Editing.
Email: f.angiulli@unical.it (F. Angiulli); f.fassetti@unical.it (F. Fassetti); s.nistico@unical.it (S. Nisticò); l.palopoli@unical.it (L. Palopoli)
ORCID: 0000-0002-9860-7569 (F. Angiulli); 0000-0002-8416-906X (F. Fassetti); 0000-0002-7386-2512 (S. Nisticò); 0000-0003-4915-5137 (L. Palopoli)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073
As a consequence of socioeconomic, geographical and morphological differences, each country’s government implements different policies to deal with the need for a more sustainable lifestyle and economy. Unfortunately, not all the strategies adopted are equally effective in promoting the reduction of CO2 emissions. In this regard, and still with the green economy in mind, decision-makers would benefit from techniques providing insights about the aspects characterizing the policies of countries that attain sustainability goals, since such techniques potentially provide them with the instruments to make more informed decisions.
In this paper, we aim to demonstrate the effectiveness of M²OE [4, 5], a transformation-based Outlier Explanation technique, in obtaining actionable explanations in the green economy context. In particular, we qualitatively analyze the actionability of explanations related to the Low Carbon Technologies Comparative Advantage (hereafter referred to as Green Comparative Advantage), an index measuring the relative advantage of a country in producing low-carbon technologies. We have chosen this index for its potential to promote investments in technologies that reduce environmental impact, and thus to test the effectiveness of the M²OE explanations in a relevant real-world scenario.
To this aim, we have collected and arranged the data shared by the International Monetary Fund containing information relating to environmental taxes, investment in environmental protection, fossil fuel subsidies, energy (both renewable and non-renewable), forests and trade in low-carbon technology products. Our efforts resulted in the Green Comparative Advantage (GreenCA) dataset, which provides, in tabular form, a rich and, hopefully, complete overview of the policies applied by a set of countries and of their peculiarities. More specifically, we have collected information for 54 countries; more details are provided in Section 2. The distribution of samples, skewed toward countries with a low green comparative advantage, makes the dataset suitable for analysis in an outlier explanation setting, in which the less numerous class of countries with a high comparative advantage is treated as the set of outliers.
Given reference data considered as “normal” and one or more outlying samples, the goal of the Outlier Explanation task is to figure out the aspects characterizing the outlierness of a point and, thus, what makes the analyzed sample or group behave differently from the rest of the data. There are two widespread ways to approach this problem. The first casts the problem as the search for the set of features characterizing the outlying samples, or as the association of a score with each feature, by, for example, using separability as a quality criterion [6], finding invertible projections that make the outlying sample more recognisable and then obtaining the features contributing to this mapping [7], or leveraging feature selection methods equipped with properly designed sampling methods to deal with extremely unbalanced data [8]. Other approaches perform the selection through outlierness metrics based on the relative frequency of value combinations, applied to single outliers in datasets having categorical [9] or continuous features [10], or to groups of anomalies [11]. Alternatively, exceptional values can be found by estimating the distribution of the value frequencies [12]. In the second, instead, the features are ranked and a score is provided for each of them, using scoring criteria ranging from the distance between the studied sample and its k-nearest neighbours [13], to measures extracted through kernel density estimation [14], to other criteria designed to be dimensionally unbiased, that is, not dependent on the number of features of the sample [15].
Beyond this information, the M²OE technique considered here [4, 5] is tailored to extract richer insights through transformation-based explanations [16], whose goal is to find a group of features to change and, for each of them, a value describing how to change it. It follows that the technique potentially gives actionable explanations that guide decision-makers in changing the observed status by making appropriate changes to the available features, which is particularly useful in real-world contexts like the one considered in this paper.
The rest of the paper is structured as follows. Section 2 presents the GreenCA dataset by describing the data collection, the building process and the information included; Section 3 presents the M²OE Outlier Explanation method; Section 4 qualitatively evaluates the actionability of the collected explanations; finally, Section 5 draws the conclusions of this work.
2. The Green Comparative Advantage dataset
As already stated in the introduction, designing effective strategies to promote green and sustainable technologies is crucial to move towards a society and lifestyle with a lower impact and to try to remedy the negative effects observed as a consequence of global pollution. To gain insights into the most effective ways to make the development and adoption of low-carbon technologies advantageous from an economic perspective and thus, hopefully, to incentivize this category of technologies, we want to study the differences that characterize countries having a high comparative advantage in green technologies in comparison with the others.
To examine the insights given by M²OE on what characterizes countries exhibiting a high comparative advantage, with the purpose of checking their usefulness, data for 6 mitigation indicator groups have been collected from the International Monetary Fund Climate Dashboard, which includes information about national policies to contain and reduce carbon emissions, shaped as time series. In more detail, the indicator groups considered are the following:
• Environmental Taxes (ET) [17]: Charges levied on something that is proven to harm the environment, measured through a physical unit or some proxy criterion.
• Environmental Protection Expenditures (EP) [18]: Amount of money invested in environmental protection activities such as waste management, pollution abatement, and biodiversity and landscape protection.
• Forest and Carbon (FC) [19]: Data about forest extent and carbon stored by forests, providing a high-level summary of the forest state in each country.
• Fossil Fuel Subsidies (FF) [20]: Estimated values of explicit and implicit government subsidies.
• Renewable Energy (RE) [21]: Information about electricity generation and installed electricity capacity, where energy is classified as renewable or non-renewable.
• Trade in Low-carbon Technology Products (TT) [22]: Data about trade in low-carbon technology products.
The original data sources for each indicator group consist of sets of time series, each reporting the values for one nation and one kind of measure. In the following, we describe the process performed to extract the analyzed data. Table 1 reports an overview of the features included in the data collection presented here.
2.1. Data collection methodology
The GreenCA dataset presented in this work is the result of a data collection and summarization process. Indeed, the International Monetary Fund Climate Dashboard provides users with varied and sometimes redundant data, to make them usable for different kinds of analytics. The objective is to rationalize the data through reshaping and filtering operations to obtain a two-class tabular dataset.
Data is divided using the Green Comparative Advantage as a discriminant feature. This value represents the economic advantage over the other nations in exporting low-carbon technologies, which consist of all the technological products tailored to reduce the impact of human activities on the environment. Surveyed nations are split into those exhibiting a high Green Comparative Advantage, that is, a value greater than 1 for this index, to which we assign the target label 1, and those having a value lower than 1, to which the label 0 is assigned.
To avoid scale problems, when more than one unit is available for the considered indicator and when applicable, we consider only information from records measuring the analyzed indicator as a percentage.
We selected the most recent three-year period satisfying data availability, so, in the presented dataset, we chose the years 2019-2021 as the target period for our analysis. However, the described data processing procedure can be applied to updated data by considering a different time horizon to obtain an up-to-date version of this dataset. To summarize the information relating to the considered period we apply, for each indicator, the mean operation to the values for the considered years. To prevent working with null data, we drop from the 104 registered nations those with at least one unspecified value. The above-described data extraction procedure resulted in a dataset containing information from 54 countries, among which 16 have a high comparative advantage and the remaining 38 a low comparative advantage. Table 1 lists the set of features available, where a description, the unit of measurement and the connected thematic area are reported for each feature.
Table 1: GreenCA dataset features overview
| ID | Description | Unit | Indicator Group |
|---|---|---|---|
| ET_0 | Environmental Taxes | Percent of GDP | Environmental Taxes |
| ET_1 | Taxes on Energy | Percent of GDP | Environmental Taxes |
| ET_2 | Taxes on Pollution | Percent of GDP | Environmental Taxes |
| ET_3 | Taxes on Resources | Percent of GDP | Environmental Taxes |
| ET_4 | Taxes on Transport | Percent of GDP | Environmental Taxes |
| EP_0 | Expenditure on biodiversity & landscape protection | Percent of GDP | Environmental Protection Expenditures |
| EP_1 | Expenditure on environment protection | Percent of GDP | Environmental Protection Expenditures |
| EP_2 | Expenditure on environmental protection n.e.c. | Percent of GDP | Environmental Protection Expenditures |
| EP_3 | Expenditure on environmental protection R&D | Percent of GDP | Environmental Protection Expenditures |
| EP_4 | Expenditure on pollution abatement | Percent of GDP | Environmental Protection Expenditures |
| EP_5 | Expenditure on waste management | Percent of GDP | Environmental Protection Expenditures |
| EP_6 | Expenditure on waste water management | Percent of GDP | Environmental Protection Expenditures |
| FC_0 | Carbon stocks in forests | Million tonnes | Forest and Carbon |
| FC_1 | Forest area | 1000 HA | Forest and Carbon |
| FC_2 | Index of carbon stocks in forests | Index | Forest and Carbon |
| FC_3 | Index of forest extent | Index | Forest and Carbon |
| FC_4 | Land area | 1000 HA | Forest and Carbon |
| FC_5 | Share of forest area | Percent | Forest and Carbon |
| FF_0 | Implicit Fossil Fuel Subsidies | Percent of GDP | Fossil Fuel Subsidies |
| FF_1 | Explicit Fossil Fuel Subsidies | Percent of GDP | Fossil Fuel Subsidies |
| FF_2 | Total Fossil Fuel Subsidies | Percent of GDP | Fossil Fuel Subsidies |
| RE_0_0 | Renewable Electricity Generation | Gigawatt-hours (GWh) | Renewable Energy |
| RE_0_1 | Non-Renewable Electricity Generation | Gigawatt-hours (GWh) | Renewable Energy |
| RE_1_0 | Renewable Electricity Installed Capacity | Megawatt (MW) | Renewable Energy |
| RE_1_1 | Non-Renewable Electricity Installed Capacity | Megawatt (MW) | Renewable Energy |
| TT_0 | Trade balance in low carbon technology products | Percent | Trade in Low-carbon Technology Products |
| TT_1 | Total trade in low carbon technology products | Percent | Trade in Low-carbon Technology Products |
Since there are only a few outstanding countries in which low-carbon technologies are economically
advantageous, the dataset presented in this work can be looked at as an outlier detection dataset, in
which the minority class can be seen as anomalous. The GreenCA dataset presented in this section can
be found at the following link https://www.kaggle.com/datasets/simonanistico/greenca.
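The collection procedure described in this section (averaging over the reference years, dropping nations with missing values, labelling by the index threshold) can be sketched in a few lines. This is only an illustrative sketch: the function names, the nested-dictionary input layout and the toy numbers are assumptions and do not come from the released dataset or from the IMF files. The treatment of an index exactly equal to 1 is not stated in the text, so mapping it to label 0 is also an assumption.

```python
from statistics import mean

YEARS = (2019, 2020, 2021)  # reference period stated in the text


def summarize_country(series_by_feature, years=YEARS):
    """Average each indicator over the reference years.

    series_by_feature maps a feature ID (e.g. "ET_4") to a {year: value} dict.
    Returns None if any value in the period is missing, mirroring the rule of
    dropping nations with at least one unspecified value.
    """
    row = {}
    for feature, series in series_by_feature.items():
        values = [series.get(year) for year in years]
        if any(v is None for v in values):
            return None
        row[feature] = mean(values)
    return row


def label_country(green_ca):
    """Label 1 for a Green Comparative Advantage above 1, 0 otherwise.

    Assumption: a value exactly equal to 1 is mapped to 0.
    """
    return 1 if green_ca > 1 else 0


def build_dataset(raw, green_ca):
    """raw: {country: {feature: {year: value}}}; green_ca: {country: index}."""
    dataset = {}
    for country, series_by_feature in raw.items():
        row = summarize_country(series_by_feature)
        if row is not None:
            row["label"] = label_country(green_ca[country])
            dataset[country] = row
    return dataset


# Toy input with made-up numbers (not taken from the IMF data).
raw = {
    "A": {"ET_4": {2019: 1.0, 2020: 2.0, 2021: 3.0}},
    "B": {"ET_4": {2019: 0.5, 2020: None, 2021: 0.7}},
}
green_ca = {"A": 1.4, "B": 0.6}
dataset = build_dataset(raw, green_ca)
```

In this toy run, country "B" is dropped because one of its yearly values is missing, while "A" is kept with its three-year mean and label 1.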
3. M²OE
Masking Models for Outlier Explanation, M²OE for short, tackles the Outlier Explanation problem by providing the user with transformation-based explanations, which describe the outlier's peculiarities by suggesting alterations that, applied to the analyzed outlier, make it behave similarly to normal samples. In more detail, given an object $o \in DS$, the explanation consists of a set $e$ of feature-value pairs $\{(f_{i_1}, v_{i_1}), \dots, (f_{i_k}, v_{i_k})\}$ codifying a transformation $t_e(o)$ of $o$ resulting in a new object $o'$ such that $o'[f_j] = o[f_j] + v_j$ for $j \in \{i_1, \dots, i_k\}$ and $o'[f_j] = o[f_j]$ otherwise. The transformation just described potentially represents an actionable explanation, providing users with insights on how to change the features of the outlier to make it act as a normal sample.
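The transformation defined above is additive on the selected features and leaves every other feature untouched. A minimal sketch follows, assuming objects are stored as dictionaries keyed by feature ID; the function name and the numeric values are hypothetical.

```python
def apply_explanation(o, explanation):
    """Return o' with o'[f] = o[f] + v for every (f, v) pair in the explanation.

    Features not mentioned in the explanation keep their original value.
    The input object is not modified.
    """
    transformed = dict(o)
    for feature, delta in explanation:
        transformed[feature] = o[feature] + delta
    return transformed


# Made-up outlier and explanation, for illustration only.
outlier = {"ET_4": 0.8, "EP_0": 0.2, "FC_5": 30.0}
explanation = [("ET_4", -0.25)]
patched = apply_explanation(outlier, explanation)
```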
The pipeline proposed to find explanations of this form is depicted in Figure 1. As reported in the figure, to compute an explanation, M²OE takes the outlier $o$ and a dataset $DS$ of normal samples as input. The pipeline input is first given to the Reference Set Generator, devoted to finding a reference set $RS$ for $o$, which is a subset of $DS$ consisting of $k$ samples selected according to some criterion to act as a prototype of the normality concept to which $o$ should conform. The reference set $RS$ is then given, together with the outlier $o$, to the Training Set Generator module, which is in charge of building the training set $TS$ used by the subsequent module; $TS$ consists of a set $\{\langle o, r \rangle : r \in RS\}$ of $k$ tuples, each containing the outlier $o$ and one of the reference set samples. The training set $TS$ so obtained is then used by the Generative Neural Module, which represents the core of the pipeline and is composed of two neural networks: the Choice Generator, in charge of finding the set of features to modify, codified by the binary choice vector $c$ having one component for each feature and values equal to 1 only for the features involved in the transformation, and the Mask Generator, responsible for figuring out how to change each of the features pointed out by the Choice Generator. The Mask Generator codifies the alteration to apply as a real-valued vector that, coherently with the Choice Generator, has one component for each feature and shows non-null components only for indexes corresponding to features to change. Finally, after the training phase of the neural module is completed, the $k$ choice-mask pairs output by the Generative Neural Module are provided to the Explanation Generator module, devoted to combining the collected information to build a set of minimal disjoint explanations for the outlier $o$.
Figure 1: M²OE pipeline
To compute this set of explanations, we collect the set $C$ consisting of all the choices associated with objects of the reference set $RS$ and find the frequent itemsets $F_C$ (in our context, each feature represents an item). Then, for each frequent itemset $f$ in $F_C$ (representing a frequent choice), we apply a clustering algorithm to the reference set samples whose corresponding choice contains $f$; in this work, we have used DBSCAN [23]. After this clustering step, we take the medoid of each cluster as a representative point used to find a mask related to that set of samples, which, together with $f$, is one of the explanations provided to the user.
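The first step of the Explanation Generator, finding the feature sets that recur among the choices, can be illustrated as follows. This is a brute-force sketch that is only practical for a handful of features; the function name and the `min_support` threshold are assumptions, since the text does not state which frequent itemset algorithm or support value is used. The subsequent DBSCAN clustering and medoid selection are not reproduced here.

```python
from itertools import combinations


def frequent_choices(choices, min_support):
    """Enumerate feature sets contained in at least min_support choices.

    choices: list of sets of feature IDs, one per reference sample.
    Returns {feature_set: indexes of the reference samples whose choice
    contains it}, so each frequent choice can later be clustered.
    """
    items = sorted(set().union(*choices))
    frequent = {}
    for size in range(1, len(items) + 1):
        found = False
        for candidate in combinations(items, size):
            cand = frozenset(candidate)
            support = [i for i, c in enumerate(choices) if cand <= c]
            if len(support) >= min_support:
                frequent[cand] = support
                found = True
        if not found:
            break  # no frequent set of this size, so none of a larger size
    return frequent


# Made-up choices for four reference samples.
choices = [{"ET_4"}, {"ET_4", "EP_0"}, {"ET_4", "EP_0"}, {"FC_5"}]
frequent = frequent_choices(choices, min_support=2)
```

Each value in the returned dictionary lists the reference samples to which the clustering step described above would then be applied.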
3.1. Explanation computation
Both the Choice Generator and Mask Generator networks introduced above are feed-forward dense neural networks with $l_g \geq 3$ layers of $n_{g1} \cdot d$ ($n_{g1} \geq 3$) neurons. The layers of the latter module have linear activation functions, while, for the former, the hidden layers are equipped with a ReLU activation function and the output layer with a sigmoid. As a result, the Choice Generator returns the $d$-dimensional real-valued choice vector $\tilde{c}$, with values $\tilde{c}_i \in [0, 1]$, which are eventually converted into a binary format $c_i \in \{0, 1\}$ through a thresholding operation.
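The thresholding step, and the way a binary choice and a mask combine into feature-value pairs, can be sketched as follows. The threshold value of 0.5 and the function names are assumptions, since the text does not specify them.

```python
def binarize_choice(soft_choice, threshold=0.5):
    """Turn the sigmoid outputs in [0, 1] into a binary choice vector.

    Assumption: components at or above the threshold are selected.
    """
    return [1 if value >= threshold else 0 for value in soft_choice]


def to_explanation(features, soft_choice, mask, threshold=0.5):
    """Keep only the (feature, mask value) pairs selected by the choice."""
    choice = binarize_choice(soft_choice, threshold)
    return [(f, m) for f, c, m in zip(features, choice, mask) if c == 1]


# Made-up network outputs for three features.
features = ["ET_4", "EP_0", "FC_5"]
soft_choice = [0.91, 0.12, 0.50]
mask = [-0.3, 0.0, 1.5]
pairs = to_explanation(features, soft_choice, mask)
```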
To carry out the training, M²OE has to compute a statistic vector $s$ on $RS$, whose $i$-th component ($1 \leq i \leq d$) is the mean feature-wise squared difference between normal points:
$$ s_i = \frac{2}{k(k-1)} \sum_{r, r' \in RS} (r_i - r'_i)^2. $$
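As a worked example, the statistic vector can be computed directly from its definition. The sketch below reads the sum as ranging over the $k(k-1)/2$ unordered pairs of distinct reference samples, which is the reading under which the $2/(k(k-1))$ factor yields a mean; this reading, the function name and the toy data are assumptions.

```python
from itertools import combinations


def statistic_vector(reference_set):
    """Mean squared difference between reference samples, feature by feature.

    reference_set: list of k samples, each a list of d numeric features.
    """
    k = len(reference_set)
    d = len(reference_set[0])
    pairs = list(combinations(reference_set, 2))  # k*(k-1)/2 unordered pairs
    return [
        2.0 / (k * (k - 1)) * sum((r[i] - q[i]) ** 2 for r, q in pairs)
        for i in range(d)
    ]


# Three made-up reference samples with two features each.
reference_set = [[0.0, 1.0], [2.0, 1.0], [4.0, 4.0]]
s = statistic_vector(reference_set)
```

For the first feature the squared differences are 4, 16 and 4, so the mean is 8; for the second they are 0, 9 and 9, so the mean is 6.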
Given this vector, the outlier $o$ and the reference sample $r$, the loss function leading the neural networks training is the following:
$$ \mathcal{L}(o, r) = \alpha_1 \cdot \frac{\sum_{i=1}^{d} s_i \cdot \tilde{c}_i}{\left[\sum_{i=1}^{d} (o_i - r_i)^2 \cdot \tilde{c}_i\right] + \epsilon} + \alpha_2 \cdot \sum_{i=1}^{d} \left[(\tilde{o}'_i - r_i)^2 \cdot \tilde{c}_i\right] + \alpha_3 \cdot \lVert \tilde{c} \rVert_2 \qquad (1) $$
in which $\alpha_1$, $\alpha_2$ and $\alpha_3$ are values used to weigh the contributions of the three terms appearing in the loss function, $\epsilon$ is a small constant to avoid division by zero and $\tilde{o}'$ is the sample resulting from the transformation application. The three-fold objective of this loss is to find the subspaces in which the outlier deviates most from the normal samples (first term) while making the transformed outlier $\tilde{o}'$ as similar as possible to the normal samples (second term) and keeping the number of features included in the explanation low (last term).
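To make the role of the three terms concrete, the loss can be evaluated by hand for a single pair of outlier and reference sample. The sketch below is a plain-Python evaluation, not the training code: the function name and the value of `eps` are assumptions, the last term is read as the Euclidean norm of the soft choice vector, and the default weights are the ones reported later for the case study (1.0, 1.2 and 0.3).

```python
import math


def m2oe_loss(o, r, o_transformed, soft_choice, s,
              alphas=(1.0, 1.2, 0.3), eps=1e-8):
    """Evaluate the three-term loss for one (outlier, reference) pair.

    o, r, o_transformed, soft_choice and s are lists of length d.
    """
    a1, a2, a3 = alphas
    d = len(o)
    # First term: small when the chosen subspace separates o from r
    # by much more than normal points differ from each other.
    numerator = sum(s[i] * soft_choice[i] for i in range(d))
    denominator = sum((o[i] - r[i]) ** 2 * soft_choice[i] for i in range(d)) + eps
    contrast = numerator / denominator
    # Second term: distance of the transformed outlier from r on chosen features.
    closeness = sum((o_transformed[i] - r[i]) ** 2 * soft_choice[i] for i in range(d))
    # Third term: penalizes selecting many features (Euclidean norm, assumed).
    sparsity = math.sqrt(sum(c * c for c in soft_choice))
    return a1 * contrast + a2 * closeness + a3 * sparsity


# Made-up values: only the first feature is chosen.
loss = m2oe_loss(
    o=[4.0, 1.0],
    r=[0.0, 1.0],
    o_transformed=[1.0, 1.0],
    soft_choice=[1.0, 0.0],
    s=[8.0, 6.0],
)
```

With these numbers the first term is about 8/16 = 0.5, the second is 1 and the third is 1, so the weighted sum is approximately 0.5 + 1.2 + 0.3 = 2.0.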
4. Case study
In our previous works [4, 5], the performance of M²OE has been thoroughly discussed. In particular, it has been shown that, despite not being specifically tailored to search for subspace-based explanations, the quality of the set of features included in the transformation is in almost all cases better than, or at least comparable to, that of the competitors involved in our experiments, namely ATOM and COIN. Furthermore, we have shown that the transformations provided as an explanation, a novelty in the outlier explanation field, can get the outlier closer to behaving as a normal sample. In this section, our efforts are focused on observing the explanations provided by M²OE to check the insights they offer and their actionability. To carry out this experiment we have used the dataset described in Section 2, which is presented for the first time in this work. In our Outlier Explanation setting, countries showing a high Green Comparative Advantage are the outlier samples whose behaviour is under study. So, to summarize, we consider 16 countries exhibiting a high relative advantage in exporting low-carbon technologies, in which an economic boost potentially pushes forward the development of these low-impact technologies.
For this analysis, the M²OE method is set up as follows. Due to the small number of samples available, we consider as a reference dataset all the samples not belonging to the studied group, that is, 38 countries. The neural networks of the Generative Neural Module are trained for 30 epochs, with loss weights equal to 1.0, 1.2 and 0.3 respectively, and a learning rate equal to 0.001.
The explanations for the considered countries are depicted in Figure 2, where, for each of them, the characterizing features are listed together with the alteration suggested to make the comparative advantage of that nation low. To make the explanations easier to read, we present the transformation values as percentages of the original values of the features. In the following, we summarize the findings extracted by M²OE; the figure supplies the detailed explanations, including the size of the change suggested for each pointed-out feature.
• According to the explanations provided by M²OE, Austria’s comparative advantage is due to its taxes on transport (Figure 2a): decreasing them by about 30% causes the loss of this advantage. Taxes also weigh heavily on this index for the Republic of Croatia (Figure 2b), Hungary (Figure 2g) and the Slovak Republic (Figure 2n). Indeed, according to the results of the methodology considered in this work, a decrease in taxes on energy and environmental taxes for the first, and in taxes on pollution for the last two, would make them have a low value for the considered index.
• Another group of countries stands out in terms of comparative advantage due to investments related to the environment. In more detail, the Czech Republic (Figure 2c) is characterized by expenditure on biodiversity and landscape protection, Estonia (Figure 2e) on environmental protection R&D, Italy (Figure 2i) on waste management and environmental protection R&D, North Macedonia (Figure 2k) on environmental protection for unclassified aspects and, finally, the United Kingdom (Figure 2p) on waste management.
• Another frequent pattern is the one in which policies combine expenditures and targeted taxes. According to the information we have retrieved, this happens in Denmark (Figure 2d), whose policy is characterized by a mix of taxes on transport and investment in biodiversity and landscape protection, and in Israel (Figure 2h), where investments in waste management are combined with taxes on transport.
• In other cases, the pivotal feature for comparative advantage relates to geographical aspects like the share of forest area for Finland and Sweden (Figures 2f and 2o respectively) and the index of carbon stocks in forests for Romania (Figure 2m). In both cases, a reduction is supposed to cause a low comparative advantage.
• Finally, the remaining two nations, namely Japan (Figure 2j) and Panama (Figure 2l), exhibit an explanation differing from the previously described patterns. As for the first, M²OE says that to reduce its comparative advantage a decrease in non-renewable electricity installed capacity is needed, while, as for the second, the method’s suggestion is to reduce the forest area a little and increase environmental taxes heavily.
Figure 2: M²OE’s explanations for the 16 countries having a green comparative advantage. Panels: (a) Austria, (b) Croatia, Rep. of, (c) Czech Rep., (d) Denmark, (e) Estonia, Rep. of, (f) Finland, (g) Hungary, (h) Israel, (i) Italy, (j) Japan, (k) North Macedonia, Republic of, (l) Panama, (m) Romania, (n) Slovak Rep., (o) Sweden, (p) United Kingdom. Each panel is a bar chart with one bar per feature included in the explanation; the horizontal axis reports the suggested change as a % w.r.t. the original feature value.
Since the aim of our analysis is also to assess the ability of M²OE to effectively unveil the key aspects of countries with a high comparative advantage, it is useful to recall how explanations need to be read: the features included in the explanation are the important aspects, and a feature impacts the considered index positively if the transformation suggests lowering its value and negatively otherwise.
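The reading rule above, together with the percentage presentation used in Figure 2, can be sketched as follows. The function names and the numeric values are hypothetical; the sketch also assumes a non-zero original feature value, since a percentage of zero is undefined.

```python
def as_percentage(original_value, delta):
    """Express a suggested change as a percentage of the original value.

    Assumes original_value is not zero.
    """
    return 100.0 * delta / original_value


def impact_on_index(delta):
    """Read an explanation entry: a suggested decrease means the feature
    currently contributes positively to the index, an increase negatively."""
    if delta < 0:
        return "positive"
    if delta > 0:
        return "negative"
    return "none"


# Made-up example: a feature worth 0.8 with a suggested change of -0.24.
pct = as_percentage(0.8, -0.24)
reading = impact_on_index(-0.24)
```

Here the suggested change is roughly a 30% decrease, so the feature is read as one that positively impacts the index.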
The analysis above testifies that the suggestions from M²OE’s explanations can be translated into actions that decision-makers can perform.
To further confirm the quality of the observed results, we have measured the outlierness score of the analyzed samples in the space given by the features included in the explanation. In more detail, we have computed the mean value of the outlierness score on the original outliers and on the samples resulting from the transformation. Moreover, we have measured the fraction of correctly patched samples, that is to say, the portion of outliers for which the returned transformation has lowered their anomaly score by at least 5%. The outlierness score involved in our analysis is the iForest score [24], based on the Isolation Forest anomaly detection method, whose underlying idea is that anomalies are few and isolated from the normal samples. This score has been chosen for its dimensional unbiasedness, which makes even explanations with different numbers of features comparable. The outliers in the set of features provided by the explanations show a mean outlierness of 0.65, which is considerably higher than the 0.42 measured in the full feature space. Instead, the samples resulting from the transformation exhibit a mean outlierness score of 0.45, which is substantially lower than that of the outliers; indeed, 96% of the samples have been correctly transformed. This further confirms our conviction about the actionability of the explanations provided by M²OE in the considered context, also supported by the previous qualitative evaluation.
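The fraction of correctly patched samples can be computed from the anomaly scores before and after the transformation. The sketch below takes the scores as given (it does not compute iForest scores) and interprets "at least 5%" as a relative decrease with respect to the original score; this interpretation, the function name and the toy scores are assumptions.

```python
def patched_fraction(scores_before, scores_after, min_drop=0.05):
    """Fraction of outliers whose score dropped by at least min_drop.

    Assumption: the drop is relative to the score before the transformation.
    """
    patched = sum(
        1
        for before, after in zip(scores_before, scores_after)
        if after <= before * (1.0 - min_drop)
    )
    return patched / len(scores_before)


# Made-up scores for four outliers.
scores_before = [0.70, 0.60, 0.65, 0.62]
scores_after = [0.40, 0.59, 0.50, 0.45]
fraction = patched_fraction(scores_before, scores_after)
```

In this toy case the second sample improves by less than 5% and is therefore not counted, giving a fraction of 0.75.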
5. Conclusion
In this paper, we have analyzed M²OE, a transformation-based outlier explanation method, in the green economy context to study its effectiveness in extracting actionable explanations.
To analyze this context, we have designed a tabular dataset, named the Green Comparative Advantage (GreenCA) dataset, representing one of the contributions of this paper. This data collection summarizes and reshapes information accessed from the International Monetary Fund about many indicators related to the Green Comparative Advantage, which is an index measuring the relative benefit of exporting low-carbon technologies.
Inspecting the explanations provided for the 16 nations exhibiting a high comparative advantage, we have seen how the transformations provided by M²OE as explanations can be considered actionable, since they provide useful suggestions showing how to make those countries have a low comparative advantage by acting on policies or on natural aspects like the share of forests. This information is useful in a complementary analysis to unveil the aspects characterizing countries having a high comparative advantage. The quality of the observed explanations has also been validated through a numerical analysis. In future work, we plan to deepen our analysis by considering social, economic and environmental diversity to observe how they influence the Green Comparative Advantage.
Acknowledgments
We acknowledge the support of the PNRR project FAIR - Future AI Research (PE00000013), Spoke 9 -
Green-aware AI, under the NRRP MUR program funded by the NextGenerationEU.
References
[1] United Nations, Department of Economic and Social Affairs, Sustainable Development, Transforming our world: the 2030 agenda for sustainable development, 2015. URL: https://sdgs.un.org/2030agenda.
[2] K. Abbass, M. Z. Qasim, H. Song, M. Murshed, H. Mahmood, I. Younis, A review of the global
climate change impacts, adaptation, and sustainable mitigation measures, Environmental Science
and Pollution Research 29 (2022) 42539–42559.
[3] F. C. Moore, K. Lacasse, K. J. Mach, Y. A. Shin, L. J. Gross, B. Beckage, Determinants of emissions
pathways in the coupled climate–social system, Nature 603 (2022) 103–111.
[4] F. Angiulli, F. Fassetti, S. Nisticò, L. Palopoli, Counterfactuals explanations for outliers via subspaces
density contrastive loss, in: International Conference on Discovery Science, Springer, 2023, pp.
159–173.
[5] F. Angiulli, F. Fassetti, S. Nisticò, L. Palopoli, Explaining outliers and anomalous groups via
subspace density contrastive loss, Machine Learning (2024) 1–25.
[6] B. Micenková, R. T. Ng, X.-H. Dang, I. Assent, Explaining outliers by subspace separability, in:
2013 IEEE 13th international conference on data mining, IEEE, 2013, pp. 518–527.
[7] X. H. Dang, I. Assent, R. T. Ng, A. Zimek, E. Schubert, Discriminative features for identifying and
interpreting outliers, in: 2014 IEEE 30th international conference on data engineering, IEEE, 2014,
pp. 88–99.
[8] T. Mokoena, T. Celik, V. Marivate, Why is this an anomaly? explaining anomalies using sequential
explanations, Pattern Recognition 121 (2022) 108227.
[9] F. Angiulli, F. Fassetti, L. Palopoli, Detecting outlying properties of exceptional objects, ACM Transactions on Database Systems (TODS) 34 (2009) 1–62.
[10] F. Angiulli, F. Fassetti, G. Manco, L. Palopoli, Outlying property detection with numerical attributes,
Data mining and knowledge discovery 31 (2017) 134–163.
[11] F. Angiulli, F. Fassetti, L. Palopoli, Discovering characterizations of the behavior of anomalous
subpopulations, IEEE Transactions on knowledge and data engineering 25 (2012) 1280–1292.
[12] F. Angiulli, F. Fassetti, L. Palopoli, C. Serrao, A density estimation approach for detecting and
explaining exceptional values in categorical data, Applied Intelligence 52 (2022) 17534–17556.
[13] J. Zhang, M. Lou, T. W. Ling, H. Wang, Hos-miner: A system for detecting outlying subspaces of
high-dimensional data, in: Proceedings of the 30th International Conference on Very Large Data
Bases (VLDB’04), Morgan Kaufmann Publishers Inc., 2004, pp. 1265–1268.
[14] L. Duan, G. Tang, J. Pei, J. Bailey, A. Campbell, C. Tang, Mining outlying aspects on numeric data,
Data Mining and Knowledge Discovery 29 (2015) 1116–1151.
[15] N. X. Vinh, J. Chan, S. Romano, J. Bailey, C. Leckie, K. Ramamohanarao, J. Pei, Discovering outlying
aspects in large datasets, Data mining and knowledge discovery 30 (2016) 1520–1555.
[16] F. Angiulli, F. Fassetti, S. Nisticò, L. Palopoli, Outlier explanation through masking models, in:
European Conference on Advances in Databases and Information Systems, Springer, 2022, pp.
392–406.
[17] International Monetary Fund, Climate change indicators dashboard. Environmental taxes, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[18] International Monetary Fund, Climate change indicators dashboard. Environmental protection expenditures, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[19] International Monetary Fund, Climate change indicators dashboard. Forest and carbon, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[20] International Monetary Fund, Climate change indicators dashboard. Fossil fuel subsidies, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[21] International Monetary Fund, Climate change indicators dashboard. Renewable energy, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[22] International Monetary Fund, Climate change indicators dashboard. Trade in low-carbon technology products, https://climatedata.imf.org/pages/access-data, 2022. Accessed on 2024-09-07.
[23] M. Ester, H.-P. Kriegel, J. Sander, X. Xu, et al., A density-based algorithm for discovering clusters in large spatial databases with noise, in: KDD, volume 96, 1996, pp. 226–231.
[24] F. T. Liu, K. M. Ting, Z.-H. Zhou, Isolation forest, in: 2008 Eighth IEEE International Conference on Data Mining, IEEE, 2008, pp. 413–422.