<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta>
      <journal-title-group>
        <journal-title>Joint Conference (March</journal-title>
      </journal-title-group>
    </journal-meta>
    <article-meta>
      <title-group>
        <article-title>Exploring energy performance certificates through visualization</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tania Cerquitelli</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Evelina Di Corso</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Stefano Proto</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Alfonso Capozzoli</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabio Bellotti</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Maria G. Cassese</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Elena Baralis</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Marco Mellia</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Silvia Casagrande</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Martina Tamburini</string-name>
          <xref ref-type="aff" rid="aff3">3</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Control and Computer Engineering, Politecnico di Torino</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Department of Electronics and Telecommunications, Politecnico di Torino</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>Department of Energy, Politecnico di Torino</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
        <aff id="aff3">
          <label>3</label>
          <institution>Edison Spa</institution>
          ,
          <addr-line>Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <volume>26</volume>
      <issue>2019</issue>
      <abstract>
        <p>Energy Performance Certificates (EPCs) provide useful information on the standard-based calculation of a building's energy performance and on its thermo-physical and geometrical properties. Because of the volume of available data (released as open data) and the heterogeneity of the attributes, exploring this collection of energy-related data is challenging. This paper presents INDICE (INformative DynamiC dashboard Engine), a new data visualization framework able to automatically explore large collections of EPCs. INDICE explores EPCs through both querying and analytics tasks, and presents the output intuitively through informative dashboards. The dashboards include dynamic and interactive maps along with different informative charts, allowing different stakeholders (e.g., domain and non-domain expert users) to explore and interpret the extracted knowledge at different spatial granularity levels. The objective of INDICE is to create energy maps useful for characterizing the energy performance of buildings located in different areas. The experimental evaluation, performed on a real set of EPCs related to a major Italian region in the North West of Italy, demonstrates the effectiveness of INDICE in exploring an EPC dataset through different data and knowledge visualization techniques.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Nowadays, large volumes of energy-related data are continuously
collected in different domains. To reduce wasteful energy
consumption, several orthogonal application domains (e.g., buildings,
IoT-based devices, wireless networks) have raised the policy priority
of energy efficiency. According to the U.S. Department of
Energy, in industrialized countries more than 40% of total energy is
consumed in buildings [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. In the last few years, many efforts
have been devoted to improving building energy efficiency with
different final goals: (i) facilitating proactive energy-saving services
[
        <xref ref-type="bibr" rid="ref32">32</xref>
        ], (ii) characterizing data streams of energy consumption of
individual residential consumers in buildings [
        <xref ref-type="bibr" rid="ref5 ref6 ref7">5–7</xref>
        ], (iii)
characterizing heating energy demand through the analysis of energy
performance certificates of buildings [
        <xref ref-type="bibr" rid="ref11 ref4 ref9">4, 9, 11</xref>
        ], and (iv) reducing
emissions and energy consumption for buildings [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ].
      </p>
      <p>
        To enhance the effectiveness of data and knowledge
exploration, a variety of data visualization techniques have been
proposed. In [
        <xref ref-type="bibr" rid="ref22 ref23 ref26">22, 23, 26</xref>
        ] the authors exploited choropleth maps to
analyze energy consumption and electricity
consumption per unit area. In contrast, in [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], the authors used
dynamic simulations of building energy consumption together with
building information to develop urban energy maps with high spatial
resolution. However, all the above works proposed static maps
to analyze the average values of some features of interest. Dynamic,
navigable maps tailored to the analysis of energy-related data
have not been proposed so far. The
authors in [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ] propose an interactive 3D visualization of
the Linking Open Data (LOD) cloud that adopts the metaphor of an
urban area. The visualization is interactive: the
user can enlarge any part of the model, modify the perspective,
change the shape and positioning of the buildings, and view all
the connections or only those belonging to a specific data set. A
parallel research effort has been devoted to exploring and
summarizing geolocated time series data through maps [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Moreover, a
significant research effort is reported in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ], in which the authors
propose a city energy model based on the visualization requests and
needs of a group of energy consultants. Their
model offers stakeholders a powerful tool for evaluating both the
current state and future scenarios.
      </p>
      <p>This paper presents INDICE (INformative DynamiC dashboard
Engine), a data visualization framework that generates interactive
and navigable dashboards through the analysis of a set of Energy
Performance Certificates (EPCs). An EPC is a legal requirement
when constructing, selling or renting a building, and it provides
useful information on the calculated standard energy
performance and on the thermo-physical and geometrical properties of existing
buildings. The multi-tiered INDICE framework has been
designed to deal effectively with large collections of EPCs. Compared
with other works, our framework brings together many
different analysis techniques to help non-expert users make sense
of Energy Performance Certificates. After a pre-processing
step, cluster analysis discovers groups of EPCs with
similar features. To summarize the energy performance of buildings
at different granularities, INDICE generates informative
dashboards tailored to different energy stakeholders, combining
a rich set of knowledge with ease of use.</p>
      <p>The proposed informative dashboards exploit different kinds
of energy maps to show data and knowledge at different spatial
granularity levels. These visualization techniques allow
different energy stakeholders to easily capture a high-level
overview of heating energy demand at city level and to
drill down to the single apartment. Moreover, in order
to analyze the energy efficiency of different buildings through
the most interesting attributes under analysis, INDICE includes
cluster-markers, which address the problem of representing
multiple variables at the same time.</p>
      <p>As a case study, a real collection of EPCs related to a major
Italian region in North West Italy was analyzed. Preliminary
experimental results show that the proposed approach is effective
in presenting a manageable set of human-readable knowledge
to each end-user through dynamic and interactive maps.</p>
      <p>The rest of the paper is organized as follows.
Section 2 gives an overview of the INDICE system with a
thorough description of its main building blocks. Section 3 discusses
the preliminary experimental results obtained on a real data
collection, and Section 4 draws conclusions and outlines future
developments of this work.</p>
    </sec>
    <sec id="sec-2">
      <title>THE INDICE ANALYTICS SYSTEM</title>
      <p>INDICE (INformative DynamiC dashboard Engine) has been
tailored to analyze any collection of EPCs. The analysis of this
kind of data is challenging because of the large number of attributes
characterizing each energy performance certificate and their high
variability. INDICE combines different
techniques to effectively visualize a rich set of knowledge items
for a variety of energy stakeholders. The overall architecture is
shown in Figure 1. INDICE includes three main building blocks,
each one addressing one of the main steps of the
knowledge-extraction process: (i) Data pre-processing, (ii) Data selection and
analytics, and (iii) Data and Knowledge visualization. A detailed
description of each building block follows.</p>
    </sec>
    <sec id="sec-3">
      <title>Data Pre-processing</title>
      <p>The INDICE pre-processing phase aims at smoothing the effect of
possibly unreliable data. It performs two tasks that have
proved to be crucial for real-world geospatial data: (i) geospatial
coordinate cleaning and (ii) outlier detection and removal.</p>
      <sec id="sec-3-1">
        <title>2.1.1 Geospatial data cleaning.</title>
        <p>This pre-processing step is crucial when the final aim is to
display data and knowledge through maps. INDICE includes an
ad-hoc strategy to clean geospatial attributes, including address,
house number, ZIP Code, latitude and longitude. Since the
address attribute is usually collected as a free-text field, it often
contains numerous typos and input errors, which require
careful analysis to be fixed. To clean these
attributes, INDICE includes a multi-step algorithm that reconstructs
and corrects wrong information. Specifically, it
compares the available addresses with a reference street map,
which is usually available for each city. The reference street map
should contain detailed information on all streets, including
street names, house numbers, ZIP Code and geolocation (i.e.,
latitude and longitude). Given a city under analysis, INDICE
automatically downloads the reference street map if it is available
online.</p>
        <p>
          The reference street map is used by INDICE to verify the
reliability of the addresses in the dataset under analysis, to correct
errors in the address field, and at the same time to reconstruct
missing or incorrect information in the attributes ZIP Code, house
number, latitude and longitude. Specifically, the developed
algorithm compares each address string in the dataset with those in the
reference street map. For each pair of addresses, the Levenshtein
distance [
          <xref ref-type="bibr" rid="ref19">19</xref>
          ] is computed to evaluate the similarity between two
character strings in terms of the minimum number of
modifications (insertions, deletions and substitutions) necessary to
transform the first string into the second one. The similarity
derived from the Levenshtein distance takes values in the range
[0, 1], where 0 indicates total dissimilarity and 1 equality of the
compared strings. Given a user-defined threshold ϕ, the
reference address most similar to the address under analysis
replaces the original one if the Levenshtein similarity between the
two addresses is greater than or equal to ϕ. When the association
to a reference address is not possible, i.e., all Levenshtein
similarities are below ϕ, a geocoding request is sent via the Google
Geocoding API (<ext-link ext-link-type="uri" xlink:href="https://developers.google.com/maps/documentation/geocoding/intro">https://developers.google.com/maps/documentation/geocoding/intro</ext-link>), a reliable service that, given a
textual address, reconstructs the whole address consistently.
Because of a limit on the number of free requests, however, INDICE uses the
Google Geocoding service only when the association cannot be resolved
through the reference street map.
        </p>
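        <p>The address-matching step above can be illustrated with a minimal Python sketch. It assumes that the Levenshtein similarity is normalized as 1 − d/max(|a|, |b|), that addresses are compared after lower-casing and whitespace normalization, and that the reference street map is a plain list of strings. The function names and the default ϕ = 0.8 are hypothetical, and the Google Geocoding fallback is only signalled (by returning None), not implemented.</p>
```python
# Sketch of Levenshtein-based address matching (names and defaults are hypothetical).


def normalise(s: str) -> str:
    """Lower-case and collapse whitespace before comparing addresses."""
    return " ".join(s.lower().split())


def levenshtein(a: str, b: str) -> int:
    """Minimum number of insertions, deletions and substitutions turning a into b."""
    if len(a) > len(b):
        a, b = b, a  # the distance is symmetric; keep the DP row short
    previous = list(range(len(a) + 1))
    for j, cb in enumerate(b, start=1):
        current = [j]
        for i, ca in enumerate(a, start=1):
            cost = 0 if ca == cb else 1
            # cheapest of: one deletion, one insertion, or a match (0) / substitution (1)
            current.append(min(previous[i] + 1, current[i - 1] + 1, previous[i - 1] + cost))
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]: 1 for identical strings, 0 for totally different ones."""
    a, b = normalise(a), normalise(b)
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def match_address(address: str, street_map: list, phi: float = 0.8):
    """Return the most similar reference address if its similarity is at least phi.

    None signals that the caller should fall back to a geocoding request.
    """
    best = max(street_map, key=lambda ref: similarity(address, ref), default=None)
    if best is not None and similarity(address, best) >= phi:
        return best
    return None


print(match_address("Via Garibaldy 12", ["Via Garibaldi 12", "Corso Francia 3"]))
```
        <p>In this toy call the misspelled address is one edit away from a 16-character reference (similarity 0.9375), so the reference address replaces it; an address with no sufficiently similar reference would instead be handed to the geocoding fallback.</p>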
      </sec>
      <sec id="sec-3-2">
        <title>2.1.2 Outlier detection and removal.</title>
        <p>An outlier is an extreme value that deviates markedly from the other
observations in the data. It may occur either when the collected value
does not fit the model under study or when an error occurs
during data collection. To address this issue, INDICE
exploits three approaches: (i) univariate outlier detection, (ii) expert-driven
univariate analysis, and (iii) multivariate outlier detection.
Regardless of the strategy adopted, values labelled as
outliers are not considered in the subsequent analysis steps.</p>
        <p>
          Univariate outlier detection. INDICE integrates three methodologies
to automatically detect outliers and remove them from the
subsequent analytics steps: (i) the graphical boxplot method, (ii) the
parametric generalized Extreme Studentized Deviate (gESD) method
[
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] and (iii) the non-parametric Median Absolute Deviation
(MAD) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ]. The boxplot [
          <xref ref-type="bibr" rid="ref31">31</xref>
          ] (also known as box-and-whisker plot) is a
convenient way of visually displaying a data distribution through its
quartiles. The frequency distribution of each variable is summarized
by a few numbers (i.e., median, quartiles, minimum and maximum
values). The median summarizes the central tendency of the
distribution, while the quartiles indicate its variability
through the interquartile range. The minimum and
maximum values (the whisker ends) provide information not only about the extremes but
also about the possible presence of data with abnormal
characteristics w.r.t. the other points, which are plotted individually. For each
variable, the analyst can manually remove the outliers (i.e., the
values smaller than the minimum or greater than the maximum)
through value filters.
        </p>
        <p>
          The gESD method [
          <xref ref-type="bibr" rid="ref27">27</xref>
          ] is used to detect one or more outliers
in a univariate data set. The test requires a single parameter,
the upper bound k on the number of potential outliers. INDICE
tests the null hypothesis that the data have no outliers against the
alternative hypothesis that there are up to k outliers (for a
user-specified value of k). Given the upper bound k, the gESD
test essentially performs k separate tests: a test for one outlier, a
test for two outliers, and so on, up to k outliers. In INDICE, the
number of outliers is the largest value
r (with r ≤ k) such that the corresponding test statistic is
higher than its critical value.
        </p>
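        <p>The boxplot and gESD procedures described above can be sketched in plain Python as follows. This is an illustrative sketch, not the INDICE code: it assumes Tukey's usual 1.5 × IQR whisker rule for the boxplot, follows Rosner's formulation of the gESD critical values, and, to stay dependency-free, approximates Student-t quantiles with an asymptotic (Cornish-Fisher-type) expansion for large degrees of freedom, which is reasonably accurate for moderate degrees of freedom but not exact. All function names are hypothetical.</p>
```python
import math
from statistics import NormalDist, mean, quantiles, stdev


def boxplot_outliers(values, whisker=1.5):
    """Points outside [Q1 - whisker*IQR, Q3 + whisker*IQR] (assumed Tukey rule)."""
    q1, _, q3 = quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - whisker * iqr, q3 + whisker * iqr
    return [v for v in values if v > high or low > v]


def t_ppf(p, df):
    """Approximate Student-t quantile via an asymptotic expansion in 1/df (not exact)."""
    x = NormalDist().inv_cdf(p)
    g1 = (x**3 + x) / 4
    g2 = (5 * x**5 + 16 * x**3 + 3 * x) / 96
    g3 = (3 * x**7 + 19 * x**5 + 17 * x**3 - 15 * x) / 384
    g4 = (79 * x**9 + 776 * x**7 + 1482 * x**5 - 1920 * x**3 - 945 * x) / 92160
    return x + g1 / df + g2 / df**2 + g3 / df**3 + g4 / df**4


def gesd_outliers(values, k, alpha=0.05):
    """Generalized ESD test for up to k outliers (k must be at most n - 2)."""
    data = list(values)
    n = len(data)
    removed, n_outliers = [], 0
    for i in range(1, k + 1):
        mu, sd = mean(data), stdev(data)
        if sd == 0:
            break
        candidate = max(data, key=lambda v: abs(v - mu))  # most extreme remaining point
        r_i = abs(candidate - mu) / sd                     # test statistic R_i
        p = 1 - alpha / (2 * (n - i + 1))
        t = t_ppf(p, n - i - 1)
        lam = (n - i) * t / math.sqrt((n - i - 1 + t**2) * (n - i + 1))  # critical value
        removed.append(candidate)
        data.remove(candidate)
        if r_i > lam:
            n_outliers = i  # keep the largest i whose statistic exceeds its critical value
    return removed[:n_outliers]


values = [10, 11, 9, 10, 12, 11, 10, 9, 10, 11, 50]
print(boxplot_outliers(values))    # [50]
print(gesd_outliers(values, k=2))  # [50]
```
        <p>Note that gESD removes the candidates sequentially but only reports the first r of them, so a weak second candidate (here the value 12) does not become an outlier just because it was tested.</p>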
        <p>
          Lastly, the Median Absolute Deviation (MAD) [
          <xref ref-type="bibr" rid="ref15">15</xref>
          ] is a robust measure of
the variability of a univariate sample of quantitative data.
Computing the MAD is straightforward: it is the
median of the absolute deviations from the median, i.e., it is obtained
by taking the absolute difference between each point and the
median of the sample, and then computing the median of those
differences. As proposed in [
          <xref ref-type="bibr" rid="ref16">16</xref>
          ], INDICE uses a score of 3.5 as the
cut-off value: every point with a score above 3.5
is considered an outlier.
        </p>
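        <p>A minimal sketch of the MAD rule follows. The text does not spell out how the score is derived from the MAD; the sketch assumes the modified z-score of Iglewicz and Hoaglin, M<sub>i</sub> = 0.6745 (x<sub>i</sub> − median)/MAD, which is the score usually paired with the 3.5 cut-off. The function name is hypothetical.</p>
```python
from statistics import median


def mad_outliers(values, cutoff=3.5):
    """Return the points whose (assumed) modified z-score exceeds the cut-off."""
    med = median(values)
    mad = median([abs(x - med) for x in values])
    if mad == 0:
        return []  # degenerate: more than half of the points equal the median, scores undefined
    return [x for x in values if abs(0.6745 * (x - med) / mad) > cutoff]


print(mad_outliers([10, 11, 9, 10, 12, 11, 10, 9, 10, 11, 50]))  # [50]
```
        <p>Because both the center (median) and the scale (MAD) ignore the extreme value, a single large outlier cannot inflate the scale and hide itself, which is what makes the method robust.</p>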
        <p>Users can apply all the different univariate methodologies
and/or choose the most suitable one. A non-expert user who does
not know how to deal with these outlier detection techniques
can rely on default configurations, as described below.</p>
        <sec id="sec-3-2-1">
          <title>Expert-driven univariate analysis.</title>
          <p>Because non-expert users may also be interested in analysing EPC collections,
INDICE suggests the univariate outlier detection method most often
used by domain experts in their past interactions with INDICE.
Specifically, by collecting and storing the INDICE configurations of expert users (e.g., energy
scientists), INDICE can give non-expert users
interesting and effective suggestions to properly deal with
noisy data. In the current version of INDICE, only relevant
attributes describing the building thermo-physical characteristics
(e.g., Aspect Ratio, Average U-value of the vertical opaque
envelope and Average U-value of the windows) and the efficiency of
the heating subsystems (e.g., Distribution Subsystem Efficiency
and Generation Subsystem Efficiency) have been considered. In
this way, a non-expert user who does not know which univariate
analysis technique to use can adopt a configuration
chosen by previous INDICE expert users, since their choices
are automatically stored as default configurations for non-expert
users.</p>
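          <p>The way expert choices could become defaults for non-expert users can be sketched with a small in-memory store. This is a hypothetical illustration: the class name, the method names and the fallback method are assumptions, and a real deployment would presumably persist the choices rather than keep them in memory.</p>
```python
from collections import Counter, defaultdict


class ExpertDefaults:
    """Hypothetical store of expert choices: attribute -> Counter of outlier methods."""

    def __init__(self):
        self._choices = defaultdict(Counter)

    def record(self, attribute, method):
        # Called whenever an expert user applies a univariate method to an attribute.
        self._choices[attribute][method] += 1

    def suggest(self, attribute, fallback="boxplot"):
        # Most frequently chosen method among experts, or a fallback if none is recorded.
        counts = self._choices.get(attribute)
        if not counts:
            return fallback
        return counts.most_common(1)[0][0]


store = ExpertDefaults()
store.record("Aspect Ratio", "MAD")
store.record("Aspect Ratio", "gESD")
store.record("Aspect Ratio", "MAD")
print(store.suggest("Aspect Ratio"), store.suggest("Generation Subsystem Efficiency"))
```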
        </sec>
        <sec id="sec-3-2-2">
          <title>Multivariate outlier detection.</title>
          <p>
            For multivariate outlier
detection, INDICE integrates the DBSCAN algorithm
(Density-Based Spatial Clustering of Applications with Noise) [
            <xref ref-type="bibr" rid="ref12">12</xref>
            ] to
automatically identify outliers. Specifically, DBSCAN detects clusters
based on the concept of density reachability: clusters are
higher-density regions separated by lower-density regions, and points that
belong to no cluster are labelled as noise, i.e., as outliers.
DBSCAN requires two user-defined parameters (i.e., minPoints
and Epsilon). To properly specify these input parameters, INDICE
plots the k-distance graph and automatically estimates a good
value for each parameter. As proposed in [
            <xref ref-type="bibr" rid="ref10">10</xref>
            ], INDICE computes
the k-distance plot several times for different values of minPoints,
selects minPoints as the value at which the curve stabilizes, and sets Epsilon to
the elbow point of the stable curve.
          </p>
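          <p>The sketch below illustrates the multivariate step on small two-dimensional data: a k-distance curve, an Epsilon estimate taken at its elbow, and a plain DBSCAN that labels noise points as −1. It is an assumption-laden sketch rather than the INDICE implementation: the elbow is located as the point farthest from the chord joining the curve's endpoints, k is set to minPoints − 1 (the neighbourhood count includes the point itself), and the loop over several minPoints values used to find a stable curve is omitted.</p>
```python
import math


def k_distances(points, k):
    """Sorted distances of every point to its k-th nearest other point (k-distance graph)."""
    curve = []
    for i, p in enumerate(points):
        others = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        curve.append(others[k - 1])
    return sorted(curve)


def elbow_value(curve):
    """Value at the point farthest from the chord joining the first and last points."""
    x0, y0, x1, y1 = 0, curve[0], len(curve) - 1, curve[-1]

    def gap(i):
        # distance of (i, curve[i]) from the chord, up to a constant factor
        return abs((y1 - y0) * i - (x1 - x0) * curve[i] + x1 * y0 - y1 * x0)

    return curve[max(range(len(curve)), key=gap)]


def dbscan(points, eps, min_points):
    """Plain DBSCAN; returns one label per point, with -1 marking noise (outliers)."""
    n = len(points)
    # the eps-neighbourhood of each point includes the point itself
    neighbours = [[j for j in range(n) if eps >= math.dist(points[i], points[j])] for i in range(n)]
    labels = [None] * n
    cluster = -1
    for i in range(n):
        if labels[i] is not None:
            continue
        if min_points > len(neighbours[i]):
            labels[i] = -1  # provisional noise; may later become a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = list(neighbours[i])
        while queue:
            j = queue.pop()
            if labels[j] == -1:
                labels[j] = cluster  # former noise reached from a core point: border point
            if labels[j] is not None:
                continue
            labels[j] = cluster
            if len(neighbours[j]) >= min_points:
                queue.extend(neighbours[j])  # j is a core point: keep expanding
    return labels


points = [(0, 0), (0, 1), (1, 0), (1, 1), (10, 10), (10, 11), (11, 10), (11, 11), (50, 50)]
min_points = 3
eps = elbow_value(k_distances(points, k=min_points - 1))
print(eps, dbscan(points, eps, min_points))  # 1.0 [0, 0, 0, 0, 1, 1, 1, 1, -1]
```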
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Data selection and analytics</title>
      <p>The knowledge visualization step is preceded by a data selection
and analytics phase. Since each energy performance certificate
includes a large number of highly variable features,
data have to be properly transformed and relevant attributes
have to be selected in order to extract accessible knowledge and apply
data mining algorithms (e.g., cluster analysis, association rules). Several techniques have been used to reduce
the complexity of the datasets under analysis and to discover
effective hidden knowledge that is interesting and readable for all the
different stakeholders involved in the analysis. This component
includes two innovative engines: (i) the query engine and (ii) the
data analytics engine.</p>
      <sec id="sec-4-1">
        <title>2.2.1 Querying engine.</title>
        <p>To select and explore the dataset under analysis, INDICE
implements a query engine that lets the user focus on single
attributes of the energy performance certificates. Possible
stakeholders include citizens, public administrations and energy
scientists, each of whom may be interested in different
characteristics of the dataset under analysis. For each stakeholder, INDICE
produces the best possible representation to highlight the most
interesting facets of the results. Citizens could be interested in the
energy analysis of the buildings in a specific area of the
city, or in the geometric features that characterize buildings
with the same intended use. Citizens may want to
discover areas of the city with better-performing buildings, e.g., to buy
a flat that performs well in terms of energy efficiency. The public
administration may instead be interested in identifying
areas in which to promote and invest in energy renovations. Energy
scientists could use INDICE to explore and characterize, through
supervised and unsupervised techniques, groups of buildings with
similar properties to perform benchmarking analyses. Based on
the target of each stakeholder, the system automatically
proposes to the specific end-user an optimal set of interesting
reports and graphical representations, with the possibility of manually setting
the subset of features and query parameters
in which she is interested.</p>
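        <p>As a purely hypothetical illustration of stakeholder-driven querying, the sketch below keeps a default attribute preset per stakeholder profile and applies exact-value filters to EPC records represented as dictionaries. The profile names, attribute keys, preset contents and toy records are assumptions made for the example, not INDICE's actual configuration or data.</p>
```python
# Hypothetical default attribute presets per stakeholder profile.
PRESETS = {
    "citizen": ["district", "intended_use", "primary_energy_demand"],
    "public_administration": ["district", "primary_energy_demand"],
    "energy_scientist": ["aspect_ratio", "u_value_opaque", "u_value_windows",
                         "primary_energy_demand"],
}


def run_query(epcs, stakeholder, filters=None, attributes=None):
    """Select the EPC records matching every exact-value filter and project them
    onto the stakeholder's preset attributes, or onto a user-chosen subset."""
    columns = attributes or PRESETS[stakeholder]
    filters = filters or {}
    selected = [r for r in epcs if all(r.get(key) == value for key, value in filters.items())]
    return [{c: r.get(c) for c in columns} for r in selected]


epcs = [
    {"district": "Centro", "intended_use": "residential", "primary_energy_demand": 120.0},
    {"district": "Lingotto", "intended_use": "residential", "primary_energy_demand": 210.0},
]
print(run_query(epcs, "citizen", filters={"district": "Centro"}))
```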
      </sec>
      <sec id="sec-4-2">
        <title>2.2.2 Data analytics engine.</title>
        <p>
          To extract meaningful and interesting knowledge items from data,
INDICE includes different supervised and exploratory algorithms
to automatically analyze feature subsets. INDICE integrates the
K-means clustering algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] to create groups of buildings
with similar thermo-physical and energy properties, and
association rule mining [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ] to extract interesting correlations among
features.
        </p>
        <p>
          K-means algorithm. The partitional K-means clustering
algorithm [
          <xref ref-type="bibr" rid="ref18">18</xref>
          ] is used by INDICE to identify groups of EPCs
characterized by similar properties. The similarity
between EPCs is measured through the Euclidean distance. The K-means
algorithm, the most popular clustering algorithm, divides
the input dataset into K groups, where K is defined a priori. The
average of all the energy certificates in each cluster is
the centroid (representative point) of each group of buildings.
First, the algorithm randomly chooses K initial centroids. Then,
each point is assigned to the closest centroid, and the centroids are
recomputed. These steps are repeated until the centroids
no longer change. K-means is able to identify a good cluster set
in a limited computational time. INDICE analyses the trend of
the SSE (sum of squared errors) quality index to evaluate the
cluster cohesion [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] and automatically identify possible good K
values. The SSE is computed as the total sum of squared errors
for all objects in the collection, where for each object the error is
computed as the squared distance from the closest centroid. As
done in [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ], in INDICE the K value is chosen as the point where
the marginal decrease in the SSE curve is maximized (the so-called elbow
approach).
        </p>
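        <p>The K-means loop and the SSE-based choice of K described above can be sketched as follows. The sketch is illustrative: it uses plain Lloyd iterations with a fixed random seed and no restarts, and it reads the elbow rule as picking the K at which the drop in SSE shrinks the most (the largest second difference of the SSE curve), which is one common interpretation rather than necessarily the exact criterion used in INDICE. Function names are hypothetical.</p>
```python
import math
import random


def assign(points, centroids):
    """Index of the closest centroid (Euclidean distance) for every point."""
    return [min(range(len(centroids)), key=lambda c: math.dist(p, centroids[c])) for p in points]


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's K-means: returns (centroids, labels)."""
    centroids = random.Random(seed).sample(points, k)  # K random initial centroids
    for _ in range(iters):
        labels = assign(points, centroids)
        new = []
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            # centroid = mean of its members; keep the old one if the cluster is empty
            new.append(tuple(sum(xs) / len(members) for xs in zip(*members)) if members else centroids[c])
        if new == centroids:
            break  # centroids no longer change
        centroids = new
    return centroids, assign(points, centroids)


def sse(points, centroids, labels):
    """Sum of squared distances of each point from its closest centroid."""
    return sum(math.dist(p, centroids[label]) ** 2 for p, label in zip(points, labels))


def elbow_k(sse_by_k):
    """K (over consecutive keys) where the drop in SSE shrinks the most."""
    ks = sorted(sse_by_k)

    def bend(i):
        return (sse_by_k[ks[i - 1]] - sse_by_k[ks[i]]) - (sse_by_k[ks[i]] - sse_by_k[ks[i + 1]])

    return ks[max(range(1, len(ks) - 1), key=bend)]


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
centroids, labels = kmeans(points, 2)
print(labels, sse(points, centroids, labels))
```
        <p>On these two well-separated groups, kmeans(points, 2) recovers the groups with an SSE of 8/3, and on a hypothetical SSE curve such as {1: 100, 2: 60, 3: 10, 4: 8, 5: 7}, elbow_k returns 3. In practice one would typically run several random initializations per K and keep the lowest SSE.</p>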
        <p>
          Association rules. Association rule discovery is one of the most powerful exploratory
data mining techniques for finding interesting
correlations in data [
          <xref ref-type="bibr" rid="ref1">1</xref>
          ].
An association rule is expressed in the form A → B, where A and
B are disjoint, non-empty itemsets (i.e., A ∩ B = ∅). A is
called the rule antecedent and B the rule consequent. Since association
rule extraction operates on a transactional dataset of categorical
attributes, a discretization step is needed to convert the original
continuous-valued measurements into categorical bins. The
discretization adopted in INDICE is described in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ]. It
builds a CART (Classification And
Regression Tree) [
          <xref ref-type="bibr" rid="ref2">2</xref>
          ] for each variable, using as response variable
the annual primary energy demand normalized on the floor area.
The tree splits are used as bins in the discretization process. To
select only a subset of interesting rules, constraints on various
quality measures are applied. INDICE includes four well-known
quality indices: (i) support, (ii) confidence, (iii) lift, and (iv) conviction.
The support of a rule is the percentage of transactions that contain
both antecedent and consequent; confidence is the conditional
probability of the consequent given the
antecedent; lift [
          <xref ref-type="bibr" rid="ref30">30</xref>
          ] measures the correlation between the
antecedent and the consequent; conviction [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] measures the degree
of implication of a rule. INDICE sets default thresholds,
which the end-user can change to analyze
the extracted rules at different granularity levels.
        </p>
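        <p>The four quality indices can be computed directly from their definitions, as in the sketch below. Transactions are modelled as sets of discretized items (the item labels are hypothetical), lift is computed as the confidence divided by the support of the consequent, and conviction as (1 − supp(B))/(1 − conf(A → B)), which becomes infinite for rules with confidence 1. The CART-based discretization is not shown.</p>
```python
import math


def rule_metrics(transactions, antecedent, consequent):
    """Support, confidence, lift and conviction of the rule antecedent -> consequent.

    Assumes the antecedent occurs in at least one transaction.
    """
    a, b = frozenset(antecedent), frozenset(consequent)
    n = len(transactions)
    n_a = sum(1 for t in transactions if a.issubset(t))
    n_b = sum(1 for t in transactions if b.issubset(t))
    n_ab = sum(1 for t in transactions if a.union(b).issubset(t))
    support = n_ab / n
    confidence = n_ab / n_a
    supp_b = n_b / n
    lift = confidence / supp_b
    conviction = math.inf if confidence == 1 else (1 - supp_b) / (1 - confidence)
    return {"support": support, "confidence": confidence, "lift": lift, "conviction": conviction}


transactions = [
    {"U_windows=high", "EP=high"},
    {"U_windows=high", "EP=high"},
    {"U_windows=high", "EP=low"},
    {"U_windows=low", "EP=high"},
]
print(rule_metrics(transactions, {"U_windows=high"}, {"EP=high"}))
```
        <p>Here the rule holds in two of four transactions (support 0.5) and in two of the three transactions containing the antecedent (confidence 2/3); the lift of 8/9, below 1, signals a slightly negative correlation in this toy data.</p>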
      </sec>
    </sec>
    <sec id="sec-5">
      <title>Informative dashboard</title>
      <p>The aim of this component is to visualize the
information and the extracted knowledge and make them easy to interpret at
different levels of detail. To this end, INDICE includes
interactive and navigable dashboards tailored to different use cases,
providing both domain-specific information and high-level energy
demand overviews. The dashboards can be customized for
each end-user, providing deep targeted knowledge for domain
experts and human-readable informative content for non-expert
users. Besides the charts and diagrams typical of statistics,
which are generally difficult to interpret, INDICE proposes several
map-based techniques to explore and visualize the knowledge extracted
from EPCs, since geolocated EPC data lend themselves very well to
being visualized on maps.</p>
      <p>The dashboards include (i) geospatial maps, comprising traditional
maps such as choropleth and scatter maps and a new type of map
named cluster-marker map, (ii) frequency distribution plots, (iii)
association rules, and (iv) correlation matrices. These visualization
techniques are jointly exploited by INDICE to graphically show
the extracted knowledge at different spatial granularity levels,
such as city, district, neighbourhood, or housing unit (e.g.,
certificates belonging to the same building).</p>
      <p>Geospatial maps. Three geospatial maps have
been integrated in INDICE: (i) choropleth maps, (ii) scatter maps, and (iii)
cluster-marker maps. These energy maps are related to each other,
as each user can switch from one view to another simply by
changing the analysis zoom (i.e., drilling down in the energy map)
or by introducing the knowledge of the cluster-markers. In
choropleth maps, each area (at different zoom levels) is colored
according to the average value of the considered variable for the
area under analysis. Scatter maps show a point and its
corresponding value for each EPC (and thus each residential unit)
contained in the selected area. Cluster-marker maps, similarly to
choropleth maps, aggregate multiple certificates, coloring the
dynamic markers according to the average of the values of the
aggregated points. While the first two geospatial maps (i.e.,
choropleth and scatter maps) are useful for analyzing single variables,
the cluster-marker visualization addresses the problem of
representing multiple variables at the same time. Exploring
a single variable at coarse granularity levels could lead to flat,
poorly representative maps. To address this, INDICE includes
cluster-markers, which add a new feature to the maps in order
to analyze the energy efficiency of several buildings through
various attributes. The cardinality of the corresponding cluster
determines the size of the marker and is reported inside the marker.
These maps are used together, providing in a single solution
different levels of detail depending on the zoom degree selected
by the user. Figure 2 shows examples of analysis results at
different granularity levels, visualizing various information features
on the maps. The upper part of Figure 2 displays a set of attributes
(i.e., the Average U-value of the vertical opaque envelope and the
Average U-value of the windows; see Section 3 for further attribute
details) extracted from the EPCs by means of the querying engine.
The choropleth map and the scatter map show, respectively, the average value
of the attributes for the selected area at neighbourhood level and the
marker of each single point at housing unit level. Users can navigate
the map and check the attribute values for each certificate by
clicking on the markers. The bottom part of Figure 2 visualizes the
information obtained through the data analytics engine (e.g., the
identification of the areas characterized by low and medium
energy performance) at district (left) and
city (right) levels. The cluster-markers show the cardinality of
each cluster, together with the average value of an independent
response variable chosen in the analytic process.</p>
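      <p>One simple way to build cluster-markers from geolocated certificates is sketched below. The grouping rule is an assumption, since the text does not specify it: certificates are grouped into square latitude/longitude cells whose side halves at each zoom step, and each cell yields one marker placed at the mean coordinates, labelled with its cardinality, sized by a hypothetical logarithmic rule, and colored by the mean of the selected variable. Field names and the toy records are also hypothetical.</p>
```python
import math
from collections import defaultdict


def cluster_markers(epcs, zoom, value_key="energy_demand"):
    """Aggregate EPC records (dicts with 'lat', 'lon', value_key) into one marker per grid cell."""
    cell = 360 / 2 ** zoom  # assumed cell side in degrees; halves at each zoom step
    groups = defaultdict(list)
    for e in epcs:
        groups[(math.floor(e["lat"] / cell), math.floor(e["lon"] / cell))].append(e)
    markers = []
    for members in groups.values():
        count = len(members)
        markers.append({
            "lat": sum(e["lat"] for e in members) / count,
            "lon": sum(e["lon"] for e in members) / count,
            "count": count,  # shown inside the marker
            "radius": 10 + 4 * math.log2(count),  # hypothetical size scaling
            "mean_value": sum(e[value_key] for e in members) / count,  # drives the color
        })
    return markers


epcs = [
    {"lat": 45.07, "lon": 7.68, "energy_demand": 100.0},
    {"lat": 45.07, "lon": 7.69, "energy_demand": 200.0},
    {"lat": 45.50, "lon": 9.20, "energy_demand": 50.0},
]
print(len(cluster_markers(epcs, zoom=2)), len(cluster_markers(epcs, zoom=8)))  # 1 2
```
      <p>Zooming in splits one coarse marker into finer ones, which mirrors the drill-down behaviour described above; a choropleth view would instead group certificates by administrative area rather than by grid cell.</p>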
      <sec id="sec-5-1">
        <title>Frequency distribution plots.</title>
        <p>For a given area, the frequency
distributions (e.g., quartiles or deciles) of the features selected
for the visualization task are reported. A frequency distribution
can be shown in a table or in a chart; common
representations include frequency tables, histograms and bar charts.
These distributions can refer to single attributes or to aggregate
information extracted by the analytic task, i.e., to groups of
similar certificates according to the subsets of attributes selected
for the analysis. INDICE provides a settings panel to select one or
more distribution visualizations, including the description of the
main statistical indices. For numeric data, INDICE reports the count,
mean, standard deviation and the three quartiles (i.e., first quartile, median
and third quartile), while for categorical attributes it reports the count,
the most common value (i.e., the mode) with its frequency, and the top-k
frequent values. The end-user can select a response
variable against which to color the attribute distributions.</p>
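        <p>The summary statistics listed above can be computed with Python's standard statistics module, as in the following sketch. The quartile interpolation method (the module's "inclusive" method) and the function names are assumptions; other tools may interpolate quartiles slightly differently.</p>
```python
from collections import Counter
from statistics import mean, quantiles, stdev


def describe_numeric(values):
    """Count, mean, sample standard deviation and the three quartiles."""
    q1, q2, q3 = quantiles(values, n=4, method="inclusive")
    return {"count": len(values), "mean": mean(values), "std": stdev(values),
            "q1": q1, "median": q2, "q3": q3}


def describe_categorical(values, top_k=3):
    """Count, mode with its frequency, and the top-k most frequent values."""
    counts = Counter(values)
    mode, mode_freq = counts.most_common(1)[0]
    return {"count": len(values), "mode": mode, "mode_freq": mode_freq,
            "top_k": counts.most_common(top_k)}


print(describe_numeric([1, 2, 3, 4, 5]))
print(describe_categorical(["gas", "gas", "heat pump", "district heating", "gas", "heat pump"], top_k=2))
```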
        <p>Association rules. INDICE discovers correlations in terms
of association rules. To ease the manual inspection of the most
interesting correlations, INDICE defines templates that characterize
the attributes and represents the association rules in a tabular
visualization. Rules can be sorted by quality indices, so that only
the top-k rules satisfying all constraints are displayed. Rules can
be extracted at different granularity levels, e.g., for each city,
for each neighbourhood, or for each cluster found by the clustering
algorithm.</p>
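        <p>To illustrate template-constrained, top-k rule extraction, the sketch below enumerates single-item rules whose consequent must belong to a chosen response attribute and ranks them by lift. It is a brute-force stand-in assumed for readability, not a full frequent-itemset miner such as those described in [<xref ref-type="bibr" rid="ref1">1</xref>, <xref ref-type="bibr" rid="ref3">3</xref>] and not INDICE's own code; the item encoding "attribute=class", the template, the thresholds and the sample transactions are all hypothetical.</p>

```python
from typing import NamedTuple


class Rule(NamedTuple):
    antecedent: str
    consequent: str
    support: float
    confidence: float
    lift: float


def itemset_support(itemset: set[str], transactions: list[set[str]]) -> float:
    """Fraction of transactions containing every item of `itemset`."""
    return sum(1 for t in transactions if itemset.issubset(t)) / len(transactions)


def attribute_of(item: str) -> str:
    """Items are encoded as "attribute=class", e.g. "Uw=High" (an assumption)."""
    return item.split("=")[0]


def mine_rules(transactions, rhs_attribute, min_support, min_confidence):
    """Single-item rules {a} -> {b} matching a simple template: b must belong to
    `rhs_attribute` and a must not. Brute force, for illustration only."""
    items = set().union(*transactions)
    lhs_items = sorted(i for i in items if attribute_of(i) != rhs_attribute)
    rhs_items = sorted(i for i in items if attribute_of(i) == rhs_attribute)
    rules = []
    for lhs in lhs_items:
        for rhs in rhs_items:
            both = itemset_support({lhs, rhs}, transactions)
            if min_support > both:
                continue
            confidence = both / itemset_support({lhs}, transactions)
            if min_confidence > confidence:
                continue
            lift = confidence / itemset_support({rhs}, transactions)
            rules.append(Rule(lhs, rhs, both, confidence, lift))
    return rules


def top_k(rules: list[Rule], k: int) -> list[Rule]:
    """The k rules with the highest lift; ties broken by confidence, then support."""
    return sorted(rules, key=lambda r: (r.lift, r.confidence, r.support), reverse=True)[:k]


if __name__ == "__main__":
    epcs = [
        {"Uw=High", "ETAH=Low", "EPH=High"},
        {"Uw=High", "ETAH=Low", "EPH=High"},
        {"Uw=Low", "ETAH=High", "EPH=Low"},
        {"Uw=High", "ETAH=High", "EPH=Low"},
    ]
    rules = mine_rules(epcs, "EPH", min_support=0.25, min_confidence=0.6)
    for rule in top_k(rules, 3):
        print(rule)
```

        <p>Running the same function on the certificates of one city, one neighbourhood or one cluster gives the per-granularity rule sets mentioned above.</p>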
        <p>
          Correlation matrices. To reduce the complexity of the
analysis and remove correlated attributes from the analytic process,
INDICE uses correlation matrices to analyze the dependence
between variables. For each pair of numerical attributes X and Y,
the framework computes the Pearson correlation coefficient [
          <xref ref-type="bibr" rid="ref28">28</xref>
          ],
defined as
<disp-formula><tex-math>\rho_{X,Y} = \frac{\operatorname{cov}(X,Y)}{\sigma_X \sigma_Y},</tex-math></disp-formula>
where cov(X, Y) is the covariance between X and Y, and
σ<sub>X</sub> and σ<sub>Y</sub> are the standard deviations of X and
Y, respectively. Each coefficient is translated into a gray level on
a black-and-white scale to represent the correlation intensity in a
plot matrix. A set of attributes with no evident pairwise linear
correlation is eligible for the analytic task.
        </p>
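        <p>A minimal sketch of this correlation check follows: it computes ρ for every attribute pair, maps |ρ| to a gray level (black for |ρ| = 1, white for ρ = 0, consistent with dark squares denoting strong correlation) and tests whether a set of attributes is weakly correlated. The gray-level mapping, the 0.5 threshold and the function names are illustrative assumptions, since the text does not give INDICE's exact scale or cut-off. A constant attribute has σ = 0 and would raise ZeroDivisionError.</p>

```python
import math
from itertools import combinations


def pearson(x: list[float], y: list[float]) -> float:
    """rho_{X,Y} = cov(X, Y) / (sigma_X * sigma_Y), using population moments."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y)) / n
    sx = math.sqrt(sum((a - mx) ** 2 for a in x) / n)
    sy = math.sqrt(sum((b - my) ** 2 for b in y) / n)
    return cov / (sx * sy)


def correlation_matrix(data: dict[str, list[float]]) -> dict[tuple[str, str], float]:
    """Pearson coefficient for every unordered pair of attributes."""
    return {(a, b): pearson(data[a], data[b]) for a, b in combinations(data, 2)}


def gray_level(rho: float) -> int:
    """0 (black) for |rho| = 1 and 255 (white) for rho = 0: darker is stronger."""
    return round(255 * (1 - abs(rho)))


def weakly_correlated(data: dict[str, list[float]], threshold: float = 0.5) -> bool:
    """True if no attribute pair exceeds the (hypothetical) |rho| threshold."""
    return all(threshold >= abs(r) for r in correlation_matrix(data).values())


if __name__ == "__main__":
    attrs = {
        "x": [1.0, 2.0, 3.0, 4.0],
        "y": [2.0, 4.0, 6.0, 8.0],
        "z": [1.0, -1.0, -1.0, 1.0],
    }
    for pair, rho in correlation_matrix(attrs).items():
        print(pair, round(rho, 3), gray_level(rho))
    print(weakly_correlated(attrs))  # False: x and y are perfectly correlated
```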
      </sec>
    </sec>
    <sec id="sec-6">
      <title>PRELIMINARY EXPERIMENTAL RESULTS</title>
      <p>INDICE has been experimentally evaluated on a real collection
of building energy performance certificates. The EPCs were issued
between 2016 and 2018 for buildings and flats located in Piedmont,
a major Italian region. The dataset has been collected and openly
released by CSI Piemonte (the Information System Consortium,
http://www.csipiemonte.it/web/it/) and is regulated by the Piedmont
Region authority (Sustainable Energy Development Sector). It
includes approximately 25,000 energy certificates, each one
characterized by 132 features, including energy and thermo-physical
attributes: 89 categorical attributes and 43 quantitative attributes.</p>
      <p>
        INDICE has been developed in Python [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], using the
scikit-learn library [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] for the analytic tasks and the folium library [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ]
for visualization.
      </p>
    </sec>
    <sec id="sec-7">
      <title>Case study</title>
      <p>To evaluate the effectiveness of INDICE, we focus on a case study
in which the stakeholder is the public administration (PA). The
analysis is tailored to the city of Turin and restricted to the EPCs
of housing units of type E.1.1 (buildings used as permanent
residences). To clean the geospatial information of each EPC (i.e.,
address, house number, ZIP code, latitude and longitude), INDICE
applies the algorithm proposed in Section 2. This algorithm compares
the addresses in the EPC dataset with those in an open dataset
(https://www.sciamlab.com/opendatahub/dataset/c_l219_260) provided
by the municipality of Turin, which lists the city roads with street
names, house numbers, ZIP codes and geolocation (latitude,
longitude). This reference dataset is used to verify the reliability
of the addresses in our dataset. In our case study, a PA user
interested in discovering which areas of the city consume more energy
and which are more efficient could select the following subset of
attributes, which characterize the thermo-physical properties of
each building: Aspect Ratio (S/V), Average U-value of the vertical
opaque envelope (U<sub>o</sub>), Average U-value of the windows
(U<sub>w</sub>), Heat surface (S<sub>r</sub>) and Average global
efficiency for space heating (ETAH). The Aspect Ratio represents the
geometric shape of a building. U<sub>o</sub> and U<sub>w</sub>
measure the heat loss through the opaque and the transparent elements
of the building, respectively: the lower the thermal transmittance of
the building envelope, the lower the heat flow transmitted through
its elements. The Heat surface corresponds to the heated floor area.
Lastly, the ETAH index accounts for the thermal losses of all the
heating subsystems, i.e., generation, distribution, emission and
control. The PA user may be interested in discovering groups of
buildings with homogeneous thermo-physical properties. To address
this task, the K-means clustering algorithm can be applied.</p>
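      <p>The clustering step can be sketched as follows. INDICE relies on scikit-learn [<xref ref-type="bibr" rid="ref25">25</xref>] for its analytic tasks, where sklearn.cluster.KMeans provides the algorithm; the dependency-free version below only illustrates Lloyd's iterations and is not INDICE's code. Standardizing the five attributes first is an assumption made here because they have very different scales (e.g., the heated floor area versus U-values); the number of clusters k, the seed and all sample values are hypothetical, and the returned centroids live in the standardized space.</p>

```python
import math
import random


def standardize(rows: list[list[float]]) -> list[list[float]]:
    """Z-score each column; a constant column is only centred (std treated as 1)."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [
        math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in rows]


def kmeans(rows: list[list[float]], k: int, max_iter: int = 100, seed: int = 0):
    """Lloyd's algorithm: alternate nearest-centroid assignment and mean update."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(rows, k)]  # k randomly chosen points

    def nearest(p: list[float]) -> int:
        return min(range(k), key=lambda j: math.dist(p, centroids[j]))

    labels = None
    for _ in range(max_iter):
        new_labels = [nearest(p) for p in rows]
        if new_labels == labels:
            break  # assignments are stable: converged
        labels = new_labels
        for j in range(k):
            members = [p for p, lab in zip(rows, labels) if lab == j]
            if members:  # an empty cluster keeps its previous centroid
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centroids


if __name__ == "__main__":
    # Columns: S/V, U_o, U_w, S_r, ETAH (made-up values for two building types).
    buildings = [
        [0.45, 0.30, 1.8, 80.0, 0.85],
        [0.47, 0.32, 1.9, 85.0, 0.88],
        [0.90, 1.00, 3.5, 200.0, 0.55],
        [0.95, 1.05, 3.6, 210.0, 0.50],
    ]
    labels, _ = kmeans(standardize(buildings), k=2)
    print(labels)
```

      <p>Each resulting cluster can then be characterized by its size and by the average of the response variable, which is exactly what the cluster-markers display.</p>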
      <p>Before clustering, the correlation between the considered
numerical attributes is checked. Figure 3 reports the correlation
plot matrix for the considered attribute pairs: dark squares
represent a high linear correlation between the two variables, while
light squares represent a low correlation. All the variables
considered in the analysis are weakly correlated (i.e., there is no
evident linear association between variable pairs). Hence, the five
attributes selected for the clustering phase (i.e., S/V,
U<sub>o</sub>, U<sub>w</sub>, S<sub>r</sub> and ETAH), together
with the response variable Normalized primary heating energy
consumption (EPH), allow the extraction of non-trivial knowledge from
the data. Figure 4 shows the results obtained by the data analytics
engine for these features. From the charts in the dashboard, the
analyst can explore the frequency distribution of a specific
attribute, such as the response variable EPH, or its distribution
across the clusters detected by INDICE. Moreover, interesting
association rules can be extracted and visualized in a tabular
representation. For the dynamic dashboard, the attributes are
discretized as follows: 4 classes for the Average U-value of the
windows (Low = [1.1, 2.05], Medium = (2.05, 2.45], High = (2.45,
3.35] and Very high = (3.35, 5.5]); 3 classes for the Average U-value
of the vertical opaque envelope (Low = [0.15, 0.45], Medium = (0.45,
0.65] and High = (0.65, 1.1]); and 3 classes for the Average global
efficiency for space heating (Low = [0.20, 0.60], Medium = (0.60,
0.80] and High = (0.80, 1.1]). In this way, every end user,
regardless of her level of expertise, can detect the attributes that
most influence the energy performance of buildings and find the
geographical areas in which a given set of rules applies. Driven by
the extracted knowledge, the PA user may support and incentivize
renovation policies targeting specific low-performance neighborhoods
or specific groups of similar EPCs.</p>
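      <p>The class boundaries used for the dynamic dashboard translate directly into a lookup. The sketch below is a hypothetical helper rather than INDICE's code: it maps a numeric value to its dashboard class using the intervals given in the text, treats every class as right-closed (the first one also closed on the left), and returns None for values outside the documented ranges, which is an assumption since the text does not say how such values are handled.</p>

```python
from bisect import bisect_left
from typing import Optional

# Class boundaries as listed in the text: (lower bound, upper edges, labels).
# Each class is the interval (previous edge, edge]; the first is closed on the left.
BINS = {
    "Uw": (1.1, [2.05, 2.45, 3.35, 5.5], ["Low", "Medium", "High", "Very high"]),
    "Uo": (0.15, [0.45, 0.65, 1.1], ["Low", "Medium", "High"]),
    "ETAH": (0.20, [0.60, 0.80, 1.1], ["Low", "Medium", "High"]),
}


def discretize(attribute: str, value: float) -> Optional[str]:
    """Map a numeric attribute value to its dashboard class, or None if out of range."""
    lower, edges, labels = BINS[attribute]
    if lower > value or value > edges[-1]:
        return None
    # bisect_left returns the index of the first edge >= value, which matches
    # right-closed intervals such as (2.05, 2.45].
    return labels[bisect_left(edges, value)]


if __name__ == "__main__":
    for attr, v in [("Uw", 2.05), ("Uw", 2.06), ("Uo", 0.7), ("ETAH", 0.6)]:
        print(attr, v, discretize(attr, v))
```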
    </sec>
    <sec id="sec-8">
      <title>CONCLUSIONS AND FUTURE WORKS</title>
      <p>This paper presents INDICE, a new data visualization framework
that analyzes EPC collections at different granularity levels.
After a preprocessing step, INDICE extracts interesting and hidden
knowledge for different end users. Informative dynamic dashboards
show this knowledge at different geospatial levels and with enriched
map representations (e.g., the cluster-marker map).</p>
      <p>As future work, we plan to integrate other analytics techniques
(both supervised and unsupervised) into INDICE to provide a more
flexible and enhanced analysis. Furthermore, the analysis process
should be supported by an automatic tool suggesting appropriate
analysis configurations for the considered datasets. To this end,
we plan to release INDICE to collect real feedback from end users
(e.g., citizens, energy experts, public administration). This
feedback would help us improve the default configurations and
integrate further representations to improve the visualization of
the extracted knowledge.</p>
    </sec>
    <sec id="sec-9">
      <title>Acknowledgments</title>
      <p>The research leading to these results has been supported by the
SmartData@PoliTO center for Big Data and Machine Learning
technologies.</p>
      <p>The authors express their gratitude to Giovanni Nuvoli
(Settore Sviluppo Energetico Sostenibile - Regione Piemonte) and to
CSI Piemonte.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Rakesh</given-names>
            <surname>Agrawal</surname>
          </string-name>
          , Tomasz Imieliński, and
          <string-name>
            <given-names>Arun</given-names>
            <surname>Swami</surname>
          </string-name>
          .
          <year>1993</year>
          .
          <article-title>Mining association rules between sets of items in large databases</article-title>
          .
          <source>In ACM SIGMOD Record. ACM</source>
          ,
          <fpage>207</fpage>
          -
          <lpage>216</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>Leo</given-names>
            <surname>Breiman</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Classification and regression trees</article-title>
          .
          <source>Routledge.</source>
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>Sergey</given-names>
            <surname>Brin</surname>
          </string-name>
          , Rajeev Motwani,
          <string-name>
            <given-names>Jeffrey D.</given-names>
            <surname>Ullman</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shalom</given-names>
            <surname>Tsur</surname>
          </string-name>
          .
          <year>1997</year>
          .
          <article-title>Dynamic itemset counting and implication rules for market basket data</article-title>
          .
          <source>ACM SIGMOD Record</source>
          <volume>26</volume>
          ,
          <issue>2</issue>
          (
          <year>1997</year>
          ),
          <fpage>255</fpage>
          -
          <lpage>264</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Alfonso</given-names>
            <surname>Capozzoli</surname>
          </string-name>
          , Daniele Grassi, Marco Savino Piscitelli, and
          <string-name>
            <given-names>Gianluca</given-names>
            <surname>Serale</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>Discovering Knowledge from a Residential Building Stock through Data Mining Analysis for Engineering Sustainability</article-title>
          .
          <source>Energy Procedia</source>
          <volume>83</volume>
          (
          <year>2015</year>
          ),
          <fpage>370</fpage>
          -
          <lpage>379</lpage>
          . https://doi.org/10.1016/j.egypro.2015.12.212
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>Tania</given-names>
            <surname>Cerquitelli</surname>
          </string-name>
          , Gianfranco Chicco, Evelina Di Corso, Francesco Ventura, Giuseppe Montesano, Mirko Armiento, Alicia Mateo González, and Andrea Veiga Santiago.
          <year>2018</year>
          .
          <article-title>Clustering-Based Assessment of Residential Consumers from Hourly-Metered Data</article-title>
          .
          <source>In 2018 International Conference on Smart Energy Systems and Technologies (SEST)</source>
          .
          <source>IEEE</source>
          ,
          <fpage>1</fpage>
          -
          <lpage>6</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Tania</given-names>
            <surname>Cerquitelli</surname>
          </string-name>
          , Gianfranco Chicco, Evelina Di Corso, Francesco Ventura, Giuseppe Montesano,
          <string-name>
            <given-names>Anita</given-names>
            <surname>Del Pizzo</surname>
          </string-name>
          , Alicia Mateo González, and Eduardo Martin Sobrino.
          <year>2018</year>
          .
          <article-title>Discovering electricity consumption over time for residential consumers through cluster analysis</article-title>
          .
          <source>In 2018 International Conference on Development and Application Systems (DAS)</source>
          .
          <source>IEEE</source>
          ,
          <fpage>164</fpage>
          -
          <lpage>169</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>Tania</given-names>
            <surname>Cerquitelli</surname>
          </string-name>
          and Evelina Di Corso.
          <year>2016</year>
          .
          <article-title>Characterizing Thermal Energy Consumption through Exploratory Data Mining Algorithms</article-title>
          . In EDBT/ICDT Workshops.
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>Georgios</given-names>
            <surname>Chatzigeorgakidis</surname>
          </string-name>
          , Dimitrios Skoutas, Kostas Patroumpas, Spiros Athanasiou, and
          <string-name>
            <given-names>Spiros</given-names>
            <surname>Skiadopoulos</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>Map-Based Visual Exploration of Geolocated Time Series</article-title>
          .
          <source>In Proceedings of the Workshops of the EDBT/ICDT 2018 Joint Conference (EDBT/ICDT 2018)</source>
          , Vienna, Austria, March 26,
          <year>2018</year>
          .
          <fpage>92</fpage>
          -
          <lpage>99</lpage>
          . http://ceur-ws.org/Vol-2083/paper-14.pdf
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>Giuliano</given-names>
            <surname>Dall'O</surname>
          </string-name>
          , Luca Sarto, Nicola Sanna, Valeria Tonetti, and
          <string-name>
            <given-names>Martina</given-names>
            <surname>Ventura</surname>
          </string-name>
          .
          <year>2015</year>
          .
          <article-title>On the use of an energy certification database to create indicators for energy planning purposes: Application in northern Italy</article-title>
          .
          <source>Energy Policy</source>
          <volume>85</volume>
          ,
          <issue>C</issue>
          (
          <year>2015</year>
          ),
          <fpage>207</fpage>
          -
          <lpage>217</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>Evelina</given-names>
            <surname>Di Corso</surname>
          </string-name>
          , Tania Cerquitelli, and
          <string-name>
            <given-names>Daniele</given-names>
            <surname>Apiletti</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>METATECH: METeorological Data Analysis for Thermal Energy CHaracterization by Means of Self-Learning Transparent Models</article-title>
          .
          <source>Energies</source>
          <volume>11</volume>
          ,
          <issue>6</issue>
          (
          <year>2018</year>
          ),
          <fpage>1336</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>Evelina</given-names>
            <surname>Di Corso</surname>
          </string-name>
          , Tania Cerquitelli, Marco Savino Piscitelli, and
          <string-name>
            <given-names>Alfonso</given-names>
            <surname>Capozzoli</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Exploring Energy Certificates of Buildings through Unsupervised Data Mining Techniques</article-title>
          .
          <source>In 2017 IEEE International Conference on Internet of Things (iThings) and IEEE Green Computing and Communications (GreenCom) and IEEE Cyber, Physical and Social Computing (CPSCom) and IEEE Smart Data (SmartData). IEEE</source>
          ,
          <fpage>991</fpage>
          -
          <lpage>998</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Martin</given-names>
            <surname>Ester</surname>
          </string-name>
          , Hans-Peter Kriegel, Jörg Sander,
          <string-name>
            <given-names>Xiaowei</given-names>
            <surname>Xu</surname>
          </string-name>
          , et al.
          <year>1996</year>
          .
          <article-title>A density-based algorithm for discovering clusters in large spatial databases with noise</article-title>
          .
          <source>In KDD</source>
          ,
          <fpage>226</fpage>
          -
          <lpage>231</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          Filipe and Martin Journois et al.
          <year>2018</year>
          .
          <source>python-visualization/folium: v0.6.0</source>
          . (Aug. 2018). https://doi.org/10.5281/zenodo.1344457
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>Xiaohong</given-names>
            <surname>Guan</surname>
          </string-name>
          , Zhanbo Xu, and
          <string-name>
            <given-names>Qing-Shan</given-names>
            <surname>Jia</surname>
          </string-name>
          .
          <year>2010</year>
          .
          <article-title>Energy-efficient buildings facilitated by microgrid</article-title>
          .
          <source>IEEE Transactions on Smart Grid</source>
          <volume>1</volume>
          ,
          <issue>3</issue>
          (
          <year>2010</year>
          ),
          <fpage>243</fpage>
          -
          <lpage>252</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>Frank R.</given-names>
            <surname>Hampel</surname>
          </string-name>
          .
          <year>1974</year>
          .
          <article-title>The influence curve and its role in robust estimation</article-title>
          .
          <source>Journal of the American Statistical Association</source>
          <volume>69</volume>
          ,
          <issue>346</issue>
          (
          <year>1974</year>
          ),
          <fpage>383</fpage>
          -
          <lpage>393</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>Boris</given-names>
            <surname>Iglewicz</surname>
          </string-name>
          and David Caster Hoaglin.
          <year>1993</year>
          .
          <article-title>How to detect and handle outliers</article-title>
          . Vol.
          <volume>16</volume>
          . ASQ Press.
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>Tim</given-names>
            <surname>Johansson</surname>
          </string-name>
          , Mattias Vesterlund, Thomas Olofsson, and
          <string-name>
            <given-names>Jan</given-names>
            <surname>Dahl</surname>
          </string-name>
          .
          <year>2016</year>
          .
          <article-title>Energy performance certificates and 3-dimensional city models as a means to reach national targets-A case study of the city of Kiruna</article-title>
          .
          <source>Energy Conversion and Management</source>
          <volume>116</volume>
          (
          <year>2016</year>
          ),
          <fpage>42</fpage>
          -
          <lpage>57</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>B.-H.</given-names>
            <surname>Juang</surname>
          </string-name>
          and
          <string-name>
            <given-names>L.R.</given-names>
            <surname>Rabiner</surname>
          </string-name>
          .
          <year>1990</year>
          .
          <article-title>The segmental K-means algorithm for estimating parameters of hidden Markov models</article-title>
          .
          <source>IEEE Transactions on Acoustics, Speech and Signal Processing</source>
          <volume>38</volume>
          ,
          <issue>9</issue>
          (Sep
          <year>1990</year>
          ),
          <fpage>1639</fpage>
          -
          <lpage>1641</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>Vladimir I.</given-names>
            <surname>Levenshtein</surname>
          </string-name>
          .
          <year>1966</year>
          .
          <article-title>Binary codes capable of correcting deletions, insertions, and reversals</article-title>
          .
          <source>In Soviet Physics Doklady</source>
          ,
          <fpage>707</fpage>
          -
          <lpage>710</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>Xue</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Lanshun</given-names>
            <surname>Nie</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Shuo</given-names>
            <surname>Chen</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Approximate Dynamic Programming Based Data Center Resource Dynamic Scheduling for Energy Optimization</article-title>
          . In IEEE iThings/GreenCom/CPSCom 2014, Taipei, Taiwan, September 1-3,
          <year>2014</year>
          .
          <fpage>494</fpage>
          -
          <lpage>501</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>Feng-Yi</given-names>
            <surname>Lin</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Tzu-Ping</given-names>
            <surname>Lin</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ruey-Lung</given-names>
            <surname>Hwang</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Using geospatial information and building energy simulation to construct urban residential energy use map with high resolution for Taiwan cities</article-title>
          .
          <source>Energy and Buildings</source>
          <volume>157</volume>
          (
          <year>2017</year>
          ),
          <fpage>166</fpage>
          -
          <lpage>175</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>Sara</given-names>
            <surname>Torabi Moghadam</surname>
          </string-name>
          , Patrizia Lombardi, and
          <string-name>
            <given-names>Guglielmina</given-names>
            <surname>Mutani</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>A mixed methodology for defining a new spatial decision analysis towards low carbon cities</article-title>
          .
          <source>Procedia Engineering</source>
          <volume>198</volume>
          (
          <year>2017</year>
          ),
          <fpage>375</fpage>
          -
          <lpage>385</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>Y</given-names>
            <surname>Olivo</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A</given-names>
            <surname>Hamidi</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P</given-names>
            <surname>Ramamurthy</surname>
          </string-name>
          .
          <year>2017</year>
          .
          <article-title>Spatiotemporal variability in building energy use in New York City</article-title>
          .
          <source>Energy</source>
          <volume>141</volume>
          (
          <year>2017</year>
          ),
          <fpage>1393</fpage>
          -
          <lpage>1401</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>Maria-Evangelia</given-names>
            <surname>Papadaki</surname>
          </string-name>
          , Panagiotis Papadakos, Michalis Mountantonakis, and
          <string-name>
            <given-names>Yannis</given-names>
            <surname>Tzitzikas</surname>
          </string-name>
          .
          <year>2018</year>
          .
          <article-title>An Interactive 3D Visualization for the LOD Cloud</article-title>
          .
          <source>In Proceedings of the Workshops of the EDBT/ICDT 2018 Joint Conference (EDBT/ICDT 2018)</source>
          , Vienna, Austria, March 26,
          <year>2018</year>
          .
          <fpage>100</fpage>
          -
          <lpage>103</lpage>
          . http://ceur-ws.org/Vol-2083/paper-15.pdf
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>F.</given-names>
            <surname>Pedregosa</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Varoquaux</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Gramfort</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Michel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Thirion</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Grisel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Blondel</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Prettenhofer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Weiss</surname>
          </string-name>
          ,
          <string-name>
            <given-names>V.</given-names>
            <surname>Dubourg</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Vanderplas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Passos</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Cournapeau</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Brucher</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perrot</surname>
          </string-name>
          , and
          <string-name>
            <given-names>E.</given-names>
            <surname>Duchesnay</surname>
          </string-name>
          .
          <year>2011</year>
          .
          <article-title>Scikit-learn: Machine Learning in Python</article-title>
          .
          <source>Journal of Machine Learning Research</source>
          <volume>12</volume>
          (
          <year>2011</year>
          ),
          <fpage>2825</fpage>
          -
          <lpage>2830</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>Iraci Miranda</given-names>
            <surname>Pereira</surname>
          </string-name>
          and Eleonora Sad de Assis.
          <year>2013</year>
          .
          <article-title>Urban energy consumption mapping for energy management</article-title>
          .
          <source>Energy Policy</source>
          <volume>59</volume>
          (
          <year>2013</year>
          ),
          <fpage>257</fpage>
          -
          <lpage>269</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <string-name>
            <given-names>Bernard</given-names>
            <surname>Rosner</surname>
          </string-name>
          .
          <year>1983</year>
          .
          <article-title>Percentage points for a generalized ESD many-outlier procedure</article-title>
          .
          <source>Technometrics</source>
          <volume>25</volume>
          ,
          <issue>2</issue>
          (
          <year>1983</year>
          ),
          <fpage>165</fpage>
          -
          <lpage>172</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>Sheldon M.</given-names>
            <surname>Ross</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Introduction to probability models</article-title>
          . Academic press.
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>Guido</given-names>
            <surname>van Rossum</surname>
          </string-name>
          .
          <year>1995</year>
          .
          <article-title>Python Reference Manual</article-title>
          .
          <source>Technical Report</source>
          . Amsterdam, The Netherlands.
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>Pang-Ning</given-names>
            <surname>Tan</surname>
          </string-name>
          et al.
          <year>2007</year>
          .
          <article-title>Introduction to data mining</article-title>
          .
          <source>Pearson Education India</source>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>John W.</given-names>
            <surname>Tukey</surname>
          </string-name>
          .
          <year>1977</year>
          .
          <article-title>Box-and-whisker plots</article-title>
          .
          <source>Exploratory data analysis</source>
          (
          <year>1977</year>
          ),
          <fpage>39</fpage>
          -
          <lpage>43</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref32">
        <mixed-citation>
          [32]
          <string-name>
            <given-names>Chao-Lin</given-names>
            <surname>Wu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Wei-Chen</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Yi-Show</given-names>
            <surname>Tseng</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Li-Chen</given-names>
            <surname>Fu</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Ching-Hu</given-names>
            <surname>Lu</surname>
          </string-name>
          .
          <year>2014</year>
          .
          <article-title>Anticipatory Reasoning for a Proactive Context-Aware Energy Saving System</article-title>
          . In IEEE iThings/GreenCom/CPSCom 2014, Taipei, Taiwan, September 1-3, 2014
          .
          <fpage>228</fpage>
          -
          <lpage>234</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>