Adaptive Hierarchical Clustering for
                                    Petrographic Image Analysis
                 Andrea Pasini                                        Elena Baralis                              Paolo Garza
             Politecnico di Torino                                Politecnico di Torino                      Politecnico di Torino
                  Torino, Italy                                        Torino, Italy                              Torino, Italy
            andrea.pasini@polito.it                              elena.baralis@polito.it                     paolo.garza@polito.it

               Davide Floriello                                     Michela Idiomi                             Andrea Ortenzi
               ENI S.p.A.                                            ENI S.p.A.                                  ENI S.p.A.
      San Donato Milanese (MI), Italy                      San Donato Milanese (MI), Italy             San Donato Milanese (MI), Italy
        Davide.Floriello@eni.com                              Michela.Idiomi@eni.com                      Andrea.Ortenzi@eni.com

                                                                      Simone Ricci
                                                                    ENI S.p.A.
                                                           San Donato Milanese (MI), Italy
                                                               Simone.Ricci@eni.com
ABSTRACT
The task of analyzing the permeability of soil to natural gas and
oil has been studied by experts for many years. This analysis is
typically achieved by inspecting samples of terrain, called thin
sections. An important procedure of this task is the categoriza-
tion/clusterization of pores, which are microscopical cavities in
thin sections. This operation is manually carried out by domain
experts through the analysis of high-resolution images retrieved
with Scanning Electron Microscopes (SEM). Since the number of
pores is very high (more than 10,000 for each thin section), the
manual categorization procedure is performed on a small subset
of pores and hence the achieved estimations can be imprecise.                   Figure 1: Example of thin section analysis. On the left the
   To address the pore analysis problem, we propose a custom                    image retrieved with SEM, on the right the result after ap-
clustering pipeline that automatically groups pores inside thin                 plying our clustering method.
sections. The work has been conducted with the help of ENI,
a leading company in oil and gas extraction. Specifically, we
have designed the Adaptive Multi-level Dendrogram Cut method,                   information such as the size, distribution, shape, orientation and
a customized version of hierarchical clustering. The proposed                   textural relationships of minerals can be retrieved.
method is able to cut the hierarchical dendrogram at multiple                      Specifically, SEM images are highly useful for pore charac-
levels, being guided by both automatic metrics and domain expert                terization in rocks [2, 16]. Petrographic Image Analysis (PIA)
knowledge. In this paper, we show that our methodology pro-                     studies the presence and distribution of pores, which is impor-
duces higher quality results with respect to standard hierarchical              tant for analyzing the permeability of rocks to natural gas and
clustering algorithms. We also defined a set of techniques to gen-              oil [6, 9]. Within this process, SEM images are retrieved from
erate interpretable descriptions of the pore clusters. According                ground samples called thin sections. After acquisition, image pro-
to ENI’s experts, the obtained clusters have a good quality from                cessing algorithms can extract different characteristics of pores,
a geological point of view and help to automatize a very slow                   for example their size and shape. [4, 13]. Finally, domain experts
process that was carried out manually.                                          use these results to inspect rock properties such as their perme-
                                                                                ability.
                                                                                   Since each thin section possibly contains from thousands to
1    INTRODUCTION                                                               millions of pores, their manual categorization is applicable only
Petrography is the branch of petrology focused on classifying and               to a small subset. This entails an approximated result that highly
describing rocks from both a microscopical and a megascopical                   depends on the sampling performed over the initial data. An au-
point of view [3]. With the diffusion of Scanning Electron Mi-                  tomatic categorization/clusterization method may help avoiding
croscopes (SEM), optical analyses for petrography reached good                  most of the manual effort that is required to perform the analysis.
quality results due to the high resolution of the images. Indeed,               Both classification and clustering methods could be taken in con-
SEM images can reveal features that are invisible with optical                  sideration. However, classification requires having a sufficient
microscopy [8, 12, 15]. In particular, they allow distinguishing dif-           amount of training data with manually generated labels. This
ferent mineral phases based on variations of gray intensity. Other              implies that domain experts should label large quantities of pores
                                                                                to model the characteristics of different categories. Since there
© 2019 Copyright held by the author(s). Published in the Workshop Proceedings   are no public datasets with labeled pores and generating new
of the EDBT/ICDT 2019 Joint Conference (March 26, 2019, Lisbon, Portugal) on
CEUR-WS.org.                                                                    labeled data is very time consuming, we adopted a clustering
                                                                                based approach. Clustering methods are able to learn the data
                                                      Figure 2: Big Petro pipeline.


distribution and automatically derive a set of groups, which then      as this method builds globular clusters with a balanced number
can be more easily and quickly mapped to geological categories         of points, but the types of pores in our datasets present more
by domain experts. Clustering algorithms do not need training          complex shapes and are very imbalanced in number. For example
labels and can be applied to thousands of samples. With the            the category of small pores is typically an order of magnitude
proposed methodology, domain experts can concentrate their             more numerous than the one containing bigger pores.
efforts only on the geological interpretation of each cluster, while      Hierarchical clustering [10] techniques allow defining clusters
leaving to the algorithm the most time consuming task.                 by inspecting the cluster hierarchy with a structure called den-
   In this paper, we propose the Big Petro pipeline, a clustering      drogram. By cutting the dendrogram at a fixed level, k clusters
methodology specifically designed for automatically categoriz-         are generated. The generated clusters can be characterized by
ing pores inside thin sections. The proposed pipeline has been         different densities and imbalanced cardinalities. This technique,
designed and evaluated on real datasets with the support of the        already in its standard version, allowed us to obtain the best re-
domain experts of ENI, a leading company in oil and gas ex-            sults, according to the analysis performed by ENI domain experts.
traction. The contribution of our paper can be summarized as           For this reason, we selected hierarchical clustering as preferred
follows:                                                               method to be customized and integrated in our Big Petro pipeline.
    • Development of a customized semi-automatic process for
      clustering geological pores. This methodology helps do-          3   BIG PETRO PIPELINE
      main experts to avoid manually categorizing all pores.           The objective of our work is to automatically discover different
    • Definition of a variant of the standard usage of hierarchical    categories of geological pores in SEM images of rock samples by
      clustering. Specifically, we propose to obtain clusters with     means of the proposed Big Petro pipeline. The Big Petro pipeline
      an Adaptive Multi-level Dendrogram Cut, being guided by          is composed of different blocks, depicted in Figure 2. In the fol-
      both automatic metrics and domain experts knowledge.             lowing paragraphs we describe the characteristics of each block.
    • Inspection of different methodologies to describe the gen-          Data extraction. Each dataset to be analyzed consists of a
      erated clusters in terms of characterizing attributes distri-    single thin section acquisition, represented as a grayscale image.
      bution for each cluster. The description approaches allow        Figure 1a shows an example of a small portion of thin section,
      domain experts to interpret the different groups and derive      where the black irregular spots are the pores to be categorized.
      interesting insights from the analyzed thin sections.            In this first step, a tool developed by ENI processes the grayscale
                                                                       image that automatically selects the darker regions representing
2   RELATED WORK                                                       pores. Afterwards, the tool analyzes each pore and extracts a set
Clustering techniques [11] are widely used in many application         of geometrical features describing the pore shape and size. The
fields. To the best of our knowledge, this is the first attempt to     result of this operation is a structured dataset, with a record for
apply clustering techniques to automatize pore analysis. During        each pore, characterized by 32 numerical attributes. Section 5
our preliminary study of geological pores categorization, we           describes more in detail the characteristics of the considered
inspected the behavior of the following algorithms: DBSCAN [5],        datasets, extracted from different thin section samples.
KMeans [7], and agglomerative hierarchical clustering [10].               Data preparation. The second step of the pipeline involves
   DBSCAN [5] is a density based algorithm. Its driving idea is        the preprocessing of the datasets obtained in the data extraction
joining points in the same cluster when they exceed a minimum          phase. Each dataset is typically characterized by the presence
neighborhood density. The main issue with this technique is that       of many small pores represented by only few pixels. The low
the distribution of our data shows variable densities. Setting a       pixel resolution of these pores does not allow computing most of
low minimum density results in a single cluster containing all         the geometrical characteristics necessary for the analysis. Hence,
the pores, while when setting a high density many pores are            we filter out the smaller pores by applying a domain provided
considered as noise. Multiple runs of DBSCAN with different            threshold on their size. This operation also yielded smaller, more
configurations could potentially allow identifying clusters with       manageable datasets (from millions of pores to thousands), which
different densities. However, also the self-tuning techniques that     allowed a reduction of the clustering algorithm processing time.
use multiple runs of DBSCAN [1] did not yield good quality             In Section 5, Table 1 shows further details of the dataset cardi-
results on our data.                                                   nality before and after this filtering procedure.
   K-means [7] is a well-known centroid based clustering algo-            Next, attribute values are normalized by applying the z-score
rithm, which requires specifying the (fixed) desired number of         normalization, after which each attribute domain range is char-
clusters. The results with this technique were not satisfactory,       acterized by mean equal to zero and unitary variance. This step
is necessary to guide the distance metric used for clustering to
treat all the attributes in the same way, independently of their
value range.
    Adaptive Multi-level Dendrogram Cut. Pore clustering
takes place in this phase. It is composed of four steps: (i) den-
drogram construction, (ii) super-clusters generation, (iii) domain
experts analysis, and (iv) sub-clusters generation. The first task is
addressed by running an agglomerative hierarchical clustering
algorithm, which builds the dendrogram describing the hierarchy
of the pore aggregations. The second step consists of generating a
first set of clusters by cutting the dendrogram at a specific height.
These clusters correspond to macro-groups of pores, called super-
clusters and provide an initial coarse-grain aggregation of pores.
                                                                                        Figure 3: Dendrogram for Dset1.
Their number is chosen by analyzing the silhouette score [14],
as described in Section 5.
    Next, each super-cluster is manually analyzed by domain ex-             As mentioned before, agglomerative hierarchical clustering
perts to decide if further partitioning is needed. The standard          yields a binary tree structure, called dendrogram. The leaves
approach consisting of one single cut of the dendrogram may              of the tree represent single pores to be clustered, while nodes
generate some clusters already containing homogeneous pores              describe the merging points of clusters at different hierarchical
and some other clusters grouping heterogeneous pores. Simply             levels. The height of each merging point is defined by the linkage
cutting the dendrogram at a different height will split also the         distance between the two nodes to be merged.
already high-quality macro-clusters. For this reason, only for the          Let Ch be the set of k clusters obtained by cutting the dendro-
subset of super-clusters that are deemed by the domain experts           gram at a specific height h:
to need further split in sub-clusters, the sub-cluster generation
step is executed. Specifically, given a cluster to be further divided,                        Ch = {c 0h , ..., c ih , ..., c kh−1 }
we analyze its corresponding dendrogram sub-tree and cut the             The dendrogram cut at height h is exploited to generate the first
hierarchy at a new level of depth. The process continues with a          level of groups, denoted as super-clusters. For example in Figure 3
loop involving the domain expert analysis and the sub-clusters           the obtained super-clusters are tiny-pores, small-pores, big-pores1
generation. At each iteration, new levels of sub-clusters are added      and big-pores2 (these names have been conventionally assigned
until the results are satisfactory. Further details on this process      by domain experts after the analysis of the macro-clusters). The
are provided in Section 4.                                               value of h is selected by inspecting the silhouette value of the
    Cluster description. In this step, the obtained clusters are         obtained groups. In Section 5 we further discuss how this value
analyzed with different descriptive techniques. In Section 5, we         is chosen for the different analyzed datasets.
show how, by inspecting the attribute distribution in the obtained          When the domain experts deem the granularity of some macro-
clusters and the aggregation hierarchy in the dendrogram, inter-         clusters as insufficient, the iterative generation of sub-clusters is
esting insights are obtained. We also consider PCA plots to show         activated. Sub-clusters are generated by inspecting the descen-
the distribution of the cluster labels along the directions where        dants of their corresponding super-cluster. The set of sub-clusters
data have maximum variance.                                              of a cluster c ih obtained by cutting the dendrogram at height h ′
                                                                         is defined as:
4    ADAPTIVE MULTI-LEVEL DENDROGRAM                                                                           ′      ′
                                                                                subclustersh ′ (c ih ) = {c hj : c hj ∈ descendants(c ih )}
     CUT
The dendrogram is generated by an agglomerative hierarchical             where descendants(c ih ) is the set of descendants of c ih in the
                                                                                                  ′
clustering algorithm. Since our datasets consist of continuous at-       tree hierarchy and c hj are the nodes obtained by cutting the
tributes, the euclidean metric is used for computing the distance        dendogram at the new height h ′ . For example, in Figure 3 the
between pore pairs. After inspecting the results obtained with           set subclustersh ′ (tiny- pores) is composed of sub-clusters tiny-a,
different linkage techniques, we selected Ward’s method [17]. In         tiny-b, tiny-c. The definition of subclustersh ′ (c ih ) can be applied
particular the single linkage criterion did not fit our purposes, as     recursively to obtain multiple sets of subclusters, at different
it is sensible to noise and the points belonging to lower density        levels of the hierarchy for different cluster subsets, until the
clusters are not merged properly in the dendrogram. More specif-         results are satisfactory (see Figure 2).
ically, single linkage connects two clusters when they present
a pair of points which are very close together. For this reason          5    EXPERIMENTAL RESULTS
higher density clusters are merged together in the first stages of       ENI provided us three different datasets (Dset1, Dset2 and Dset3),
the dendogram and points in lower density regions appear in dif-         whose samples were extracted from three thin sections mostly
ferent isolated clusters. Instead, complete linkage tends to break       composed by carbonates. As described in Section 3, the smallest
bigger clusters to have equal-sized groups. The algorithm be-            pores in each dataset are filtered out using a threshold specified
haviour with this metric is approximately similar to K-means and         by ENI domain experts. Consider the left part of Table 1. The
does not produce good quality results. Average linkage produces          biggest dataset, Dset3, presents 1.3M pores, while after filtering
a trade-off result between the single and complete methods. How-         its size is 4K rows. The smallest dataset, Dset2, changes from
ever, by inspecting the first levels of the dendrogram together          300K to 5.9K pores. Note that, even if the original size of the
with ENI domain experts, we observed that Ward’s method yields           datasets can be quite different, the number of pores after filtering
a better separation of the main pore groups.                             takes similar values, ranging between 4K and 6K.
                       Datasets size                                      Super-clusters size
           Dataset     # pores # filtered pores # tiny-pores # small-pores # big-pores1 # big-pores2              # big-pores3
           Dset1       330346              6771          6194           489              74       14
           Dset2       301828              5880          5025           634             213        7                          1
           Dset3      1349554              4164          4034           103               2       25
                                             Table 1: Datasets and super-clusters size.


                                                                         Figures 6a, 6b, and 6c show the results of a Principal Component
                                                                      Anaysis (PCA) on the pores of the three datasets by visualizing
                                                                      the projection of the points on the first three components. The
                                                                      different colors represent points belonging to the generated super-
                                                                      clusters. For all three datasets, the two biggest clusters (tiny-pores,
                                                                      small-pores) are characterized by the highest density, while the
                                                                      others are sparser. Furthermore, the cluster structure is similar
                                                                      for all datasets. The biggest clusters are characterized by lower
                                                                      values of component pc1, while the smaller ones by higher values
                                                                      of pc1. Finally, the outlier in Dset2 is clearly visible as an isolated
                                                                      point in Figure 6b (big-pores3).
                                                                         Sub-clusters generation. According to the analysis performed
                                                                      by ENI domain experts, clusters big-pores1 and big-pores2 are pure
                                                                      and well characterized for all datasets. Instead, clusters tiny-pores
                                                                      and small-pores need to be further separated. Being guided by
Figure 4: Super clusters generation. Silhouette varying the           ENI domain experts we generated k ′ = k ′′ = 3 sub-clusters
number of clusters k.                                                 from tiny-pores and small-pores respectively. Figure 3 depicts the
                                                                      resulting dendrogram for dataset Dset1. The names of the sub-
                                                                      clusters are obtained by appending a suffix (-a,-b,-c) to the name
                                                                      of their corresponding super-cluster (e.g., tiny-a, tiny-b). Note
                                                                      that the cut point for the sub-clusters of small-pores is deeper in
                                                                      the dendogram than the one for the three sub-clusters of tiny-
                                                                      pores. This result cannot be obtained with a single dendrogram
                                                                      cut and motivates the need for the proposed Adaptive Multi-level
                                                                      Dendrogram Cut. To reach the granularity of the small-pores sub-
                                                                      clusters with a single dendrogram cut we need to choose a height
                                                                      H corresponding to K = 22 clusters. However, this K value would
                                                                      break the clusters big-pores1, big-pores2, tiny-a, tiny-b, tiny-c and
                                                                      would produce a lower quality clustering.
        Figure 5: Silhouette for different methods.
                                                                         Figure 5 shows the silhouette value for the final clusters ob-
                                                                      tained with Adaptive Multi-level Dendrogram Cut and those
   Super-clusters generation. After building the dendrogram           generated with a single dendrogram cut. The number of clusters
on the three datasets, we selected the proper values of h to cut      for the single dendrogram cut is K = 22, 19, 13 for Dset1, Dset2
the hierarchy tree and obtain the super-clusters. In particular,      and Dset3 respectively. The silhouette of the clusters obtained
Figure 4 plots the silhouette values against the number k of          with our method is clearly higher than the one based on one sin-
super-clusters in each dataset. For all datasets the silhouette       gle cut. This entails that our method not only produces a better
score decreases when k grows. However, even if the silhouette         partition from a geological point of view, but also a better domain-
is higher with lower values of k, a very low number of super-         agnostic quality. A further discussion on the results shown in
clusters does not provide meaningful groups from a geological         Figure 5 is provided in Section 6.
point of view. Datasets Dset1 and Dset3 show stable silhouette           Cluster description. We provide a description of the ob-
values in the range k = 3 to k = 4. Hence, we selected k = 4, the     tained clusters by inspecting the attributes that allow charac-
maximum value of k before the silhouette rapidly decreases, for       terizing them. The PCA representation in Figure 6 shows the dis-
these two datasets. Instead, dataset Dset2 shows stable silhouette    tribution of the different super and sub-clusters. We inspected the
values for k = 2, 3 and k = 4, 5. For the same motivation, we set     attributes which are most representative of each of the three prin-
k = 5, i.e., the highest number of clusters in the region of stable   cipal components, by considering the ones with highest eigen-
silhouette values.                                                    values. For example, the first component, pc1, is characterized by
   Table 1 shows the size of the obtained super-clusters. All         size attributes. In particular, higher values of pc1 entail higher
datasets are characterized by two big clusters (tiny-pores, small-    size of the pores. In Figures 6a, 6b, 6c, the super-clusters tiny-
pores) and two smaller ones (big-pores1, big-pores2). Moreover,       pores and small-pores are characterized by lower values of pc1,
Dset2 shows a cluster (big-pores3) with a single pore, which can      while big-pores1 and big-pores2 present higher values. Figure 7a
be considered an outlier. This confirms that the choice k = 5 for     shows the distribution of one of the size attributes in Dset1 for the
dataset Dset2 produces similar results to the ones obtained with      different super-clusters. It confirms that the distinction between
the other two datasets.                                               super clusters is partially driven by the size of the pores.
          (a) Super-clusters of Dset1.                   (b) Super-clusters of Dset2.                      (c) Super-clusters of Dset3.


         (d) Final clustering of Dset1.                  (e) Final clustering of Dset2.                    (f) Final clustering of Dset3.


                       Figure 6: Clusters obtained with hierarchical clustering for all datasets. PCA plots.


          (a) Super-clusters of Dset1.                    (b) Sub-clusters of Dset1.                        (c) Sub-clusters of Dset1.


                                                    Figure 7: Cluster distributions.


   Figures 6d, 6e, 6f show the final clustering, including both the         6    LESSONS LEARNED
sub-clusters and the super-clusters that have not been further              The analysis conducted in the previous sections highlights the
divided. It can be noticed that sub-clusters present a finer sepa-          following results. First, the silhouette score, a domain-agnostic
ration that also involves components pc2 and pc3. For example               quality index for evaluating clusters, is not always a good metric
in Dset1 clusters tiny-b and tiny-c are well separated by pc2. The          from a domain driven point of view. Consider Figure 5. The
most relevant attributes for pc2 describe how much pores are                silhouette score computed on the super-clusters is higher than
stretched, obtaining a non circular shape. Looking at Figure 7b it          the one obtained with the Adaptive Multi-level Dendrogram Cut.
is possible to see that pores in tiny-c show a higher stretching            For example, in Dset3 super-clusters reach a silhouette score
than those in tiny-b that are more circular. The third principal            of 0.73, while Adaptive Multi-level Dendrogram Cut clusters
component, pc3, is related to pore irregularity, instead. From Fig-         score 0.2. However, the quality of the clusters from the domain
ure 6d pores in cluster small-a present a lower shape irregularity          experts point of view is higher for the proposed method, as the
than those in small-c. This is also confirmed by the histograms             super-clusters show an excessively coarse subdivision. In fact,
in Figure 7c. Following this methodology, ENI domain experts                super-clusters have a strong correlation with the pore size, but
are able to derive interesting insights, describing the geological          cannot capture more complex characteristics of the pore shape.
characteristics of the analyzed thin section.                                  Second, the cluster description confirmed that sub-clusters
                                                                            represent a finer subdivision of the initial groups according to
more complex characteristics, such as the pore stretching and                                 network analyzer. IEEE Transactions on Network and Service Management 13,
shape irregularity. According to the domain experts, this group-                              3 (2016), 696–710.
                                                                                          [2] Philip W Choquette and Lloyd C Pray. 1970. Geologic nomenclature and
ing has a good quality from a geological point of view, because it                            classification of porosity in sedimentary carbonates. AAPG bulletin 54, 2
generates groups that are distinguished by both geometrical and                               (1970), 207–250.
                                                                                          [3] M.G. Edwards. 2008. Introduction to Optical Mineralogy and Petrography - The
genetic characteristics. These characteristics are fundamental for                            Practical Methods of Identifying Minerals in Thin Section with the Microscope
geologists to inspect the structure of the analyzed thin section                              and the Principles Involved in the Classification of Rocks. Read Books.
and its permeability. Hence, the role of domain knowledge is                              [4] Robert Ehrlich, Stephen K Kennedy, Sterling J Crabtree, and Robert L Cannon.
                                                                                              1984. Petrographic image analysis; I, Analysis of reservoir pore complexes.
fundamental in driving the clustering process to obtain higher                                Journal of Sedimentary Research 54, 4 (1984), 1365–1378.
quality results.                                                                          [5] Martin Ester, Hans-Peter Kriegel, Jörg Sander, Xiaowei Xu, et al. 1996. A
                                                                                              density-based algorithm for discovering clusters in large spatial databases
                                                                                              with noise.. In Kdd, Vol. 96. 226–231.
7    CONCLUSIONS AND FUTURE WORK                                                          [6] Edward L Etris, David S Brumfield, Robert Ehrlich, and Sterling J Crabtree. 1988.
In this paper, we proposed the Adaptive Multi-level Dendrogram                                Relations between pores, throats and permeability: a petrographic/physical
                                                                                              analysis of some carbonate grainstones and packstones. Carbonates and
Cut method for clustering geological pores in thin sections and                               Evaporites 3, 1 (1988), 17.
we evaluated its results on three different real datasets provided                        [7] B-H Juang and Lawrence R Rabiner. 1990. The segmental K-means algorithm
                                                                                              for estimating parameters of hidden Markov models. IEEE Transactions on
by ENI. We demonstrated that the Adaptive Multi-level Den-                                    acoustics, speech, and signal Processing 38, 9 (1990), 1639–1641.
drogram Cut allows obtaining sub-clusters that capture more                               [8] Kitty L Milliken, Lucy T Ko, Maxwell Pommer, and Kathleen M Marsaglia.
complex characteristics of the pore shape and have higher sil-                                2014. SEM petrography of eastern Mediterranean sapropels: Analogue data
                                                                                              for assessing organic matter in oil and gas shales. Journal of Sedimentary
houette values than the sub-clusters that would be obtained with                              Research 84, 11 (2014), 961–974.
the corresponding single level dendrogram cut. Currently, the                             [9] Theodore T Mowers and David A Budd. 1996. Quantification of porosity and
Big Petro pipeline is under further evaluation by ENI domain                                  permeability reduction due to calcite cementation using computer-assisted
                                                                                              petrographic image analysis techniques. AAPG bulletin 80, 3 (1996), 309–321.
experts, which are planning to deploy it shortly.                                        [10] Fionn Murtagh. 1983. A survey of recent advances in hierarchical clustering
   As future work, we plan to integrate this pipeline with domain                             algorithms. Comput. J. 26, 4 (1983), 354–359.
                                                                                         [11] T. Pang-Ning, Steinbach M., and Kumar V. 2006. Introduction to data mining.
driven self-tuning techniques that will allow fully-automatic pore                            Addison-Wesley.
clustering. Furthermore, we will work on improving the cluster                           [12] Kenneth Pye and David H Krinsley. 1984. Petrographic examination of sedi-
description step to help domain experts in interpreting clustering                            mentary rocks in the SEM using backscattered electron detectors. Journal of
                                                                                              Sedimentary Research 54, 3 (1984), 877–888.
results.                                                                                 [13] Chandra L Reedy. 2006. Review of digital image analysis of petrographic
                                                                                              thin sections in conservation research. Journal of the American Institute for
Acknowledgements                                                                              Conservation 45, 2 (2006), 127–146.
                                                                                         [14] Peter J Rousseeuw. 1987. Silhouettes: a graphical aid to the interpretation and
The research leading to these results has been supported by the                               validation of cluster analysis. Journal of computational and applied mathemat-
                                                                                              ics 20 (1987), 53–65.
SmartData@PoliTO center for Big Data and Machine Learning                                [15] Ashok K Singh, Mamta Sharma, and Mahendra P Singh. 2013. SEM and
technologies.                                                                                 reflected light petrography: A case study on natural cokes from seam XIV,
                                                                                              Jharia coalfield, India. Fuel 112 (2013), 502–512.
                                                                                         [16] Sandra Neils Tonietto, Margaret Z Smoot, and Michael Pope. 2014. PS Pore
REFERENCES                                                                                    Type Characterization and Classification in Carbonate Reservoirs. (2014).
 [1] Daniele Apiletti, Elena Baralis, Tania Cerquitelli, Paolo Garza, Danilo Giordano,   [17] Joe H Ward Jr. 1963. Hierarchical grouping to optimize an objective function.
     Marco Mellia, and Luca Venturini. 2016. SeLINA: a self-learning insightful               Journal of the American statistical association 58, 301 (1963), 236–244.