Segmentation of remotely sensed images with a neuro-fuzzy inference system

Giovanna Castellano, Ciro Castiello, Andrea Montemurro, Gennaro Vessio and Gianluca Zaza
Department of Computer Science, University of Bari Aldo Moro, Italy

Abstract
The semantic segmentation of remotely sensed images is a difficult task because the images do not represent well-defined objects. To tackle this task, fuzzy logic represents a valid alternative to convolutional neural networks (especially in the presence of very limited data), as it makes it possible to classify these objects with a degree of uncertainty. Unfortunately, the fuzzy rules for doing this have to be defined by hand. To overcome this limitation, in this work we propose to use an adaptive neuro-fuzzy inference system (ANFIS), which automatically infers the fuzzy rules that classify the pixels of remotely sensed images, thus realizing their semantic segmentation. The resulting fuzzy model guarantees a good level of accuracy in the classification of pixels despite the few input features and the limited number of images used for training. Moreover, unlike classic deep learning approaches, it is also explainable, since the classification rules produced are close to the way human beings reason.

Keywords
Remote sensing, semantic segmentation, neuro-fuzzy model, ANFIS

1. Motivations and objectives

Remote sensing images cover diverse applications in meteorology, agriculture, geology, biodiversity conservation, land use planning, education, intelligence and warfare (see, for example, [1, 2]). The images can be acquired in color in the visible spectrum (for example in RGB) or in other electromagnetic spectra (for example in the infrared). The semantic segmentation of this type of images is an important task to make them usable and understandable also by human beings, in the fields of application mentioned above [3].
In recent years, the dominant approach to tackle this task has been the one based on Convolutional Neural Networks (CNNs). Popular models used for semantic segmentation include U-Net [4] and SegNet [5], and many variants have been proposed in the particular remote sensing domain (e.g., [6, 7]). However, while successful, the CNN-based approach has some major drawbacks. First, huge labeled training sets are typically required to develop accurate models, and the data collection process can be very expensive and time-consuming. Second, the CNN-based approach generally does not provide human-understandable explanations of the process by which a certain output is obtained. This clearly limits the end user's confidence in the developed model and does not help to use the given prediction in decision-making processes. On the other hand, fuzzy inference systems, as an extension of traditional expert systems, can make the best use of small and inherently inaccurate data to generate simple, but accurate models that are much easier to interpret and explain. A fuzzy system, in fact, can explicitly represent image classification decisions in the form of IF-THEN rules, which are very easy to understand [8, 9].

WILF 2021: 13th International Workshop on Fuzzy Logic and Applications
giovanna.castellano@uniba.it (G. Castellano); ciro.castiello@uniba.it (C. Castiello); a.montemurro23@studenti.uniba.it (A. Montemurro); gennaro.vessio@uniba.it (G. Vessio); gianluca.zaza@uniba.it (G. Zaza)
0000-0002-6489-8628 (G. Castellano); 0000-0002-8471-5403 (C. Castiello); 0000-0002-0883-2691 (G. Vessio); 0000-0003-3272-9739 (G. Zaza)
© 2021 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073
For example, a fuzzy inference system for the explainable semantic segmentation of trees in satellite images has recently been proposed in [10]. The system uses the HSV color space as input, and is optimized using a Big Bang-Big Crunch evolutionary procedure. Similarly, a semantic segmentation method based on an interval type-2 fuzzy membership function for high-resolution remote sensing images has been presented in [11]. Domain knowledge is the primary source of the fuzzy rules and fuzzy set parameters on which an inference system is based. However, acquiring this knowledge is laborious, error-prone and highly subjective. One way to get around the limitations of both neural networks and fuzzy systems is to integrate them into a unified framework in which the shortcomings of one method are mitigated by the advantages of the other, and vice versa. This has already been done in the literature in the form of so-called neuro-fuzzy models [12]. Basically, a neuro-fuzzy model is a fuzzy system in which a learning algorithm typically employed for neural networks, such as the popular backpropagation algorithm, is exploited to learn fuzzy set parameters directly from data. This paradigm is attracting growing interest in the remote sensing research area. For example, a fuzzy extreme learning machine has recently been proposed in [13] for the classification of remote sensing images based on a domain adaptation approach. Meanwhile, a new deep fuzzy convolutional network has been presented in [14], which leverages fuzzy logic to provide more reliable segmentation results than classic CNN models. To contribute to this research effort, in this paper we propose to use an adaptive neuro-fuzzy inference system (ANFIS) [15] for the semantic segmentation of remotely sensed images. The proposed method provides a better understanding of how the decision made by the system was reached.
Furthermore, since it transforms the image-level segmentation of remotely sensed images into a pixel-level classification, the method requires a much smaller labeled training set than is typically required. It is worth noting that in this work we focus on RGB images, which are much easier to collect with traditional optical cameras. The next section presents the proposed method. Section 3 describes the experimental setup and the results obtained. Section 4 concludes the paper.

2. Neuro-fuzzy modeling

To learn fuzzy rules for pixel classification starting from the available data, we trained a neuro-fuzzy model, that is, a neural network that encodes a collection of IF-THEN fuzzy rules in its structure. To this end, we considered zero-order Takagi-Sugeno (TS) fuzzy models [16], whose rules have fuzzy sets in the antecedents and singleton values in the consequent parts. Formally, the TS fuzzy rules can be formalized as:

$$\text{IF } (x_1 \text{ is } A_{k1}) \text{ AND } \ldots \text{ AND } (x_m \text{ is } A_{km}) \text{ THEN } (y_1 \text{ is } b_{k1}) \text{ AND } \ldots \text{ AND } (y_C \text{ is } b_{kC}),$$

for $k = 1, \ldots, K$, where $K$ is the number of rules. Moreover, $A_{ki}$ $(i = 1, \ldots, m)$ are fuzzy sets defined over the input feature vector $\mathbf{x} = [x_1, \ldots, x_m]$, $C$ is the number of classes and $b_{kj}$ is a fuzzy singleton expressing the degree of certainty of the output class $y_j$ $(j = 1, \ldots, C)$. Each fuzzy set is defined by a Gaussian membership function:

$$u_{ki} = \mu_{ki}(x_i) = \exp\left[-\left(\frac{x_i - c_{ki}}{\sigma_{ki}}\right)^2\right], \qquad (1)$$

where $c_{ki}$ and $\sigma_{ki}$ are the center and width of the Gaussian function, respectively. Gaussian functions are quite popular in the fuzzy literature: they are always continuous functions which can be defined on the basis of a reduced number of parameters.
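As a minimal sketch (the center and width values below are hypothetical, not taken from the trained models), Eq. (1) can be computed as:

```python
import numpy as np

def gaussian_mf(x, c, sigma):
    """Gaussian membership degree of Eq. (1): exp(-((x - c) / sigma)^2)."""
    return np.exp(-(((x - c) / sigma) ** 2))

# A feature value equal to the center c belongs to the fuzzy set with degree 1;
# one width sigma away from the center, the degree drops to exp(-1) ~ 0.37.
print(gaussian_mf(0.5, c=0.5, sigma=0.1))  # 1.0
print(gaussian_mf(0.6, c=0.5, sigma=0.1))  # ~0.3679
```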
Also, their employment in a fuzzy rule base implies the functional equivalence between the fuzzy system and the radial basis function network [17], which is the inspiring idea of neuro-fuzzy modeling. Once the fuzzy rules have been defined, they are used to provide degrees of certainty for each class by means of the following fuzzy reasoning procedure:

1. Compute the fuzzy membership values $u_{ki}$ according to Eq. (1);
2. For $k = 1, \ldots, K$, compute the activation level of the $k$-th rule using the product operator:
$$u_k(\mathbf{x}) = \prod_{i=1}^{m} u_{ki}; \qquad (2)$$
3. Normalize the rule activation levels:
$$\mu_k(\mathbf{x}) = \frac{u_k(\mathbf{x})}{\sum_{h=1}^{K} u_h(\mathbf{x})}; \qquad (3)$$
4. For each class, compute the degree of certainty:
$$y_j = \sum_{k=1}^{K} \mu_k(\mathbf{x}) \cdot b_{kj}, \qquad j = 1, \ldots, C. \qquad (4)$$

Fuzzy rules are typically defined manually by the domain expert, who specifies the fuzzy terms in the antecedent and consequent of each rule. When expert knowledge is lacking (as may be the case in an image segmentation task), the parameters of the fuzzy rules can instead be learned from the data by leveraging the learning capabilities of neural networks. This combination yields neuro-fuzzy networks, which are adaptive fuzzy systems capable of learning fuzzy rules from data [18]. Thus, a model learned by a neuro-fuzzy network can be represented by a set of fuzzy rules understandable to humans. To learn the parameters of the fuzzy rules, namely the antecedent fuzzy set parameters $c_{ki}$ and $\sigma_{ki}$, and the consequent parameters $b_{kj}$, we use a neuro-fuzzy architecture similar to ANFIS [15], which is a feed-forward neural network reflecting the fuzzy rule base in its parameters and topology. The architecture of the network (sketched in Fig.
1) includes four layers that realize the inference of the fuzzy rules by calculating, respectively: the membership degrees to the fuzzy sets; the activation level of each fuzzy rule; the normalized activation level of each fuzzy rule; and the final output. Training is carried out through a backpropagation-type optimization procedure based on gradient descent over the parameter space.

Figure 1: Simplified schema of the neuro-fuzzy network architecture.

3. Experiment

The experiment was conducted through an implementation of ANFIS in PyTorch.1 The complexity of the network architecture may vary with respect to the scheme depicted in Fig. 1, since in our experiments we adopted different levels of granularity to partition the input ranges (as discussed in Section 3.2). As a running environment, we used Google Colaboratory, which provides powerful GPUs for free. Below, we present the data and discuss the experimental results.

3.1. Data

To collect the data for our experiments, we referred to the Wuhan Dense Labeling Dataset (WHDLD) [19], a publicly available dataset suitable for image retrieval, classification or semantic segmentation. WHDLD includes images cropped from a larger satellite image of the Wuhan urban area: 4940 RGB images were obtained with the same pixel size (256×256) and a resolution equal to 2 meters. The pixels of each image are manually labelled considering six classes: vegetation; building; pavement; water; road; and bare soil. Since our goal is to produce a model for pixel classification, we considered a limited number of images from the WHDLD dataset. Actually, considering only 18 images, we were able to collect information relating to 1,179,648 labelled pixels.
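The four layers described above mirror steps 1-4 of the fuzzy reasoning procedure of Section 2. A minimal NumPy sketch of the forward pass, with randomly initialized (hypothetical) parameters in place of the ones the network learns by gradient descent, is:

```python
import numpy as np

rng = np.random.default_rng(0)
m, K, C = 4, 16, 6              # input features, rules (16 as in NFC1), classes
c = rng.random((K, m))          # Gaussian centers c_ki
sigma = np.full((K, m), 0.2)    # Gaussian widths sigma_ki
b = rng.random((K, C))          # singleton consequents b_kj

def anfis_forward(x):
    u_ki = np.exp(-(((x - c) / sigma) ** 2))  # layer 1: membership degrees, Eq. (1)
    u_k = u_ki.prod(axis=1)                   # layer 2: rule activations, Eq. (2)
    mu_k = u_k / u_k.sum()                    # layer 3: normalized activations, Eq. (3)
    return mu_k @ b                           # layer 4: class certainty degrees, Eq. (4)

x = rng.random(m)               # one pixel: R, G, B and local entropy (rescaled)
y = anfis_forward(x)            # one certainty degree per class
predicted_class = int(y.argmax())
```

In the actual experiments these parameters are trained end-to-end with backpropagation, which is why the same computation is expressed as a layered network.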
As stated earlier, this represents a major advantage of our approach, as it is possible to train a classification model by referring to a limited number of images (provided that they are chosen in such a way as to preserve the balanced co-occurrence of the different pixel classes, with reference to the distribution within the original dataset). As a reference, it can be observed that the training of the fully convolutional networks proposed in the literature and chosen for comparison (see the subsequent discussion in Section 3.2) was performed on the basis of more than 250 million labelled pixels. The selected images consist of the following pixel categories: vegetation (26%); building (19%); pavement (18%); water (14%); road (12%); and bare soil (11%). The predominance of vegetation pixels reflects the composition of the original satellite image, where only a few limited portions of roads and bare soil were involved. To tackle our pixel classification task, we built a dataset starting from the analysis of the selected images. For each of them, a four-dimensional feature vector was considered.

1 https://github.com/jfpower/anfis-pytorch

Figure 2: A sample (gray scale) image and the corresponding one obtained by applying the entropy filter.

The information embedded in the feature vector concerns the values of the red, green and blue channels of the pixel, and the value of the pixel when an entropy filter is applied to the image. In other words, each pixel is described in terms of its RGB components, together with an additional feature related to local entropy information. This is evaluated by applying a filter to the gray-scale version of the image, which returns the minimum number of bits needed to encode the local gray-scale distribution. Entropy analysis is a way to classify textures in image processing applications. In fact, certain textures can be associated with specific entropy values depending on the repetition of patterns within the image.
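A minimal sketch of such a local entropy filter (the 9×9 neighborhood size is our assumption; the paper does not specify the window) is:

```python
import numpy as np

def local_entropy(gray, k=9):
    """Shannon entropy (in bits) of the gray-level distribution in a k x k
    neighborhood of each pixel; image borders are handled by edge padding."""
    pad = k // 2
    padded = np.pad(gray, pad, mode="edge")
    h, w = gray.shape
    out = np.empty((h, w))
    for i in range(h):
        for j in range(w):
            window = padded[i:i + k, j:j + k]
            counts = np.bincount(window.ravel(), minlength=256)
            p = counts[counts > 0] / window.size
            out[i, j] = -(p * np.log2(p)).sum()  # bits to encode the local distribution
    return out

# A perfectly homogeneous region needs 0 bits; a textured region scores higher.
flat = np.zeros((16, 16), dtype=np.uint8)
checker = (np.indices((16, 16)).sum(axis=0) % 2) * 255
```

In practice a rank-filter implementation (e.g., `entropy` from `skimage.filters.rank`) computes the same quantity far more efficiently than this didactic double loop.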
Therefore, low entropy values correspond to more homogeneous regions than those characterized by higher entropy values. This information was included in the feature vector to provide an additional low-level feature for distinguishing image pixels. In Fig. 2, we show one of the images (on the left, in its gray-scale version) chosen to build our dataset and the corresponding filtered image (on the right) obtained after applying the entropy filter. After feature extraction, each sample in the dataset is completed by the information concerning the pixel class.

3.2. Results and discussion

To derive the classification model from the data, some parameters must be set. A first choice concerns the number of fuzzy sets to be associated with the input features (i.e., the level of granularity used to partition the ranges of input values). This is a crucial point, since more fuzzy sets generally lend themselves to a more adequate representation of the data distribution. However, higher granularity also increases the complexity of the resulting predictive model. Since 4 input features are involved, the choice of 2, 3, and 4 fuzzy sets per input leads to the realization of a model including 16, 64, and 256 fuzzy rules, respectively. The additional parameters are the number of training epochs, which was set to 200, and the size of the training batch, which was set to 32. Table 1 reports the information relating to the three derived models (namely, NFC1, NFC2, and NFC3), including their composition (number of fuzzy sets per input feature, total number of rules), the training time of the corresponding neuro-fuzzy networks, and their overall accuracy. The experiments were conducted on the previously described dataset composed of 1,179,648 pixel datapoints, which was arranged in a stratified split of training and test sets equal to 70% and 30% of the samples, respectively. As expected, the accuracy of the predictive models improves as their complexity increases.
However, the cost in terms of training time and, most importantly, complexity exhibited by the NFC2 and NFC3 models is not rewarded by a significant boost in classification performance.

Table 1
Details of the three neuro-fuzzy classification models derived from the data.

Model ID | # of fuzzy sets per input feature | # of rules | Training time | Overall accuracy
NFC1     | 2                                 | 16         | 3h 39m 13s    | 74.98%
NFC2     | 3                                 | 64         | 5h 20m 5s     | 76.50%
NFC3     | 4                                 | 256        | 7h 15m 51s    | 77.53%

Figure 3: Membership functions composing the rule antecedents of NFC1.

Table 2
Performance of NFC1 evaluated in terms of precision, recall, and F1-score.

Class      | Precision | Recall | F1-score
Vegetation | 0.73      | 0.79   | 0.75
Building   | 0.75      | 0.83   | 0.79
Pavement   | 0.75      | 0.72   | 0.73
Water      | 0.93      | 0.87   | 0.90
Road       | 0.57      | 0.47   | 0.51
Bare Soil  | 0.77      | 0.72   | 0.75

Therefore, in the following analysis we can refer to NFC1, which offers the benefit of being quite simple and readable for a human user. In Fig. 3, the membership functions composing the antecedents of the fuzzy rules involved in NFC1 are depicted for the sake of illustration. When applied to the test set, the neuro-fuzzy model showed the performance reported in Table 2, described in terms of precision, recall, and F1-score. To appreciate the interpretability of the resulting fuzzy model, an excerpt of the rules of the NFC1 model is depicted in Fig. 4. These fuzzy rules offer an understanding of how the neuro-fuzzy system performs the image classification process. For example, it can be seen that pixels with low values in all channels are classified with high certainty as water, while pixels with high values in all three channels are classified as urban land, i.e. building, but also road and pavement. Vegetation pixels generally exhibit a high value of the green channel, as expected.

Figure 4: An excerpt of the rules produced by NFC1.

Finally, to better assess the suitability of our proposal, we considered a different method introduced in the literature for the sake of comparison.
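Before turning to that comparison, note that the per-class scores in Table 2 follow the usual definitions from the per-class confusion counts; a minimal sketch, with purely hypothetical counts not taken from our experiments, is:

```python
def prf1(tp, fp, fn):
    """Per-class precision, recall and F1-score from confusion counts
    (true positives, false positives, false negatives)."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1

# Hypothetical counts for a single class:
precision, recall, f1 = prf1(tp=870, fp=70, fn=130)
print(f"{precision:.2f} {recall:.2f} {f1:.2f}")  # 0.93 0.87 0.90
```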
The work described in [19] consists in a multi-label approach to image retrieval: it is based on the employment of fully convolutional networks (FCNs) trained on the same WHDLD dataset adopted in our experimental session. More precisely, the cited work relies on the use of the entire image dataset (split into a training and a validation set). On the other hand, our proposal can be implemented considering only a limited subset of images: this represents a first advantage in view of the comparison. From a methodological point of view, the FCN proposed in [19] is first trained to derive a segmentation map of the images, which is preparatory for a subsequent process of multi-label image retrieval performed on the basis of a region-based similarity measure. Although our approach does not aim to fulfil the entire pipeline of an image retrieval process, we can compare our results with those that appeared in the literature, limited to the segmentation performance, evaluated in terms of metrics related to the pixel classification level. Table 3 illustrates this comparison. The metrics involved are the overall accuracy, the averaged recall and the mean Jaccard index, formally described in [20]: they have been chosen in order to allow a comparison with the results reported in [19]. As can be observed, the proposed model was able to produce satisfactory results: even if it does not match the accuracy performance of the FCN-based model, it still exhibits higher values of the remaining metrics. More precisely, Table 2 shows that the overall accuracy of the proposed model is penalized by a poor ability to recognize the pixels of the road class, probably due to the fact that, from a chromatic point of view, these are very similar to those of the building and pavement classes, which are also often close to gray. Fig.
5 shows the pixel classification for some test images, i.e., images taken from the WHDLD collection which did not contribute to the realization of our training set. Each image is coupled with a map corresponding to the pixel classification of the image, thus providing a segmentation of the illustrated scene.

Table 3
Comparison between our proposed model and the best performing one proposed in [19].

Model     | Overall accuracy | Averaged recall | Mean Jaccard index
NFC1      | 0.75             | 0.73            | 0.60
FCN-based | 0.81             | 0.68            | 0.56

Figure 5: Test images and the corresponding pixel classification produced by NFC1.

4. Conclusion

In this work, the ANFIS neuro-fuzzy system has been applied to the semantic segmentation of remotely sensed images. The results obtained are encouraging, since the proposed approach can learn an effective model using even a very limited dataset for training. Moreover, the rules derived can provide an understandable explanation of why some areas have been segmented in a certain way, which is a desirable feature in different planning scenarios, where experts must justify their decisions based on the support provided by the machine. While the use of pixel-level information is beneficial, it limits the model to specific features of certain objects instead of using features that can improve generalization. As future work, we plan to include more complex features that provide more contextual information about the pixel and do not limit the model to the color values of a given area and its local entropy. We can also improve system performance by adding new rules to those automatically derived by ANFIS, removing incorrect rules, or updating existing ones. In particular, the interpretability and relevance of the learned fuzzy rules may be further investigated using user feedback.
In this way, the results may contribute to showing that accuracy is not necessarily the only indicator to be considered when studying the quality of an automatic classifier, and that a more "sensible" learned classification rule could match expert judgment better than purely data-driven approaches.

Acknowledgment
The research is partially supported by the Italian Ministry of University and Research (MUR) under the PON ARS01_00141 "CLOSE" funding. Giovanna Castellano, Ciro Castiello, Gennaro Vessio, and Gianluca Zaza are members of the INdAM Research group GNCS.

References
[1] R. Wang, J. A. Gamon, Remote sensing of terrestrial plant biodiversity, Remote Sensing of Environment 231 (2019) 111218.
[2] M. Weiss, F. Jacob, G. Duveiller, Remote sensing for agricultural applications: A meta-review, Remote Sensing of Environment 236 (2020) 111402.
[3] X. Yuan, J. Shi, L. Gu, A review of deep learning methods for semantic segmentation of remote sensing imagery, Expert Systems with Applications (2020) 114417.
[4] O. Ronneberger, P. Fischer, T. Brox, U-Net: Convolutional networks for biomedical image segmentation, in: International Conference on Medical Image Computing and Computer-Assisted Intervention, Springer, 2015, pp. 234-241.
[5] V. Badrinarayanan, A. Kendall, R. Cipolla, SegNet: A deep convolutional encoder-decoder architecture for image segmentation, IEEE Trans. on PAMI 39 (2017) 2481-2495.
[6] R. Dong, X. Pan, F. Li, DenseU-Net-based semantic segmentation of small objects in urban remote sensing images, IEEE Access 7 (2019) 65347-65356.
[7] Y. Yi, Z. Zhang, W. Zhang, C. Zhang, W. Li, T. Zhao, Semantic segmentation of urban buildings from VHR remote sensing imagery using a deep convolutional neural network, Remote Sensing 11 (2019) 1774.
[8] J. M. A. Moral, C. Castiello, L. Magdalena, C. Mencar, Explainable Fuzzy Systems: Paving the Way from Interpretable Fuzzy Systems to Explainable AI Systems, Springer, 2021.
[9] G. Casalino, G. Castellano, C. Castiello, V.
Pasquadibisceglie, G. Zaza, A fuzzy rule-based decision support system for cardiovascular risk assessment, in: International Workshop on Fuzzy Logic and Applications, Springer, 2018, pp. 97-108.
[10] H. Leon-Garza, H. Hagras, A. Peña-Rios, A. Conway, G. Owusu, A Big Bang-Big Crunch type-2 fuzzy logic system for explainable semantic segmentation of trees in satellite images using HSV color space, in: IEEE Int. Conf. on Fuzzy Systems (FUZZ-IEEE), 2020, pp. 1-7.
[11] C. Wang, A. Xu, X. Li, Supervised classification high-resolution remote-sensing image based on interval type-2 fuzzy membership function, Remote Sensing 10 (2018) 710.
[12] K. Shihabudheen, G. N. Pillai, Recent advances in neuro-fuzzy system: A survey, Knowledge-Based Systems 152 (2018) 136-162.
[13] S. K. Meher, N. S. Kothari, Interpretable rule-based fuzzy ELM and domain adaptation for remote sensing image classification, IEEE Trans. on Geoscience and R. Sensing (2020).
[14] Z. Tianyu, J. Xu, Hyperspectral remote sensing image segmentation based on the fuzzy deep convolutional neural network, in: 13th International Congress on Image and Signal Processing, BioMedical Eng. and Informatics (CISP-BMEI 2020), IEEE, 2020, pp. 181-186.
[15] J.-S. Jang, ANFIS: adaptive-network-based fuzzy inference system, IEEE Transactions on Systems, Man, and Cybernetics 23 (1993) 665-685.
[16] T. Takagi, M. Sugeno, Fuzzy identification of systems and its applications to modeling and control, IEEE Transactions on Systems, Man, and Cybernetics (1985) 116-132.
[17] J.-S. Jang, C.-T. Sun, Functional equivalence between radial basis function networks and fuzzy inference systems, IEEE Transactions on Neural Networks 4 (1993) 156-159.
[18] M. Brown, C. J. Harris, Neurofuzzy Adaptive Modelling and Control, Prentice Hall, 1994.
[19] Z. Shao, W. Zhou, X. Deng, M. Zhang, Q.
Cheng, Multilabel remote sensing image retrieval based on fully convolutional network, IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing 13 (2020) 318-328.
[20] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017) 640-651.