Position paper: On the role of abductive reasoning in semantic image segmentation

Andrea Rafanelli1,2,*, Stefania Costantini2 and Andrea Omicini3

1 Department of Computer Science, University of Pisa, Italy
2 Department of Information Engineering, Computer Science and Mathematics, University of L’Aquila, Italy
3 Department of Computer Science and Engineering, Alma Mater Studiorum—Università di Bologna, Italy

Abstract

This position paper provides insights aiming at resolving the most pressing needs and issues of computer vision algorithms. Specifically, these problems relate to the scarcity of data, the inability of such algorithms to adapt to never-seen-before conditions, and the challenge of developing explainable and trustworthy algorithms. This work proposes the incorporation of reasoning systems, and in particular of abductive reasoning, into image segmentation algorithms as a potential solution to the aforementioned issues.

Keywords

neuro-symbolic, abductive reasoning, semantic segmentation, XAI

1. Introduction

Semantic image segmentation is an area of computer vision that has seen significant advancements in recent years. This technique is extremely common today since it facilitates the description, categorization, and visualisation of regions of interest in a picture [1]. The primary goal of semantic image segmentation is to classify distinct portions of an image as separate objects by associating each pixel with a specific class label, and therefore recognising and understanding what is represented in the image at the pixel level.

To date, there is a plethora of applications that make use of these algorithms. Self-driving cars and medical image analysis are some of the most significant applications. In certain scenarios, such as the two reported here, there is an apparent problem of data scarcity. In general, one of the main problems is that Deep Learning (DL) algorithms require annotated data to succeed.
This is extremely evident in the domain of computer vision, as it is necessary to annotate every visual element contained in an image. Consequently, there is a remarkable demand for data-sets with a large number of annotations. Furthermore, if one considers the real world, as opposed to a simulated environment, the available data are sequential, subject to continual changes, and frequently unlabelled. In this regard, semantic segmentation algorithms should continuously adapt themselves to new contexts and data, which calls for a certain degree of generalisation. It is widely recognised that the performance of neural network models degrades drastically when they are subjected to sparse or low-quality data. Therefore, it is essential to address these issues by proposing alternatives to conventional systems, for instance solutions employing both symbolic and sub-symbolic systems (Neuro-Symbolic architectures). Here, the apparent benefit of integrating logical constructs into sub-symbolic models is related to the reduction in learning time, the possibility to improve the prediction accuracy, the decrease in the amount of data required to train the model, and a greater propensity for the model to adapt to previously unobserved situations. A particular type of reasoning, known as abductive reasoning, is embraced in this proposal.

Discussion Papers - 22nd International Conference of the Italian Association for Artificial Intelligence (AIxIA 2022)
* Corresponding author.
Email: andrea.rafanelli@phd.unipi.it (A. Rafanelli); stefania.costantini@univaq.it (S. Costantini); andrea.omicini@unibo.it (A. Omicini)
ORCID: 0000-0001-8626-2121 (A. Rafanelli); 0000-0002-5686-6124 (S. Costantini); 0000-0002-6655-3869 (A. Omicini)
© 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (CEUR-WS.org), http://ceur-ws.org, ISSN 1613-0073
Abductive reasoning was studied and theorised by C.S. Peirce [2]. It is a form of reasoning in which hypotheses are developed in order to explain observations. In general, it is a process of reasoning invoked to explain an observation by inferring causes from observed effects. It can, therefore, be defined as an explanatory process. Abduction is a non-monotonic form of reasoning, in which the truth of a statement can vary when additional evidence is introduced. This reasoning is also highly adaptable to real-world circumstances, where it is frequently necessary to revise some findings in light of fresh information. According to [2], humans use abduction to draw significant conclusions by focusing on a number of observations. Abductive reasoning generates hypotheses on the basis of incomplete evidence. Wherever massive data sets are not available for training a neural network model, the utility of this particular way of reasoning becomes evident.

There are three main reasons why this particular type of reasoning is proposed. The first reason relates to the explainability of the algorithm. Abduction is, by definition, aimed at formulating potential explanations. In sensitive domains, such as those discussed previously involving autonomous driving and medical diagnosis, it is crucial to have an explanation for algorithmic decisions.

The second reason, on the other hand, concerns the problem of the lack of data and the consequent inability of the algorithm to adapt to new settings, i.e. its lack of generalisation. In fact, providing the system with a suitable number of instances is not simple, firstly because it is difficult to predict in advance which information would be significant for the learning task [3], and secondly because a large amount of data is not always available. In this sense, abduction can be used to generate additional instances for a system and compensate for the scarcity of data in a specific domain.
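To make the notion of abduction discussed here more concrete, the following is a minimal Python sketch, not part of the original proposal. It assumes a toy propositional knowledge base: the rules, the atom names (`road`, `wheels`, `water`, `hull`) and the helper functions `closure` and `abduce` are all hypothetical and chosen only for illustration. A hypothesis set is accepted as an explanation when the rules together with that set derive the observation, and only subset-minimal explanations are kept.

```python
from itertools import combinations

# Hypothetical toy knowledge base: each rule maps a set of premises to a conclusion.
RULES = [
    ({"road", "wheels"}, "car"),
    ({"water", "hull"}, "boat"),
    ({"car"}, "vehicle"),
    ({"boat"}, "vehicle"),
]
# Atoms the reasoner is allowed to assume (the "abducibles"); also hypothetical.
ABDUCIBLES = ["road", "wheels", "water", "hull"]


def closure(facts, rules):
    """Forward-chain the rules until no new atom can be derived."""
    derived = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if premises <= derived and conclusion not in derived:
                derived.add(conclusion)
                changed = True
    return derived


def abduce(observation, rules, abducibles):
    """Return the subset-minimal hypothesis sets that, with the rules, derive the observation."""
    explanations = []
    # Candidates are tried by increasing size, so smaller explanations are found first.
    for size in range(1, len(abducibles) + 1):
        for candidate in combinations(abducibles, size):
            hypothesis = frozenset(candidate)
            # Skip supersets of an explanation already found (keeps only minimal ones).
            if any(found <= hypothesis for found in explanations):
                continue
            if observation in closure(hypothesis, rules):
                explanations.append(hypothesis)
    return explanations


explanations = abduce("vehicle", RULES, ABDUCIBLES)
```

In this toy setting the observation `vehicle` admits two competing explanations, one via `car` and one via `boat`. The exhaustive enumeration is exponential in the number of abducibles and is used here only for readability; a practical system would presumably rely on a dedicated abductive or answer-set reasoner.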
The third reason is related to limiting the search space. Training sub-symbolic models, such as neural networks, takes not only a large amount of data, but also a significant amount of time in order to detect and learn the patterns underlying a particular class label. The greater the number of instances, the greater the algorithm’s accuracy and robustness, but the longer it takes to learn. Abduction in this sense could aid in the reduction of the search space. Given a Knowledge Base (KB) and a set of constraints, a minimal set $\theta$ of possible hypotheses is formed. Let $\theta = \{H_1, H_2, \dots, H_N\}$, where $KB \cup H \models O$ for each $H \in \theta$. In practice, each $H$ is a hypothesis for the observation $O$, in that it entails $O$ with respect to the $KB$. Consequently, only those feasible inferences formed from $\theta$ and deduced according to the KB are considered to be viable solutions.

In accordance with the aforementioned considerations, we believe that the employment and integration of abductive reasoning into image segmentation algorithms can lead to improvements in terms of both learning time and the amount of data required for model training. We also believe that the potential of abduction can lead to a symbolic augmentation resulting in a final model that is more robust, explainable, and trustworthy.

1.1. Outline

The remainder of this article is organised as follows: the semantic image segmentation state of the art is detailed in Section 2 through a discussion of the principal methodologies used and the principal limitations inherent in this discipline; the motivations for our proposal are stated in Section 3; Section 4 outlines and explains in depth the primary challenges of this work; finally, two ideas for a framework implementing our proposal are presented in Section 5.

2. Background

Semantic segmentation refers to the process of identifying which elements occur in an image and where they are located. Convolutional architectures are the most common architectures used for semantic segmentation [4].
Semantic image segmentation employs a multitude of models, the most significant of which are listed below.

Fully Convolutional Network (FCN) models are one of the best-known techniques [5]. These approaches are innovative since they use skip connections to connect non-adjacent layers. It is important to mention that nearly all the following methodologies used for semantic image segmentation adhere to the concept of skip connections. However, despite their widespread use, one of the major issues with these structures is their fully convolutional nature, which renders them incapable of localising labels within the feature hierarchy and of processing global context knowledge.

As a consequence, the approaches devised to solve this issue diverge from the fully convolutional models. Object-based techniques, of which R-CNN models are the most well-known, are one example. R-CNNs use a Region Proposal Network (RPN) to propose unique regions inside a picture, followed by a Convolutional Neural Network (CNN) to locate items within each region. Therefore, the essential structure of this form of architecture consists of a collection of systems for determining the position of bounding boxes and the types of items that can be located within them. Over the years, other enhancements to the original system have been proposed, including R-CNN [6], Fast R-CNN [7], Faster R-CNN [8], and Mask R-CNN [9]. In this regard, the application of object-based approaches for semantic segmentation is a field that will be considerably advanced in the near future.

However, image segmentation is still a young subject with many open problems and challenges. Consequently, new ideas and models are arising.
Few-Shot Semantic Segmentation [10] is an example of a technique that attempts to perform segmentation using a small number of annotated instances rather than massive data-sets; similarly, Zero-Shot Semantic Segmentation [11] attempts to build visual features using word embedding vectors when no training data are provided.

Semantic segmentation is extended by instance segmentation, which offers distinct labels for independent instances of elements belonging to the same class of objects. According to [12], the problem of instance segmentation is to simultaneously discover a solution for object recognition and semantic segmentation. Although numerous techniques developed over the years have demonstrated good performance in specific situations, a notable downside is that the classifier must be retrained every time new object categories are identified, in addition to the fact that object recognition at different scales is a big challenge in both semantic and instance segmentation. Other issues, including object occlusion and image degradation, are also highlighted.

3. Aims

This research seeks to solve specific difficulties in the field of semantic segmentation by examining the incorporation into such models of a reasoning module that might improve classification results and provide reasoning processes to the system. The introduction of a reasoning module into the pipeline would permit the incorporation of global context information, which would be useful for image segmentation. As also noted by [13], while neural networks are extremely effective at extracting local features and making accurate predictions with a small field of view, they lack the ability to use global context information and, therefore, cannot model interactions between different object instances directly.
The objective is therefore to introduce a form of reasoning, the abductive one, that provides the system with the ability to abstract from its predictions, thereby guaranteeing a higher degree of generalisation of the model. In fact, the reasoning module could circumvent the problem of limited data by generating abductive hypotheses to be utilised as new examples in the training phase and by providing the system with some form of commonsense reasoning that can produce responses in the presence of unexplored new scenarios.

In addition, this strategy could enable the model to obtain some explainability and interpretability characteristics. According to [2], abductive reasoning is motivated by the fact that, if the hypothesis were true, the observed events would have been inevitable consequences of it. Abductive reasoning is considered reliable when applied to real-world problems since it starts by providing initial explanations. Obviously, these explanations will require further investigation in the future to establish their veracity. This form of reasoning can enable the neural network model (which is a black-box model) to provide external users with explanations of the outputs generated by its internal complex processes.

Consequently, the main aim of this study is to propose a general hybrid framework for semantic image segmentation that can overcome some of the issues highlighted above and that, in the future: (i) can be applied to a variety of scenarios; (ii) can be used with a small amount of data; and (iii) is ultimately explicable, interpretable, and reliable.

4. Challenges

The main idea here is to integrate abductive reasoning capabilities with CNNs, specifically with CNNs designed for segmentation.
The main challenges we aim to solve are outlined below:

Data scarcity — One of the primary advantages of employing Neuro-Symbolic systems, i.e., systems integrating symbolic and sub-symbolic components, is the ability to use and combine the strengths of the two types of systems. Among these strengths, it is argued that such hybrid models can be trained end-to-end with limited data. In the case of computer vision, this strategy is expected to prove effective. In particular, one of the reasons why the combination of convolutional neural networks with a symbolic approach that exploits abductive reasoning is recommended is the need to address the data scarcity issue that frequently arises in this field.

Robustness — In the area of computer vision, there are typically issues with image distortion or, as discussed in Section 2, issues with image occlusion, degradation, etc. If the model is not robust enough, these occurrences can result in a substantial loss of performance. Model robustness relates to a system’s resistance to data disturbances and specifies the extent to which findings from a sample may be applied to other samples from the population. We believe that symbolic knowledge can be of support in this regard. In fact, the use of logical terms enables the neural network to learn and reason on the image without requiring millions of samples to comprehend what lies behind a blurred or obstructed image. It has been shown that using symbolic knowledge within a DNN can bring robustness to the learning system [14]. In the specific case of the proposed integration of abductive reasoning with CNN models, this can be seen as a way of improving the generalisability of the model by generating and introducing additional hypotheses. Such hypotheses, in practice, are used as new possible answers to be considered and explored, hence enhancing the model’s ability to reason about, and operate in, previously untested and unobserved scenarios.
Explainability — In recent years, the demand for explainable algorithms has increased, in part because of the regulations and guidelines provided by the EU in the GDPR. Nonetheless, there is no unanimity on how to build these explanations. The integration we propose here may be an option. The employment of a symbolic basis with a neural network can produce explanations that closely resemble how human thinking works while preserving cutting-edge performance [15]. In addition, abductive reasoning can be viewed as a means of fortifying and enhancing the explainability of a model. Abduction is frequently defined as a method for inferring the best explanation since it tries to make inexplicable facts understandable. In a sense, it is anticipated that the ability to employ this form of reasoning will enable human-comprehensible explanations of the algorithm’s findings.

Trustworthiness — Multiple features contribute to the trustworthiness of artificial intelligence systems, such as explainability, robustness, transparency, and accuracy [16]. For the same reasons discussed above, it can be argued that integrating abductive reasoning into neural networks can give rise to reliable systems. Specifically, a system is considered trustworthy when humans have confidence in the algorithm’s decisions, i.e., when they comprehend how and why certain decisions were made [17]. In this sense, a system’s trustworthiness depends largely on its explicability and interpretability. Abductive reasoning, as stated in the previous point, is particularly adept at finding potential solutions in the presence of incomplete or misleading information, making it ideal for real-world situations.
In this sense, this type of reasoning should be incorporated into artificial intelligence systems prior to achieving trusted autonomy, which is, according to a Forbes article (footnote 1), the point at which humans trust artificial intelligence systems to perform complex tasks requiring flexibility and rapid decision-making, even in the presence of incomplete or inaccurate data.

1 https://www.forbes.com/sites/forbestechcouncil/2017/08/30/why-human-like-reasoning-is-the-key-to-trusted-ai/?sh=2dcf1ff54edf

5. Proposal

Incorporating a reasoning module – specifically, one that performs abductive reasoning – into convolutional analysis is likely to prove beneficial in light of the aforementioned limitations and requirements in the field of image segmentation. This section covers potential implementation frameworks for the development of this hybrid system. Obviously, the frameworks outlined at this time are primarily hypothetical and will require further research, implementation, and extensive experiments.

The initial phase of both frameworks is identical: the neural network receives the image as input, while concepts are extracted from the image to construct a Knowledge Base (KB) that symbolically represents what the black-box has learned. Next, two techniques should be investigated:

Knowledge injection — The KB constrains the neural network in order to enhance its performance. The reasoner provides an explanation of the ultimate decision at the conclusion of the procedure, based on the abductive hypotheses that were formed. In addition to providing explainability to the model, the advantage of this method is to supply symbolic information to the neural network during the training phase, hence reducing the learning time and the amount of input data (Figure 1);

Figure 1: Knowledge injection

Figure 2: Knowledge learning

Knowledge learning — An abductive reasoner is utilised to adjust and correct the network’s output.
In this sense, revisions are generated by abduction based on the knowledge base’s contents (Figure 2). Specifically, this concept is derived from the ABL framework [18, 19], which consists of three steps:
i) initially, the model $f$ is utilised to get the symbolic predictions $z = f(x)$ as pseudo-labels;
ii) ABL then revises the pseudo-labels $z$ to $\hat{z}$ via abductive reasoning. There are typically multiple candidates $\hat{z}$ that are consistent with the KB; the reasoning model infers the most likely correct pseudo-labels $\hat{z}$ based on the idea of minimal inconsistency;
iii) ABL treats the $\hat{z}$ labels as ground-truth labels in order to update the model.
This procedure is repeated iteratively.

Given the two frameworks above, one might also consider a fusion of the two, potentially resulting in a system where both symbolic knowledge injection and a review of the final results with eventual system adjustments are possible. In truth, there are several paths worth investigating, and it is currently unclear which ones should actually be pursued.

6. Conclusion

In conclusion, this proposal highlights the potential of combining a symbolic system that performs abductive reasoning with a sub-symbolic one in the context of semantic image segmentation. This integration may enable model training and adaptation without the need for a large number of samples. In this way, it would be possible to design a system that is very generalisable across domains by using knowledge to make up for the small number of training instances. Due to the difficulty of studying the core representations and reasoning operations of neural networks, this could serve as the foundation for support in a scenario involving explainable artificial intelligence (eXplainable AI). The resulting integrated system will produce an explainable framework intended to incorporate knowledge representation and reasoning into black-box technologies.
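As an illustration of the three ABL steps summarised in Section 5, the following is a minimal Python sketch under strong simplifying assumptions; it is not the implementation of [18] or [19]. A single brightness threshold stands in for the neural model, the KB is reduced to one hypothetical constraint (scanning regions from top to bottom, sky never appears below ground), and "minimal inconsistency" is approximated by the smallest number of changed pseudo-labels. All names (`predict`, `consistent`, `abduce`, `refit`, `abl`, `REGION_FEATURES`) are invented for this example.

```python
from itertools import product

LABELS = ("sky", "ground")
# Hypothetical brightness values of six image regions, ordered top to bottom.
REGION_FEATURES = [0.9, 0.8, 0.3, 0.7, 0.6, 0.2]


def predict(threshold, features):
    """Step i: the model turns raw features x into pseudo-labels z = f(x)."""
    return ["sky" if x > threshold else "ground" for x in features]


def consistent(labels):
    """Toy KB constraint: scanning top to bottom, sky never appears below ground."""
    seen_ground = False
    for label in labels:
        if label == "ground":
            seen_ground = True
        elif seen_ground:
            return False
    return True


def abduce(pseudo_labels):
    """Step ii: choose the KB-consistent labelling that changes the fewest pseudo-labels."""
    candidates = [c for c in product(LABELS, repeat=len(pseudo_labels)) if consistent(c)]

    def changes(candidate):
        return sum(a != b for a, b in zip(candidate, pseudo_labels))

    return list(min(candidates, key=changes))


def refit(features, labels):
    """Step iii: treat the revised labels as ground truth and refit the threshold."""
    values = sorted(set(features))
    thresholds = [values[0] - 1.0] + values

    def errors(threshold):
        return sum(p != y for p, y in zip(predict(threshold, features), labels))

    return min(thresholds, key=errors)


def abl(features, threshold, rounds=3):
    """Repeat predict -> abduce -> refit for a fixed number of rounds."""
    for _ in range(rounds):
        pseudo = predict(threshold, features)
        revised = abduce(pseudo)
        threshold = refit(features, revised)
    return threshold, predict(threshold, features)


final_threshold, final_labels = abl(REGION_FEATURES, threshold=0.5)
```

With the initial threshold of 0.5, the third region is labelled as ground while regions below it are labelled as sky, which violates the constraint; abduction revises that single pseudo-label and the refitted model then reproduces the consistent labelling. The brute-force enumeration in `abduce` grows exponentially with the number of regions and is only meant to keep the sketch short.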
Even in this regard, as discussed above, the use of abductive reasoning is not arbitrary in our proposal; in fact, it is one of the most well-studied forms of reasoning and lends itself naturally to explanation. This is because the abductive approach identifies plausible explanations for each alternative and then selects the most probable candidate. In perspective, the conclusions that might be reached from this line of reasoning are consequently open to interpretation and explanation.

Acknowledgments

This work has been partially supported by the European Community Horizon 2020 programme under the funding schema ERC-2018-ADG G.A. 834756 XAI: Science and technology for the eXplanation of AI decision making, and by the CHIST-ERA IV project “Expectation” – CHIST-ERA-19-XAI-005 –, co-funded by EU and the Italian MUR (Ministry for University and Research).

References

[1] M. K. Kar, M. K. Nath, D. R. Neog, A review on progress in semantic image segmentation and its application to medical images, SN Computer Science 2 (2021) 397. doi:10.1007/s42979-021-00784-5.
[2] F. Bellucci, A. Pietarinen, Peirce on the justification of abduction, Studies in History and Philosophy of Science Part A 84 (2020) 12–19. doi:10.1016/j.shpsa.2020.04.003.
[3] F. Bergadano, V. Cutello, D. Gunetti, Abduction in machine learning, in: D. M. Gabbay, R. Kruse (Eds.), Abductive Reasoning and Learning, Springer Netherlands, Dordrecht, 2000, pp. 197–229. doi:10.1007/978-94-017-1733-5_5.
[4] F. Lateef, Y. Ruichek, Survey on semantic segmentation using deep learning techniques, Neurocomputing 338 (2019) 321–348. doi:10.1016/j.neucom.2019.02.003.
[5] E. Shelhamer, J. Long, T. Darrell, Fully convolutional networks for semantic segmentation, IEEE Transactions on Pattern Analysis and Machine Intelligence 39 (2017) 640–651. doi:10.1109/TPAMI.2016.2572683.
[6] R. Girshick, J. Donahue, T. Darrell, J.
Malik, Rich feature hierarchies for accurate object detection and semantic segmentation, in: 2014 IEEE Conference on Computer Vision and Pattern Recognition, 2014, pp. 580–587. doi:10.1109/CVPR.2014.81.
[7] R. Girshick, Fast R-CNN, in: 2015 IEEE International Conference on Computer Vision (ICCV), 2015, pp. 1440–1448. doi:10.1109/ICCV.2015.169.
[8] S. Ren, K. He, R. Girshick, J. Sun, Faster R-CNN: Towards real-time object detection with region proposal networks, in: C. Cortes, N. Lawrence, D. Lee, M. Sugiyama, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 28, Curran Associates, Inc., 2015. URL: http://papers.nips.cc/paper/5638-faster-r-cnn-towards-real-time-object-detection-with-region-proposal-networks.
[9] K. He, G. Gkioxari, P. Dollár, R. Girshick, Mask R-CNN, in: 2017 IEEE International Conference on Computer Vision (ICCV), 2017, pp. 2980–2988. doi:10.1109/ICCV.2017.322.
[10] K. Wang, J. H. Liew, Y. Zou, D. Zhou, J. Feng, PANet: Few-shot image semantic segmentation with prototype alignment, in: Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2019. doi:10.1109/ICCV.2019.00929.
[11] M. Bucher, T.-H. Vu, M. Cord, P. Pérez, Zero-shot semantic segmentation, in: H. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems, volume 32, Curran Associates, Inc., 2019. URL: http://papers.nips.cc/paper/8338-zero-shot-semantic-segmentation.
[12] F. Fan, X. Zeng, S. Wei, H. Zhang, D. Tang, J. Shi, X. Zhang, Efficient instance segmentation paradigm for interpreting SAR and optical images, Remote Sensing 14 (2022). doi:10.3390/rs14030531.
[13] M. Teichmann, R. Cipolla, Convolutional CRFs for semantic segmentation, in: 30th British Machine Vision Conference 2019 (BMVC 2019), BMVA Press, 2019, p. 142. URL: https://bmvc2019.org/wp-content/uploads/papers/0865-paper.pdf.
[14] A. S. d’Avila Garcez, M. Gori, L. C. Lamb, L. Serafini, M.
Spranger, S. N. Tran, Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning, Journal of Applied Logics 6 (2019) 611–632. URL: https://collegepublications.co.uk/ifcolog/?00033.
[15] R. R. Hoffman, W. J. Clancey, S. T. Mueller, Explaining AI as an exploratory process: The peircean abduction model, CoRR abs/2009.14795 (2020). URL: https://arxiv.org/abs/2009.14795. arXiv:2009.14795.
[16] M. Harbers, R. Verbrugge, C. Sierra, J. Debenham, The examination of an information-based approach to trust, in: J. S. Sichman, J. Padget, S. Ossowski, P. Noriega (Eds.), Coordination, Organizations, Institutions, and Norms in Agent Systems III, volume 4870 of Lecture Notes in Computer Science, Springer, 2008, pp. 71–82. doi:10.1007/978-3-540-79003-7_6.
[17] C. Rudin, Stop explaining black box machine learning models for high stakes decisions and use interpretable models instead, Nature Machine Intelligence 1 (2019) 206–215. doi:10.1038/s42256-019-0048-x.
[18] W. Dai, Q. Xu, Y. Yu, Z. Zhou, Bridging machine learning and logical reasoning by abductive learning, in: H. M. Wallach, H. Larochelle, A. Beygelzimer, F. d’Alché-Buc, E. B. Fox, R. Garnett (Eds.), Advances in Neural Information Processing Systems 32: Annual Conference on Neural Information Processing Systems 2019 (NeurIPS 2019), 2019, pp. 2811–2822. URL: https://proceedings.neurips.cc/paper/2019/hash/9c19a2aa1d84e04b0bd4bc888792bd1e-Abstract.html.
[19] Y.-X. Huang, W.-Z. Dai, L.-W. Cai, S. H. Muggleton, Y. Jiang, Fast abductive learning by similarity-based consistency optimization, in: M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, J. W. Vaughan (Eds.), Advances in Neural Information Processing Systems, volume 34, Curran Associates, Inc., 2021, pp. 26574–26584. URL: https://proceedings.neurips.cc/paper/2021/hash/df7e148cabfd9b608090fa5ee3348bfe-Abstract.html.