Harnessing the Advantages of Binary Networks for Neural-Symbolic Computing

Nataliia Kunanets 1, Yuriy Shcherbyna 2 and Volodymyr Karpiv 2,3
1 Lviv Polytechnic National University, Ukraine
2 Ivan Franko National University of Lviv, Ukraine
3 SoftServe, Ukraine

Abstract
In the dynamic field of AI, this paper explores the fusion of Neural-Symbolic Computing with binary neural networks, aiming to unify the precise logic of Symbolic AI with the adaptability of Connectionist AI. Focusing on integrating logical reasoning, this approach seeks to overcome the constraints of conventional methodologies. Our study emphasizes the significance of binary networks in achieving computational efficiency and structured logic integration. Utilizing the MNIST dataset, we demonstrate the practicality of our framework, while acknowledging the need to extend our methods to more complex systems and a broader array of datasets. This research lays the groundwork for future AI models that harmoniously combine learning and reasoning, paving the way for enhanced capabilities in various AI applications.

Keywords
Neural-Symbolic Computing, Symbolic AI, Connectionist AI, Deep Learning, Binary Neural Networks, Logical Reasoning

1. Introduction

In the evolving landscape of artificial intelligence (AI), the pursuit of effective computational models has led to diverse philosophies and methodologies. Historically, the AI community has oscillated between two dominant paradigms: Symbolic AI and Connectionist AI. This paper argues for the importance of Neural-Symbolic Computing, a field that synergizes the strengths of both approaches. We aim to establish a framework based on binary neural networks as a foundation for integrating logic and logical operators, addressing a critical gap in current AI methodologies.

In the dawn of AI research, Symbolic AI reigned supreme.
This paradigm, rooted in formal logic and symbolic reasoning, was driven by the belief that intelligence could be emulated by explicitly programming rules and symbols. Pioneers like Newell and Simon, with their General Problem Solver [1], exemplified this belief. Symbolic AI excelled in domains with well-defined rules and clear objectives, such as chess. However, it struggled with real-world scenarios that required adaptive learning and handling of ambiguous data.

COLINS-2024: 8th International Conference on Computational Linguistics and Intelligent Systems, April 12–13, 2024, Lviv, Ukraine
nek.lviv@gmail.com (N. Kunanets); yshcherbyna@yahoo.com (Y. Shcherbyna); volodymyr.karpiv@gmail.com (V. Karpiv)
ORCID: 0000-0003-3007-2462 (N. Kunanets); 0000-0002-4942-2787 (Y. Shcherbyna); 0009-0003-8439-8043 (V. Karpiv)
© 2024 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
CEUR Workshop Proceedings (ceur-ws.org), ISSN 1613-0073

The limitations of Symbolic AI led to the ascendance of Connectionist AI, marked by the development of artificial neural networks. Inspired by biological neural networks, this approach focuses on learning from data, allowing systems to adapt to new situations and recognize patterns. Seminal works like Rumelhart, Hinton, and Williams' backpropagation algorithm [2] catalyzed the deep learning revolution. However, this shift also led to skepticism towards knowledge-based systems, such as knowledge graphs, which were seen as rigid and unable to cope with the complexity and variability of real-world data.

Neural-Symbolic Computing emerges as a promising paradigm that integrates the symbolic reasoning of Symbolic AI with the adaptive learning capabilities of Connectionist AI.
This hybrid approach aims to leverage the interpretability and structured knowledge representation of symbolic systems alongside the pattern recognition and learning efficiency of neural networks. Notable works in this domain include the integration of logic programming with neural networks, as demonstrated by Garcez et al. [3], and the development of differentiable logic models.

The primary advantage of Neural-Symbolic Computing lies in its potential to handle complex, real-world problems that require both structured knowledge and adaptive learning. It offers interpretability, a critical aspect in fields like healthcare and finance, where understanding decision-making processes is crucial. However, challenges remain, particularly in integrating these paradigms efficiently and ensuring that the hybrid models retain the strengths of both parent domains.

This paper contributes to the Neural-Symbolic Computing field by proposing a binary neural network framework. This framework aims to serve as a robust basis for integrating logic and logical operators, addressing a gap in current methodologies. By focusing on binary neural networks, we aim to enhance computational efficiency and provide a more structured approach to logic integration in neural networks. The development of this framework is a step towards more sophisticated AI models that can seamlessly incorporate both learning and reasoning, a crucial advancement for complex problem-solving in various domains.

2. Related Works

Symbolic AI, a foundational pillar in artificial intelligence, has significantly evolved through contributions emphasizing logic, symbols, and rule-based processing. The General Problem Solver (GPS) by Newell and Simon was a pioneering development [1], showcasing AI's ability to replicate human problem-solving using symbolic representations.
This was further advanced by John McCarthy's work on formal logic and knowledge representation [4], [5], leading to a deeper understanding of how machines manipulate abstract concepts. Terry Winograd's SHRDLU program [6] extended Symbolic AI into natural language processing, demonstrating machines' capability to interpret and respond to human language in structured environments. The practical application of Symbolic AI was further exemplified in Feigenbaum and Barr's DENDRAL project [7], applying it to chemistry, and Nilsson's STRIPS system for robotics [8], showcasing the versatility of Symbolic AI across various domains.

The emergence of Connectionist AI, with its focus on artificial neural networks and data-driven learning, marked a significant shift from Symbolic AI's rule-based approach. The development of the backpropagation algorithm by Rumelhart, Hinton, and Williams [2] laid the foundation for modern deep learning, emphasizing adaptive learning's value. Yann LeCun's convolutional neural networks (CNNs) [9] revolutionized pattern recognition in image classification, demonstrating neural networks' practical capabilities in visual data processing. The introduction of Long Short-Term Memory (LSTM) networks by Hochreiter and Schmidhuber [10] expanded these applications to sequential data processing, like language understanding, showcasing Connectionist AI's versatility. Bengio, LeCun, and Hinton's overview of deep learning [11] and Krizhevsky, Sutskever, and Hinton's AlexNet [12] further underscored the adaptability and efficacy of neural network-based AI approaches.

Neural-Symbolic Computing, an integrative field blending neural networks' learning capabilities with Symbolic AI's structured reasoning, emerged as AI research progressed. The integration of logic programming with neural networks by d'Avila Garcez [3] laid the groundwork for combining adaptive learning with logical reasoning. The new approach was introduced by França et al.
[13], and Harnad's exploration of the Symbol Grounding Problem [14]. These studies collectively illustrate Neural-Symbolic Computing's potential in addressing complex problems requiring both structured knowledge and adaptive learning.

Recent reviews in the field of neuro-symbolic computing have illuminated its evolving landscape and critical aspects. Wang and Yang [15] offer a systematic overview of neuro-symbolic computing advancements, emphasizing its role in merging symbolic reasoning with neural network learning for future AI development. Garcez and Lamb [16] discuss the integration of deep learning with logical reasoning in neuro-symbolic computing, stressing the need for AI to be safe and interpretable. These papers collectively underscore the significance of neuro-symbolic computing in achieving trustworthy and advanced AI systems.

In the neuro-symbolic computing domain, key studies have focused on the synergy of symbolic reasoning and neural learning to enhance AI development. Smolensky, in two distinct papers [17], [18], emphasizes neurocompositional computing, advocating for the integration of Compositionality and Continuity to facilitate advanced AI systems with human-like cognition. Hitzler [19] offers a broad survey of the neuro-symbolic field, noting the blend of machine learning and symbolic AI as a significant trend. Van Krieken [20] delves into differentiable fuzzy logic's role in neural network training, incorporating symbolic knowledge for improved learning outcomes. Hoernle [21] presents MultiplexNet, a method that integrates logical formulas to refine neural network training and decision-making. Silver [22] explores the application of neuro-symbolic approaches in robotics, particularly in task and motion planning. Giunchiglia [23] investigates the use of logical constraints to enhance deep learning models, focusing on performance and safety.
Collectively, these works highlight the importance of merging symbolic and neural methods to create AI systems that are more effective, interpretable, and closely aligned with human cognitive processes.

Exploring the forefront of neuro-symbolic computing, recent studies have innovated in architecture, logic, and reasoning frameworks to enhance AI's capabilities. Karpas [24] presents the MRKL system, a neuro-symbolic architecture combining large language models with discrete reasoning, addressing the limitations of conventional language models. Stehr [25] proposes a Probabilistic Approximate Logic for neuro-symbolic learning, facilitating the integration of domain knowledge and neural computation. Pryor [26] introduces NeuPSL, an energy-based neuro-symbolic framework that significantly improves performance in low-data settings by integrating neural and symbolic learning. Aditya [27] discusses PyReason, a software for open-world temporal logic, enhancing reasoning over graphical structures and providing explainable inference. Hersche [28] proposes a neuro-vector-symbolic architecture (NVSA) that addresses the binding problem and rule-search inefficiencies, demonstrating high accuracy in cognitive tasks. Lastly, Li [29] introduces a softened symbol grounding approach for neuro-symbolic systems, improving the interaction between neural training and symbolic reasoning. These contributions collectively advance the neuro-symbolic field, pushing AI towards greater efficiency, interpretability, and integrated reasoning capabilities.

In the realm of visual data analysis, neuro-symbolic computing is revolutionizing the way AI interprets and interacts with imagery. Yu and Yang [30] develop a bi-level probabilistic graphical reasoning framework, BPGR, enhancing Visual Relationship Detection (VRD) by integrating symbolic knowledge with deep learning, improving performance and interpretability.
Gupta and Kembhavi [31] introduce VISPROG, a neuro-symbolic system for compositional visual tasks using natural language instructions, bypassing task-specific training by generating modular programs for interpretable solutions. Surís [32] presents ViperGPT, a framework combining vision-and-language models into executable subroutines for visual query answering, improving interpretability and task generalization without further training. Li [33] proposes LOGICSEG, a visual semantic parser that merges neural learning and logic reasoning, structuring semantic concepts hierarchically for improved segmentation and cognition-mimetic reasoning. These innovative approaches demonstrate neuro-symbolic computing's potential to advance AI's capabilities in visual data processing, offering more efficient, interpretable, and adaptable solutions.

In the sphere of reinforcement learning applications, neuro-symbolic computing is enhancing AI's problem-solving capabilities. Jin [34] introduces a deep reinforcement learning framework with symbolic options, addressing challenges of data efficiency, interpretability, and transferability. Their framework, validated in both game and real-world scenarios, shows improved performance by integrating symbolic knowledge to guide policy enhancement through planning and learning from interactive trajectories. Tian [35] proposes a weakly supervised neural symbolic learning model, WS-NeSyL, for cognitive tasks, leveraging logical reasoning. This model enhances learning efficiency and accuracy by using a back search algorithm to generate pseudo labels for supervision and incorporating probabilistic logic regularization. These approaches demonstrate how embedding symbolic reasoning into reinforcement learning can significantly improve AI's ability to learn and adapt across different domains and tasks.

In the quest for efficient AI systems, the field of Binary Networks has emerged as a promising avenue for integrating logic into AI.
Courbariaux et al.'s BinaryConnect [36] introduced the concept of training neural networks with binary weights, significantly reducing computational complexity and memory requirements. Rastegari et al.'s study on Binary-Weight-Networks [37] applied binary weights to large-scale image processing tasks, demonstrating these networks' practicality in complex applications. Hubara et al.'s comprehensive study on Binarized Neural Networks [38] extended the binary concept to both weights and activations, enhancing efficiency in network computation and storage. Lin et al.'s research on reducing multiplication operations in neural networks [39] highlighted the importance of computational efficiency in AI deployment, especially in resource-constrained environments. Zhou et al.'s development of DoReFa-Net [40], proposing a method for training neural networks with low bit-width weights and activations, offered insights into balancing efficiency and accuracy in neural architectures. These advancements in Binary Networks represent a significant step towards creating more efficient AI systems, making them highly suitable for integrating logic into AI, particularly in sectors where computational resources are limited.

In the realm of binary neural networks, significant strides have been made in various applications, as evidenced by several notable papers. Zhuang et al. [41] introduce an innovative approach for detecting similarity in binary code, emphasizing the importance of semantic awareness in neural networks. Martinez et al. [42] explore techniques to enhance the training of binary neural networks, leveraging real-to-binary convolutions for improved efficiency and performance. Bai et al. [43] represent a breakthrough in natural language processing, pushing the boundaries of BERT model quantization to achieve efficient, yet powerful, binary representations. Lastly, Lin et al.
[44] present a specialized binary neural network design tailored for efficient keyword spotting, showcasing the adaptability and potential of binary networks in audio processing tasks. Together, these works by Zhuang, Martinez, Bai, and Lin highlight the versatility and advancing capabilities of binary neural networks in diverse domains of AI research.

This literature review encapsulates the evolution from Symbolic AI's structured problem-solving to Connectionist AI's data-driven learning models, the unifying efforts in Neural-Symbolic Computing, and the efficiency-driven innovations in Binary Networks. Each field, with its unique contributions, demonstrates the multifaceted nature of AI research and its continuous progression towards more sophisticated, efficient, and integrated AI systems.

3. Methods

3.1. Challenges of Integrating Logic in Learning-Based Algorithms

The integration of logical reasoning into learning-based algorithms faces the fundamental challenge of reconciling two inherently different paradigms: the symbolic, rule-based approach and the connectionist, data-driven approach. Symbolic models excel in structured problem-solving and explicit reasoning, whereas connectionist models thrive on pattern recognition and implicit learning. Merging these models requires a robust framework that can seamlessly accommodate the discrete, structured nature of logical rules within the fluid, statistical nature of neural networks.

Another significant challenge is preserving the interpretability of logic-based systems when integrated with learning-based algorithms. Neural networks, especially deep learning models, are often seen as "black boxes" due to their complex and opaque decision-making processes. Integrating logic into these models demands a methodology that enhances their transparency, ensuring that the decision-making process remains understandable and justifiable, which is vital for applications in critical domains like healthcare and law.
Efficiency and scalability pose a third challenge. Traditional logic-based systems are computationally intensive and do not scale well with the increasing size and complexity of data, unlike neural networks. Integrating logic into learning-based algorithms requires an approach that can handle large-scale data without compromising on computational efficiency and speed, ensuring that the integrated system is both practical and effective for real-world applications.

3.2. Methods to Integrate Logic in Learning-Based Algorithms

A key method for integrating logic into learning-based algorithms involves the creation of hybrid models that blend the strengths of both symbolic and connectionist approaches. In these models, neural networks are typically utilized for their ability in pattern recognition and data-driven inference, while symbolic systems are employed for rule-based reasoning and decision-making. The central aim is to design architectures where these two distinct paradigms can work in harmony, thus leveraging the adaptability and learning prowess of neural networks alongside the structured and clear logical reasoning of symbolic systems.

Addressing the challenge of interpretability in neural networks requires the development of transparent mechanisms that can effectively map and elucidate their decision-making processes. This entails devising methods that allow for the visualization and explanation of the neural network's inferences in a manner that is coherent with logical reasoning. Employing techniques such as attention mechanisms can shed light on the specific aspects of data that neural networks focus on during decision-making. Moreover, the use of explainable AI (XAI) methods can help in making the decisions of neural networks more transparent, ensuring they are in alignment with the principles of logical reasoning.
To enhance computational efficiency in the integration of logical reasoning with learning-based algorithms, a dual approach of algorithm optimization and hardware acceleration can be pursued. Developing efficient training algorithms that are less demanding in terms of computational power and memory is essential. Concurrently, utilizing specialized hardware designed for neural network processing, like Graphics Processing Units (GPUs) or Tensor Processing Units (TPUs), can significantly boost the efficiency and scalability of these integrated systems. This optimization of both software algorithms and hardware resources is crucial for a seamless and effective integration of logical reasoning into learning-based algorithms.

3.3. Potential of Binary Networks in Integrating Logic

Binary Networks present a simplified computational model compared to traditional neural networks, which significantly benefits the integration of logical reasoning. Their ability to represent weights and activations in binary form greatly reduces computational complexity. This reduction in complexity is particularly harmonious with the structured nature of logical operations, potentially easing and streamlining the process of integrating logic into these networks. The simplicity of binary representation in Binary Networks aligns well with the discrete nature of logical reasoning, suggesting a more natural and effective pathway for merging these two paradigms.

Moreover, the binary architecture of these networks drastically cuts down memory requirements and computational overhead, a crucial advantage when melding them with logic-based systems. Logic systems, with their symbolic nature, tend to be memory-intensive. The efficiency of Binary Networks, therefore, makes them ideally suited for scenarios where computational resources are limited, yet there is a need for robust logical reasoning capabilities.
This efficiency not only reduces the strain on resources but also enhances the feasibility of deploying complex AI systems in various real-world applications. Inherent in their design, Binary Networks operate with discrete values, closely mirroring the binary nature of logical operations. This intrinsic compatibility suggests that Binary Networks could serve as an efficient medium for embedding logical reasoning within a neural framework. Such alignment facilitates a more natural integration of logical reasoning in learning-based algorithms, potentially leading to AI systems that are both computationally efficient and logically coherent. The architectural efficiency of Binary Networks extends to the processing of logical rules and operations. When these rules are represented in a binary format, they can be processed more rapidly and seamlessly within the network’s binary architecture. This synergy enhances the system’s overall efficiency and effectiveness, making Binary Networks a promising candidate for developing AI systems that seamlessly blend learning with logical reasoning. A particularly noteworthy advantage of Binary Networks is their potential to eliminate the need for traditional backpropagation, which is a computationally intensive aspect of training conventional neural networks. The simplified learning process inherent in Binary Networks allows for the exploration of alternative learning mechanisms that could be more in tune with the processes of logical reasoning. This capability of integrating logic without relying on backpropagation represents a significant leap forward in creating more efficient and streamlined AI systems. Lastly, Binary Networks take a step towards the realm of neuromorphic computing, where the goal is to develop computing architectures that mimic the neural structures of the human brain. 
This advancement holds considerable promise for the integration of logical reasoning, as neuromorphic designs could potentially provide a more intuitive framework for combining logical reasoning with learning-based approaches. By aligning AI systems more closely with human-like reasoning processes, Binary Networks could play a pivotal role in the evolution of artificial intelligence.

3.4. Dataset Preparation and Preprocessing

The MNIST dataset is a large database of handwritten digits, widely used for training and testing in the field of machine learning. This dataset contains 70,000 images, split into a training set of 60,000 examples and a test set of 10,000 examples. Each image in the MNIST dataset is a 28x28 pixel grayscale representation of a digit (from 0 to 9). The simplicity and size of the MNIST dataset make it ideal for experiments in machine learning and neural network architectures.

In our implementation, the MNIST dataset is loaded using the HDF5 file format, a versatile data model that can efficiently handle large, complex data. The dataset is then converted into a floating-point format, which is more suitable for processing with neural networks. Specifically, the pixel values are normalized to aid in the convergence of the training process. The target variable, which is the actual digit each image represents, is converted into a one-hot encoded format. One-hot encoding transforms the categorical data into a binary matrix representation, which is essential for classification tasks in neural networks.

Once loaded, the dataset is divided into training and test sets. The training set consists of 60,000 images, while the test set comprises 10,000 images. This separation is crucial for evaluating the performance of the neural network model; the training set is used to train the model, and the test set is used to evaluate its performance on unseen data.
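As a concrete illustration of these preprocessing steps, the following sketch normalizes pixel values and one-hot encodes the labels. This is a minimal NumPy illustration; the helper name and array shapes are our assumptions, and the paper's HDF5 loading is omitted.

```python
import numpy as np

def preprocess(images, labels, num_classes=10):
    """Flatten images, scale pixels to [0, 1], and one-hot encode labels.

    Illustrative sketch only; the paper's own loading code may differ.
    """
    # Flatten 28x28 images into 784-dimensional float vectors in [0, 1].
    x = images.reshape(len(images), -1).astype(np.float32) / 255.0
    # One-hot encoding: a binary matrix with a single 1 per row.
    y = np.zeros((len(labels), num_classes), dtype=np.float32)
    y[np.arange(len(labels)), labels] = 1.0
    return x, y

# Tiny stand-in for two MNIST images with labels 3 and 7.
imgs = np.random.randint(0, 256, size=(2, 28, 28), dtype=np.uint8)
x, y = preprocess(imgs, np.array([3, 7]))
```

The same transformation applies unchanged to the full 60,000-image training set.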
The training data undergoes shuffling to ensure that the training process is not biased by the order of the data. Shuffling helps reduce variance and keeps the model general, lowering the risk of overfitting to the order of examples. The data is then divided into batches. Batching is a crucial process in neural network training, particularly for large datasets like MNIST. It involves dividing the dataset into smaller, manageable batches, which are then used to train the model iteratively. This approach is not only computationally efficient but also helps in optimizing the neural network more effectively.

3.5. Binary Network Model and Initialization

Unlike traditional neural networks that operate with high-precision weights, Binary Networks simplify these elements to binary values (-1 or 1). This architecture leads to a significant reduction in memory requirements and computational overhead, making them particularly suitable for applications where efficiency is a priority. The inherent simplicity of Binary Networks also aligns well with logical operations, potentially facilitating the integration of logical reasoning within a neural framework.

In our implementation, the Binary Network is initialized with binary values for weights and biases. Weights and biases are randomly assigned either -1 or 1. This binary initialization is crucial to maintain the network's binary nature and aligns with the overall computational efficiency goal. The architecture of the network can vary depending on the specified number of layers. For instance, a network with two layers would consist of one hidden layer and one output layer. However, the model's architecture is flexible and can be adapted with more layers, depending on the complexity of the task at hand. Each layer in the Binary Network is designed to perform specific transformations on the input data.
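The random ±1 initialization described above might be sketched as follows. This is a minimal NumPy illustration under our own assumptions (the function name and the 784-64-10 layer sizes are ours, not the paper's code):

```python
import numpy as np

rng = np.random.default_rng(0)

def init_binary_layer(n_in, n_out):
    """Randomly assign each weight and bias to -1 or +1, keeping the layer binary."""
    w = rng.choice(np.array([-1, 1], dtype=np.int8), size=(n_in, n_out))
    b = rng.choice(np.array([-1, 1], dtype=np.int8), size=n_out)
    return w, b

# A two-layer network for MNIST: 784 inputs -> 64 hidden units -> 10 outputs.
layers = [init_binary_layer(784, 64), init_binary_layer(64, 10)]
```

Using `int8` storage reflects the memory savings of binary parameters relative to 32-bit floats.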
The first layer (input layer) receives the raw input data (in the case of the MNIST dataset, this would be the pixel values of the images). Subsequent hidden layers, if present, are responsible for extracting and processing features from this input. The final layer (output layer) produces the classification result, which, for the MNIST dataset, corresponds to the identified digit.

3.6. Activation Functions

In our Binary Network, the Rectified Linear Unit (ReLU) function is the primary activation function, selected for its effectiveness in introducing non-linearity while maintaining computational simplicity. ReLU is especially suitable for binary network architectures, facilitating complex pattern learning efficiently. However, our framework is designed with inherent flexibility, allowing for easy application of alternative activation functions depending on specific task requirements or desired network characteristics.

The network architecture supports several other activation functions, each of which is already implemented and can be seamlessly integrated into the model. These include the sigmoid function, binary step functions (binary01, binary11), and the scaled exponential linear unit (SeLU). The sigmoid function, known for its smooth gradient, is particularly useful in scenarios where a probabilistic output is required:

\sigma(x) = \frac{1}{1 + e^{-x}} \qquad (1)

On the other hand, binary step functions, including binary01 and binary11, align closely with the binary nature of the network, making them ideal for tasks that benefit from a clear, decisive output:

f_{01}(x) = \begin{cases} 0 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases} \qquad (2)

f_{11}(x) = \begin{cases} -1 & \text{if } x < 0 \\ 1 & \text{if } x \ge 0 \end{cases} \qquad (3)

The SeLU function introduces self-normalizing properties, which can be advantageous for maintaining stable gradients in deeper network architectures. In the SeLU function, λ and α are predefined constants.
Typically, λ ≈ 1.0507 and α ≈ 1.67326 to ensure that the mean and variance of the inputs are preserved between layers during training:

\mathrm{SeLU}(x) = \lambda \begin{cases} x & \text{if } x > 0 \\ \alpha e^{x} - \alpha & \text{if } x \le 0 \end{cases} \qquad (4)

This flexibility in the choice of activation functions allows our Binary Network to be adaptable to a wide range of applications. Whether the task requires smooth probability distributions, clear binary outputs, or stable training dynamics in deep networks, the framework can easily accommodate these needs by simply switching the activation function. This feature enhances the network's versatility, making it suitable for various machine learning tasks and experimental setups.

3.7. Loss Functions

In our Binary Network framework, while cross-entropy is the default loss function, we have integrated two loss functions to provide different options. Cross-entropy, known for its effectiveness in classification tasks, measures the difference between the predicted probabilities and the actual distribution of labels. It is particularly valuable in guiding the optimization of neural networks, especially for multi-class tasks like digit recognition in the MNIST dataset. With M as the number of classes, the cross-entropy loss is defined:

L = -\sum_{i=1}^{M} y_{o,i} \log(p_{o,i}) \qquad (5)

However, recognizing the need for versatility in handling different types of problems, our framework also includes the root mean square error (RMSE) as an alternative loss function. RMSE is readily available and can be easily applied for tasks where the focus is on the magnitude of errors. This loss function is especially suitable for regression tasks or scenarios where assessing the accuracy of predictions in a quantitative manner is more relevant than evaluating probabilistic differences.
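A minimal sketch of the alternative activation functions of Eqs. (1)-(4) and the cross-entropy loss of Eq. (5), assuming NumPy; the function names are ours, not the framework's API:

```python
import numpy as np

# Activation functions; the SeLU constants follow the values quoted in the text.
def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))          # Eq. (1)

def binary01(x):
    return np.where(x < 0, 0.0, 1.0)          # Eq. (2): step into {0, 1}

def binary11(x):
    return np.where(x < 0, -1.0, 1.0)         # Eq. (3): step into {-1, 1}

def selu(x, lam=1.0507, alpha=1.67326):
    return lam * np.where(x > 0, x, alpha * np.exp(x) - alpha)  # Eq. (4)

def cross_entropy(y, p, eps=1e-12):
    """Eq. (5) for a one-hot target y and predicted probabilities p."""
    return -np.sum(y * np.log(p + eps))
```

Swapping the activation then amounts to passing a different function object into the layer's forward computation.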
The RMSE loss function is defined:

\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} \left( \frac{d_i - f_i}{\sigma_i} \right)^2} \qquad (6)

The integration of both cross-entropy and RMSE in our framework offers users the flexibility to choose the most appropriate loss function based on their specific task. Whether it is a classification problem requiring a probabilistic assessment or a regression task needing a quantitative error evaluation, the framework allows for seamless switching between these loss functions. This adaptability makes the Binary Network framework a versatile tool, capable of addressing a wide range of machine learning challenges effectively.

3.8. Training Parameters and Sampling Methods

In our Binary Network framework, we diverge from the traditional backpropagation training approach, familiar in neural network training. Instead, we employ a variety of statistical sampling methods. These methods, well-established in statistical analysis, provide an alternative and potentially more efficient pathway for training neural networks, particularly apt for the unique characteristics of a binary architecture.

Sampling methods bring a different perspective to network training, allowing for exploration and optimization in a manner distinct from gradient-based approaches. This shift is particularly advantageous in the context of Binary Networks, where the discrete nature of the parameters aligns well with the sampling-based exploration of the solution space. By leveraging these statistical techniques, we aim to harness their robustness and efficiency for effective training in the specialized environment of binary neural networks.

The network undergoes training over 50 epochs, allowing for gradual and thorough learning from the MNIST dataset. The MNIST dataset comprises 60,000 training images of handwritten digits, each converted into a 784-dimensional input vector to fit the network's input layer.
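To make the sampling-based alternative to backpropagation concrete, here is a toy sketch of a Metropolis-style update that flips one binary weight at a time and accepts or rejects the flip based on the resulting loss. This is our own simplified illustration on a synthetic objective, not the paper's exact sampler:

```python
import numpy as np

rng = np.random.default_rng(0)

def metropolis_step(w, loss_fn, temperature=1.0):
    """Propose flipping one binary weight; keep the flip if the loss improves,
    otherwise accept it with a Boltzmann probability (toy sketch)."""
    i = rng.integers(w.size)
    proposal = w.copy()
    proposal.flat[i] *= -1  # flip -1 <-> +1
    old, new = loss_fn(w), loss_fn(proposal)
    if new < old or rng.random() < np.exp((old - new) / temperature):
        return proposal
    return w

# Synthetic objective: count mismatches against a fixed target sign pattern.
target = rng.choice([-1, 1], size=(4, 4))
loss = lambda w: np.sum(w != target)
w = rng.choice([-1, 1], size=(4, 4))
for _ in range(500):
    w = metropolis_step(w, loss, temperature=0.1)
```

A Gibbs sampler would instead resample each weight from its conditional distribution; the accept/reject structure shown here is the simplest way to see how discrete parameters can be trained without gradients.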
The network architecture supports multiple hidden layers of 64 units each, followed by a softmax output layer for classification. The training process uses batch processing with 600 samples per batch. The batch size and the data-to-invert ratio (200) are calibrated to balance computational efficiency against training effectiveness.

The choice of sampling method for weight updates plays a crucial role in the network’s training dynamics. The current setup uses Global Gibbs Sampling, which is effective for global optimization in binary networks. However, other sampling methods, namely Local Random Sampling, Global Random Sampling, and Global Metropolis–Hastings, are also implemented and are potential alternatives, each offering a different trade-off between exploration and exploitation in the weight space.

In summary, the current Binary Network implementation is designed with flexibility in mind, allowing various activation and loss functions to suit different requirements. The choice of ReLU and cross-entropy aligns well with the network’s architecture and the nature of the MNIST dataset. The training process, characterized by specific hyperparameters and a chosen sampling method, is geared towards efficient and effective learning.

3.9. Logical AND Operator Mechanism

In our methodology, we integrate the output features of three distinct binary neural networks to enhance prediction accuracy and reliability through a specialized logical operation. This process is not akin to conventional model ensembling techniques; instead, it involves a unique training and inference setup tailored for binary networks. The core of our approach lies in the application of a logical AND operator, designed to make final predictions based on the agreement between the outputs of the three networks.
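The operator’s decision rule, detailed in the following paragraphs (a two-of-three consensus wins; on complete disagreement the first network’s prediction is used), can be sketched as below. The function name is ours, and predictions are taken to be class labels:

```python
def and_operator(p1, p2, p3):
    """Combine three networks' predictions: if at least two agree,
    the consensus dictates the output; on complete disagreement,
    default to the first network's prediction."""
    # The only case where the first network is overruled is when the
    # other two agree with each other but not with it.
    if p2 == p3 and p1 != p2:
        return p2
    return p1
```

For example, `and_operator(7, 1, 1)` yields the consensus of the second and third networks, while `and_operator(7, 1, 4)` falls back to the first network’s prediction.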
The logical AND operator functions under the principle that if at least two of the three networks concur on a prediction, this consensus dictates the final output of the operator. This method leverages the collective intelligence of the networks, ensuring that the prediction reflects a majority agreement and thereby increasing confidence in the decision made. In scenarios where each network outputs a different prediction, indicating complete disagreement, the methodology defaults to the prediction made by the first network. This decision rule is predicated on the premise that each network, while trained under the same overarching framework, may possess subtle variations in specialization due to differences in initialization or training nuances. By prioritizing the first network’s output, we acknowledge its potential slight edge in capturing the essential features necessary for the task at hand.

Let us denote the predictions as P_1, P_2, and P_3, respectively. Each prediction P_i for i = 1, 2, 3 can be either 0 or 1, representing the binary output class of each network. The output of the AND operator, denoted P_AND, is defined as follows:

P_AND = 1 if Σ_{i=1}^{3} P_i ≥ 2, and P_AND = P_1 if Σ_{i=1}^{3} P_i < 2    (7)

It is crucial to underline that this strategy diverges fundamentally from traditional model ensembling. While ensembles typically combine models post-training to leverage their individual strengths during inference, our approach intertwines the combination logic within the training phase itself. This ensures that the networks are not only trained to perform their respective tasks effectively but also to do so in a manner that is synergistic, considering the logical AND operator’s requirements during the decision-making process.

4. Experimental Results

4.1.
Network Training and Sampling Method

Monte-Carlo sampling and its variants, such as Gibbs sampling and Metropolis–Hastings, have emerged as efficient methods for navigating high-dimensional spaces and can be effectively applied to deep neural networks. Traditionally, training binary networks has involved either backpropagation on a full-precision copy of the network or Bayesian learning methodologies. Monte-Carlo methods, however, present an alternative that complements the unique characteristics of binary networks.

In our training approach, we implemented distinct strategies for the Gibbs Sampling Net (GSNet) and the Metropolis-Hastings Net (MHNet), each involving the flipping of a subset of weights. In GSNet, a set of weights is selectively flipped, and the new configuration is accepted only if it meets a specific criterion: it must reduce the loss over a given batch of data. An interesting aspect of GSNet’s training methodology is the progressive increase in batch size with the number of epochs. This gradual scaling allows the network to initially focus on learning from smaller data segments, gradually adapting to larger and more varied data as training progresses.

MHNet likewise flips a random subset of weights but distinguishes itself from GSNet in its acceptance criterion for new weight configurations. Unlike GSNet, MHNet may accept a new weight configuration that increases the loss, albeit with a small probability. This approach allows MHNet to explore a broader range of solutions in the weight space, potentially avoiding local minima and discovering more optimal configurations.
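A minimal sketch of the two acceptance rules and the weight-flip proposal, assuming NumPy. The exponential acceptance form and the temperature knob in the MHNet rule are our assumptions; the text only states that a worse configuration is accepted "with a small probability":

```python
import numpy as np

def propose_flip(weights, n_flips, rng):
    """Flip a random subset of binary (+/-1) weights, returning a copy."""
    proposal = weights.copy()
    idx = rng.choice(weights.size, size=n_flips, replace=False)
    proposal.flat[idx] *= -1
    return proposal

def accept_gibbs(loss_old, loss_new):
    # GSNet-style rule: keep a proposed flip only if the batch loss decreased.
    return loss_new < loss_old

def accept_metropolis_hastings(loss_old, loss_new, rng, temperature=0.1):
    # MHNet-style rule: always keep an improvement; accept a worse
    # configuration with a small probability that shrinks with the
    # loss increase (the temperature value is illustrative).
    if loss_new < loss_old:
        return True
    return rng.random() < np.exp(-(loss_new - loss_old) / temperature)
```

A training step then consists of proposing a flip, evaluating the batch loss of the proposal, and keeping or discarding it according to the chosen rule; no gradients are computed at any point.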
In MHNet, the batch size remains constant over time, and the number of units flipped per batch is determined randomly, adding an element of variability and exploration to the training process. Our networks were benchmarked against the classification task on the MNIST dataset. While BinaryConnect has maintained state-of-the-art (SOTA) results for some time, more recent binary network architectures have focused on improving precision on larger and more complex datasets like ImageNet. However, it’s observed that these networks often do not perform as well on MNIST when compared to BinaryConnect. Initially, we conducted 100 experimental runs with 2000 epochs each, varying the number of hidden layers (1, 2, 3, 5, 10), the number of units (10, 20, 64, 128, 256), and testing sigmoid against ReLU nonlinearities, as well as cross-entropy against RMSE loss functions. The most promising configurations from these initial experiments were then subjected to extended training, spanning tens of thousands of epochs, to optimize for top precision. The training process is analyzed with respect to network depth, number of units, epochs, activation functions, and choice of loss function. It’s noted that training binary networks with either Gibbs Sampling or Metropolis-Hastings requires significantly more epochs to converge. This observation is evident in our results as shown in Figure 1 and Figure 2, where various fully connected architectures trained over 2000 epochs are compared. Interestingly, adding more hidden layers resulted in lower precision compared to architectures with fewer or no hidden layers. In our study, we observed that increasing the number of hidden units in the network initially leads to improved precision up to a certain point, after which the precision begins to decline, as illustrated in Figure 2. This trend suggests a nuanced relationship between the network’s complexity and its performance. 
Our hypothesis is that while architectures with a greater number of neurons have a higher capacity for learning and abstraction, they also demand a more extended period of training to fully leverage this increased capacity. This extended training requirement might be necessary to optimize the more complex parameter space effectively, thus benefiting from the network’s higher capability. In our experiments, the sigmoid activation function outperformed ReLU in simpler network designs, especially in networks with more hidden layers and units where sigmoid showed less saturation. Regarding loss functions, while RMSE led to better initial precision, cross-entropy loss yielded slightly improved results after extended training.

Figure 1: Varying Number of Hidden Layer Units Impact on Classification Accuracy.

Although binary inputs and activation functions slightly reduced performance, their potential for computational efficiency is notable. By replacing multiplication with bitwise operations, they could greatly accelerate network inference, making them an intriguing area for further research with various configurations and extended training periods.

4.2. Benchmarking Results

In our study, we conducted a comprehensive comparison of the precision across different binary network configurations, see Table 1. It’s important to note that adding more layers and units generally improves the precision of neural networks. However, training networks with significantly more hidden layers and units poses challenges, often leading to convergence issues due to numerical limitations. Implementing techniques like batch normalization could potentially facilitate the training of deeper networks. Given the challenges and variances in training, we evaluate precision for networks either without hidden layers or with a single hidden layer comprising 64 units.
These binary networks were benchmarked against a full-precision network with decimal weights, trained for 50 epochs using Stochastic Gradient Descent (SGD) with momentum (see Table 2 for details). BinaryConnect, a notable approach in binary networks, involves using a decimal network for weight updates followed by binarization. This method, thanks to robust backpropagation, achieves high precision in just 250 epochs, showing only a 1% precision drop compared to the decimal network. The binarization process in BinaryConnect involves setting negative weights to -1 and positive weights to +1.

Figure 2: Varying Number of Hidden Layer Units Impact on Classification Accuracy.

An extension to basic weight binarization includes an additional scaling parameter per layer, ensuring the norm of weights per layer remains consistent post-binarization. This technique showed improved precision in networks with one hidden layer, while having a slight precision decrease in networks without hidden layers. BinaryConnect slightly outperforms GSNet, as shown in Table 1, and this gap could potentially be reduced by implementing batch normalization, an important factor for binary networks. However, it is worth noting that GSNet required a significantly longer training period, 20,000 epochs, as detailed in Table 2, indicating a need for accelerated training methods. On the other hand, MHNet demonstrated very competitive precision with substantially fewer training epochs than GSNet. Direct weight binarization yielded promising results, 79.67% and 73.76% as per Table 1, and the addition of a scale factor per layer further improved precision for networks with one hidden layer, 76.83% as per Table 1.

4.3. Integration of Logical Reasoning

The training process for individual binary neural networks, as evidenced by the systematic reduction in training loss across epochs, indicates a successful optimization trajectory.
Experiments 1 through 5, each representing a standalone binary network, show a consistent decline in loss values, signifying that the networks are effectively learning from the data over time (see Figure 3). This pattern is characteristic of a well-tuned training regimen, where the network parameters are refined in response to the given training stimuli, leading to improved performance on the training dataset.

Table 1: MNIST Benchmarking Results

Method                            No Hidden Layers   Single Hidden Layer
Full Precision                    91.6%              97.6%
BinaryConnect                     90.6%              96.7%
GSNet                             88.3%              94.2%
MHNet                             83.7%              90.1%
Weight Binarization               79.7%              73.8%
Weight Binarization with Scaling  79.2%              76.8%

Table 2: Method Details for the MNIST Classification

Method                            Activation  Last Activation  Loss                Optimization           Epochs  Other
Full Precision                    sigmoid     softmax          cross-entropy       SGD with momentum      50      -
BinaryConnect                     ReLU        identity         squared hinge loss  ADAM                   250     Batch Norm
GSNet                             sigmoid     softmax          RMSE                GS with adaptive rate  20000   -
MHNet                             sigmoid     softmax          MSE                 MH with temperature    5000    -
Weight Binarization               sigmoid     softmax          cross-entropy       SGD with momentum      50      -
Weight Binarization with Scaling  sigmoid     softmax          cross-entropy       SGD with momentum      50      -

Figure 3: Training Loss for five experiments of a single binary network without AND operator.

However, an interesting divergence is observed when these binary networks are combined using the proposed AND operator. The resultant system, which integrates the output features of the three networks and adjudicates the final prediction by majority rule, exhibits a less favorable optimization pattern. The training loss for this ensemble does not decrease as expected, suggesting that joint training under the AND constraint is less effective. This could imply that coupling the outputs in this manner introduces complexity that hinders the learning process, possibly due to conflicting gradients or a loss surface that is difficult to navigate.
The intricacies of this combined operation warrant further investigation to understand the underlying causes and to explore potential modifications that could lead to more stable and efficient training dynamics.

5. Discussions

In our investigation, we have identified a current limitation within our framework: the training of the AND operator. Despite the individual binary networks showing promising results, the AND operator, which aims to combine the predictions from multiple networks, is not yet trained effectively at this stage.

A notable limitation in our current research is the challenge of applying our binary network framework to more complex neural architectures. Despite achieving high precision with simpler structures, particularly on the MNIST dataset, scaling up to networks with more layers and units poses significant challenges. These complex architectures demand more advanced training strategies and greater computational resources. Future work will be dedicated to developing methodologies that can efficiently integrate our binary network approach into these more intricate architectures. This will involve exploring new training techniques, possibly incorporating alternative activation functions, and considering innovative layer types tailored for binary networks.

Another area for improvement is the extension of our binary network framework to tasks beyond the MNIST dataset. While MNIST serves as a fundamental benchmark in machine learning, it lacks the complexity found in datasets used for more advanced tasks, such as natural language processing or detailed image recognition. Our future research aims to apply the binary network framework to a diverse array of datasets and tasks. This expansion is crucial not only to test the versatility of our approach but also to refine the network’s ability to process various types of data.
Such an expansion could lead to new insights and improvements in how binary networks are structured and trained, potentially opening up new applications in AI.

Finally, a significant area for future exploration is the efficient integration of logical reasoning into the binary network framework. While binary networks are inherently well-suited for logical operations, embedding complex logical reasoning within these networks efficiently remains challenging. Future phases of our research will focus on discovering methods to incorporate sophisticated logical reasoning into binary networks more effectively. This could include developing novel training algorithms, experimenting with hybrid neural-symbolic models, or creating specialized layers for logic processing. The ultimate goal is to enhance binary networks not only in terms of computational efficiency but also to equip them with advanced reasoning and decision-making capabilities.

This research represents an early-stage exploration into integrating logical reasoning with binary networks. Currently, its applicability is limited, reflecting the developing nature of this innovative approach. However, with continued development and refinement, this framework has significant potential to scale across a broader range of datasets and tasks in the future. Success with this approach could pave the way for binary networks that not only perform standard computational functions but also possess the capability for advanced reasoning. The prospect of binary networks effectively conducting logical operations opens up exciting possibilities for their application in more complex, real-world scenarios, ultimately enhancing the scope and functionality of AI systems.

6. Conclusions

In this paper, we explored the novel integration of Neural-Symbolic Computing with binary neural networks, an innovative approach that merges structured reasoning with the dynamic learning capabilities of Connectionist AI.
This pioneering synthesis is designed to forge a cutting-edge framework that seamlessly blends logical operators within AI systems, thereby significantly boosting computational efficiency and adaptability. Our exploration marks a contribution to the field, highlighting the potential of binary networks to revolutionize Neural-Symbolic Computing by offering a more efficient, logical, and adaptable AI architecture.

Through our research, we demonstrated that binary neural networks could effectively embody this integration, as evidenced by our experiments with the MNIST dataset. These networks offer a promising avenue for AI applications, especially in scenarios demanding both logical processing and learning adaptability. However, our findings also underscore the challenges in scaling these networks for more complex architectures and broader datasets.

Looking ahead, our research opens several pathways for further exploration. The potential expansion of binary network applications to more sophisticated tasks, and their adaptation to handle a wider variety of data types, stand out as promising future endeavors. Additionally, the efficient integration of logical reasoning into these networks remains a pivotal area for ongoing development.

In conclusion, our work contributes to the broader field of AI by proposing a novel approach that leverages the strengths of both traditional symbolic systems and modern neural networks. The development of this binary neural network framework marks a step towards creating more advanced AI models capable of seamless learning and reasoning. It is a step towards the realization of AI systems that are not only efficient and powerful but also interpretable and adaptable, capable of tackling complex problems across various domains.

References

[1] A. Newell, J. C. Shaw, H. A. Simon, Report on a general problem-solving program, Proceedings of the International Conference on Information Processing (1959).
[2] D. E. Rumelhart, G. E. Hinton, R. J.
Williams, Learning representations by back-propagating errors, Nature (1986). doi:10.1038/323533a0.
[3] A. S. Garcez, L. C. Lamb, D. M. Gabbay, Neural-symbolic cognitive reasoning, 2008.
[4] J. McCarthy, Programs with common sense, 1959.
[5] J. McCarthy, Situations, actions, and causal laws, Comtex Scientific, 1963.
[6] T. Winograd, Procedures as a representation for data in a computer program for understanding natural language (1971).
[7] B. G. Buchanan, E. A. Feigenbaum, Dendral and meta-dendral: Their applications dimension, Readings in artificial intelligence (1981). doi:10.1016/B978-0-934613-03-3.50026-X.
[8] R. E. Fikes, N. J. Nilsson, Strips: A new approach to the application of theorem proving to problem solving, Artificial intelligence (1971). doi:10.1016/0004-3702(71)90010-5.
[9] Y. LeCun, B. Boser, J. S. Denker, D. Henderson, R. E. Howard, W. Hubbard, L. D. Jackel, Backpropagation applied to handwritten zip code recognition, Neural computation (1989). doi:10.1162/neco.1989.1.4.541.
[10] S. Hochreiter, J. Schmidhuber, Long short-term memory, Neural computation (1997). doi:10.1162/neco.1997.9.8.1735.
[11] Y. LeCun, Y. Bengio, G. Hinton, Deep learning, Nature (2015). doi:10.1038/nature14539.
[12] A. Krizhevsky, I. Sutskever, G. E. Hinton, Imagenet classification with deep convolutional neural networks, Advances in neural information processing systems (2012). doi:10.1145/3065386.
[13] M. V. França, G. Zaverucha, A. S. d’Avila Garcez, Fast relational learning using bottom clause propositionalization with artificial neural networks, Machine learning (2014). doi:10.1007/s10994-013-5392-1.
[14] S. Harnad, The symbol grounding problem, Physica D: Nonlinear Phenomena (1990). doi:10.1016/0167-2789(90)90087-6.
[15] W. Wang, Y. Yang, Towards data- and knowledge-driven artificial intelligence: A survey on neuro-symbolic computing, arXiv preprint (2022). doi:10.48550/arXiv.2210.15889.
[16] A. d’Avila Garcez, L. C. Lamb, Neurosymbolic AI: The 3rd wave, Artificial Intelligence Review (2023). doi:10.48550/arXiv.2012.05876.
[17] P. Smolensky, R. T. McCoy, R. Fernandez, M. Goldrick, J. Gao, Neurocompositional computing in human and machine intelligence: A tutorial, Microsoft Technical Report (2022). doi:10.48550/arXiv.2205.01128.
[18] P. Smolensky, R. McCoy, R. Fernandez, M. Goldrick, J. Gao, Neurocompositional computing: From the central paradox of cognition to a new generation of AI systems, AI Magazine (2022). doi:10.1002/aaai.12065.
[19] P. Hitzler, A. Eberhart, M. Ebrahimi, M. K. Sarker, L. Zhou, Neuro-symbolic approaches in artificial intelligence, National Science Review (2022). doi:10.1093/nsr/nwac035.
[20] E. van Krieken, E. Acar, F. van Harmelen, Analyzing differentiable fuzzy logic operators, Artificial Intelligence (2022). doi:10.1016/j.artint.2021.103602.
[21] N. Hoernle, R. M. Karampatsis, V. Belle, K. Gal, Multiplexnet: Towards fully satisfied logical constraints in neural networks, Artificial Intelligence (2022). doi:10.48550/arXiv.2111.01564.
[22] T. Silver, A. Athalye, J. B. Tenenbaum, T. Lozano-Perez, L. P. Kaelbling, Learning neuro-symbolic skills for bilevel planning, Robot Learning (2022). doi:10.48550/arXiv.2206.10680.
[23] E. Giunchiglia, M. Stoian, T. Lukasiewicz, Deep learning with logical constraints, IJCAI/AAAI Press (2022). doi:10.24963/ijcai.2022/767.
[24] E. Karpas, O. Abend, Y. Belinkov, B. Lenz, O. Lieber, N. Ratner, Y. Shoham, H. Bata, Y. Levine, K. Leyton-Brown, MRKL systems: A modular, neuro-symbolic architecture that combines large language models, external knowledge sources and discrete reasoning, arXiv preprint (2022). doi:10.48550/arXiv.2205.00445.
[25] M.-O. Stehr, M. Kim, C. Talcott, A probabilistic approximate logic for neuro-symbolic learning and reasoning, Log Algebr Methods Program (2022). doi:10.1016/j.jlamp.2021.100719.
[26] C. Pryor, C. Dickens, E. Augustine, A. Albalak, W. Wang, L. Getoor, Neural probabilistic soft logic, IJCAI-23 (2022). doi:10.48550/arXiv.2205.14268.
[27] D. Aditya, K. Mukherji, S. Balasubramanian, A. Chaudhary, P. Shakarian, Software for open world temporal logic, arXiv preprint (2023). doi:10.48550/arXiv.2302.13482.
[28] M. Hersche, M. Zeqiri, L. Benini, A. Sebastian, A. Rahimi, A neuro-vector-symbolic architecture for solving Raven’s progressive matrices, Nature Machine Intelligence (2023). doi:10.48550/arXiv.2203.04571.
[29] Z. Li, Y. Yao, T. Chen, J. Xu, C. Cao, X. Ma, J. Lü, Softened symbol grounding for neuro-symbolic systems, The Eleventh International Conference on Learning Representations (2023). doi:10.48550/arXiv.2403.00323.
[30] D. Yu, B. Yang, Q. Wei, A. Li, S. Pan, A probabilistic graphical model based on neural-symbolic reasoning for visual relationship detection, CVPR (2022). doi:10.1109/CVPR52688.2022.01035.
[31] T. Gupta, A. Kembhavi, Visual programming: Compositional visual reasoning without training, IEEE Conf. Comput. Vis. Pattern Recognit. (2023). doi:10.48550/arXiv.2211.11559.
[32] D. Surís, S. Menon, C. Vondrick, Vipergpt: Visual inference via python execution for reasoning, ICCV (2023). doi:10.48550/arXiv.2303.08128.
[33] L. Li, W. Wang, Y. Yang, Logicseg: Parsing visual semantics with neural logic learning and reasoning, ICCV (2023). doi:10.1109/ICCV51070.2023.00381.
[34] M. Jin, Z. Ma, K. Jin, H. H. Zhuo, C. Chen, C. Yu, Creativity of AI: Automatic symbolic option discovery for facilitating deep reinforcement learning, AAAI Conference on Artificial Intelligence (2022). doi:10.1609/aaai.v36i6.20663.
[35] J. Tian, Y. Li, W. Chen, L. Xiao, H. He, Y. Jin, Weakly supervised neural symbolic learning for cognitive tasks, AAAI (2022). doi:10.1609/aaai.v36i5.20533.
[36] M. Courbariaux, Y. Bengio, J.-P. David, Binaryconnect: Training deep neural networks with binary weights during propagations, Advances in neural information processing systems (2015). doi:10.48550/arXiv.1511.00363.
[37] M. Rastegari, V. Ordonez, J. Redmon, A. Farhadi, Xnor-net: Imagenet classification using binary convolutional neural networks, European conference on computer vision (2016). doi:10.1007/978-3-319-46493-0_32.
[38] I. Hubara, M. Courbariaux, D. Soudry, R. El-Yaniv, Y. Bengio, Binarized neural networks, Advances in neural information processing systems (2016).
[39] Z. Lin, M. Courbariaux, R. Memisevic, Y. Bengio, Neural networks with few multiplications, arXiv preprint (2015). doi:10.48550/arXiv.1510.03009.
[40] S. Zhou, Y. Wu, Z. Ni, X. Zhou, H. Wen, Y. Zou, Dorefa-net: Training low bitwidth convolutional neural networks with low bitwidth gradients, arXiv preprint (2016). doi:10.48550/arXiv.1606.06160.
[41] Z. Yu, R. Cao, Q. Tang, S. Nie, J. Huang, S. Wu, Order matters: Semantic-aware neural networks for binary code similarity detection, Proceedings of the AAAI conference on artificial intelligence (2020). doi:10.1609/aaai.v34i01.5466.
[42] B. Martinez, J. Yang, A. Bulat, G. Tzimiropoulos, Training binary neural networks with real-to-binary convolutions, arXiv preprint (2020). doi:10.48550/arXiv.2003.11535.
[43] H. Bai, W. Zhang, L. Hou, L. Shang, J. Jin, X. Jiang, I. King, Binarybert: Pushing the limit of BERT quantization, arXiv preprint (2021). doi:10.48550/arXiv.2012.15701.
[44] H. Qin, X. Ma, Y. Ding, X. Li, Y. Zhang, Y. Tian, X. Liu, Bifsmn: Binary neural network for keyword spotting, arXiv preprint (2022). doi:10.48550/arXiv.2202.06483.