<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Visual Sudoku Puzzle Classification: A Suite of Collective Neuro-Symbolic Tasks</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Eriq Augustine</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Connor Pryor</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Charles Dickens</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Jay Pujara</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>William Yang Wang</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Lise Getoor</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>University of California</institution>
          ,
          <addr-line>Santa Barbara</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>University of California</institution>
          ,
          <addr-line>Santa Cruz</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Southern California</institution>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Neuro-symbolic computing (NeSy) is an emerging field that has the goal of integrating the low-level representational power of deep neural networks with high-level symbolic reasoning. Due to the youth of the field and the complexity of neuro-symbolic integration, there are few benchmarks that showcase the strengths of NeSy, and even fewer built specifically with NeSy in mind. To address the lack of NeSy benchmarks, we introduce Visual Sudoku Puzzle Classification (ViSudo-PC). ViSudo-PC is a new NeSy benchmark dataset combining visual perception with relational constraints. The goal of the benchmark is to both highlight opportunities and elicit challenges. In addition to providing a new NeSy benchmark suite, we also provide an exploratory analysis that showcases ViSudo-PC's difficulty and possibilities.</p>
      </abstract>
      <kwd-group>
        <kwd>Benchmark</kwd>
        <kwd>Dataset</kwd>
        <kwd>Neuro-Symbolic Integration</kwd>
        <kwd>Relational Data</kwd>
        <kwd>Structured Prediction</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Integrating neural and symbolic reasoning is a long-standing challenge in the machine
learning community. Neuro-symbolic computing (NeSy), which combines low-level neural
perception and logic-based reasoning [
        <xref ref-type="bibr" rid="ref1 ref2 ref3">1, 2, 3</xref>
        ],
is a promising area of research that aims to integrate these concepts in a seamless
fashion. NeSy systems have shown the advantages of incorporating neural and logical
reasoning, including the ability to learn with less data, robustness to noise, the ability to
perform joint reasoning (structured prediction), and more. Unfortunately, there is a dearth
of NeSy benchmarks that are both challenging and realistic. Building a comprehensive NeSy test
suite designed with the ability to vary the structural constraints and perceptual difficulty
remains a challenge facing the community.
      </p>
      <p>
        There are a variety of tasks and datasets commonly used in NeSy research. Many involve
visual reasoning. Examples include identifying the subject or context of the image such as
Visual Relationship Detection [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], Semantic Image Interpretation [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ], and Visual Genome [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ].
These datasets support interesting and complex visual tasks. However, the complex nature of
the task can lead to ambiguous answers (e.g., even humans may misunderstand the context of an
arbitrary image). Additional NeSy test domains involve reasoning with knowledge graphs such
as FB15k and WN18 [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. These datasets often lack direct subsymbolic information, requiring
that information to be generated from other sources (e.g., from word embeddings). Finally,
MNIST Addition [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] is another popular NeSy dataset. MNIST Addition uses MNIST digit images
as operands in addition equations, with the goal of predicting the sum of the digit images.
MNIST Addition is an excellent NeSy testbed; however, it is limited by the ease of MNIST image
classification. With MNIST classifiers that can achieve over 99.9% accuracy [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], MNIST Addition
(both single- and multi-digit variants) is possible using little symbolic reasoning.
      </p>
      <p>Our goal is to create a comprehensive NeSy benchmark that can be used to further NeSy
research. Such a benchmark needs to take into account the nature of neuro-symbolic computing
at each step. Specifically, a benchmark made for the NeSy community should: 1) include
self-contained symbolic and subsymbolic information, both of which are necessary to solve the
problem; 2) contain settings/tasks with varying degrees of hardness; 3) include entities that can
be collectively reasoned over; and 4) have unambiguous labels.</p>
      <p>
        In this work, we introduce a novel benchmark specifically designed for NeSy systems, Visual
Sudoku Puzzle Classification (ViSudo-PC). ViSudo-PC expands on the concept of visual Sudoku
puzzles introduced in Wang et al. (2019) and visual Sudoku puzzle classification introduced in
Pryor et al. (2022). Given a Sudoku puzzle constructed from images as input, the classification
task is to determine whether the Sudoku puzzle is correctly solved. Performing well on the
classification of visual Sudoku puzzles requires systems that are able to reason about the
perceptual information in the images as well as the collective information from Sudoku constraints.
ViSudo-PC expands upon the perceptual challenge of previous MNIST compositional tasks
[
        <xref ref-type="bibr" rid="ref10 ref11 ref8">8, 10, 11</xref>
        ] by drawing images from four different sources. Additionally, ViSudo-PC includes a
collection of progressively harder tasks.
      </p>
      <p>Our key contributions are as follows: 1) We construct ViSudo-PC, an extensive NeSy
benchmark that integrates four canonical visual datasets into five tasks of varying difficulty requiring
symbolic and sub-symbolic reasoning to solve, 2) We perform an exploratory evaluation over
two ViSudo-PC tasks to quantify the difficulty of these tasks in different settings, and 3) We
discuss ways that the data and tasks of ViSudo-PC can be extended and improved.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Benchmark Data</title>
      <p>
        Visual Sudoku Puzzle Classification (ViSudo-PC) expands upon the classification task proposed
in [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ]. The data includes completed Sudoku puzzles, along with their classification as “correct”
(solved) or “incorrect”. The data is available at
https://linqs-data.soe.ucsc.edu/public/datasets/ViSudo-PC/v01/, and ViSudo-PC also provides
a data generation tool, discussed in Appendix B. For those unfamiliar with Sudoku, Sudoku is a
puzzle game in which there is a 9x9 grid, called a “puzzle” or “board”, in which each cell is
populated with the numbers 1–9. A puzzle is correct if no row, column, or non-overlapping 3x3
subgrid (or “block”) contains the same number more than once. Classifying whether a Sudoku
puzzle is solved correctly simply involves checking these three types of constraints. Following [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], rather than providing symbolic information (e.g., labels) for the cell content, we can
complicate the problem by providing subsymbolic information in the form of images. The
images can be of digits (or, as we’ll see later, other objects) for each cell, as in Figure 1. Note
that no information is provided about the label of each cell, so a classification system must
learn to identify or distinguish between the cell labels at the same time as it checks whether
the puzzle is solved.
      </p>
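      <p>To make the symbolic half of the task concrete, the following is a minimal Python sketch of
the correctness check described above (the function and variable names are ours for illustration
and are not part of the released benchmark code):</p>
      <preformat>
# A minimal sketch: check whether a completed d x d grid of cell labels
# satisfies the Sudoku constraints. Labels may be any hashable values
# (digits, letters, clothing classes, ...).
import math

def is_correct(grid):
    d = len(grid)
    b = math.isqrt(d)  # block size; d must have an integer square root
    assert b * b == d

    def all_unique(cells):
        return len(set(cells)) == len(cells)

    rows = [list(row) for row in grid]
    cols = [[grid[r][c] for r in range(d)] for c in range(d)]
    blocks = [[grid[br * b + r][bc * b + c] for r in range(b) for c in range(b)]
              for br in range(b) for bc in range(b)]
    return all(all_unique(group) for group in rows + cols + blocks)

# Example: a correct 4x4 puzzle over arbitrary labels.
assert is_correct([['A', 'B', 'C', 'D'],
                   ['C', 'D', 'A', 'B'],
                   ['B', 'A', 'D', 'C'],
                   ['D', 'C', 'B', 'A']])
      </preformat>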
      <sec id="sec-2-1">
        <title>2.1. Data Sources</title>
        <p>
          We build upon existing image classification work [
          <xref ref-type="bibr" rid="ref12 ref13 ref14 ref15">12, 13, 14, 15</xref>
          ] to create Sudoku puzzles where
the cell contents and labels originate from multiple data sources. Each data source provides
28-pixel by 28-pixel grayscale images covering a different domain of objects; examples of these
images are displayed in Figure 9. Table 1 summarizes the data sources used by ViSudo-PC to
construct Sudoku puzzles.
        </p>
        <p>Table 1: A summary of the data sources used by ViSudo-PC (columns: Dataset, Train
Examples, Test Examples, Labels, Subject; rows: MNIST, EMNIST-ML, FMNIST, KMNIST).</p>
        <p>
          MNIST MNIST is one of the most well-known and widely used image classification datasets
[
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]. It is composed of 70,000 examples of handwritten digits distributed roughly evenly across
the ten digit classes.
        </p>
        <p>
          Extended MNIST (EMNIST) extends MNIST by introducing additional examples of handwritten
digits as well as examples of handwritten English letters [
          <xref ref-type="bibr" rid="ref13">13</xref>
          ]. EMNIST provides several different
ways to group/classify the data, including by author, by class (arranged into 62 classes denoting
[0-9], [a-z], and [A-Z]), and by merged classes. The merged classes setting combines similar
uppercase and lowercase letters, e.g., “c” &amp; “C”, “m” &amp; “M”, and “o” &amp; “O”, into 47 total classes.
ViSudo-PC uses the merged classes, but removes digits to avoid overlap with the MNIST digits.
We refer to this subset of EMNIST as EMNIST Merged Letters (EMNIST-ML). The images provided
in EMNIST are flipped horizontally and rotated 90 degrees anticlockwise. To maintain
consistency with the other data sources, each EMNIST-ML image is adjusted to an upright
position.
        </p>
        <sec id="sec-2-1-1">
          <title>1Data is available at https://linqs-data.soe.ucsc.edu/public/datasets/ViSudo-PC/v01/. 2ViSudo-PC also provides a data generation tool discussed in Appendix B.</title>
          <p>
            Fashion-MNIST (FMNIST) uses images of fashion products [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ] instead of digits. Classes
include items such as “Coats", “Bags", and “Trousers". Samples of each FMNIST class are
illustrated in Figure 9c. The complex nature of fashion items and the wide range of variants make
classifying FMNIST images considerably harder than standard MNIST [
            <xref ref-type="bibr" rid="ref14">14</xref>
            ].
Kuzushiji-MNIST (KMNIST) uses Kuzushiji Japanese characters [
            <xref ref-type="bibr" rid="ref15">15</xref>
            ]. Kuzushiji is a cursive
form of Japanese writing rarely used today. To reduce the 49-character Japanese alphabet down
to 10 classes, KMNIST chooses one character from each of the 10 Hiragana rows (representing
different consonant sounds). The stylistic nature of a cursive writing variant, like Kuzushiji,
makes KMNIST intrinsically more difficult than standard MNIST.
          </p>
      </sec>
      <sec id="sec-2-2">
        <title>2.2. Puzzle Construction</title>
        <p>ViSudo-PC puzzles are square and can be of any dimension d with an integer square root, as long
as enough cell labels are provided by the data sources (e.g., MNIST, with 10 classes, alone can
only support Sudoku puzzles with a dimension of 9 or less). To construct ViSudo-PC puzzles,
cell labels are first selected from the relevant data sources. The exact method of choosing data
sources and cell labels varies between tasks and is discussed in detail in Section 3. Cell labels are
randomly selected for each cell, ensuring that no Sudoku constraint is violated. Once cell labels
are determined, images of those labels are assigned to each cell. To create a pool of images for
each cell label, train and test splits from each data source are merged, shuffled, partitioned by
label, and split into train, test, and validation image pools. Images are never shared between
splits.</p>
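        <p>The following Python sketch illustrates this pool construction and image assignment; the
helper names and the split fractions are our assumptions for illustration, not the released
generator:</p>
        <preformat>
import random
from collections import defaultdict

def build_pools(images, labels, fractions=(0.6, 0.2, 0.2), seed=0):
    """Merge, shuffle, partition by label, and split into train/test/
    validation image pools; no image is shared between splits."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for img, lab in zip(images, labels):
        by_label[lab].append(img)
    pools = {split: {} for split in ('train', 'test', 'valid')}
    for lab, imgs in by_label.items():
        rng.shuffle(imgs)
        n_tr = int(fractions[0] * len(imgs))
        n_te = int(fractions[1] * len(imgs))
        pools['train'][lab] = imgs[:n_tr]
        pools['test'][lab] = imgs[n_tr:n_tr + n_te]
        pools['valid'][lab] = imgs[n_tr + n_te:]
    return pools

def render_puzzle(label_grid, pool, rng=random):
    """Replace each cell label with a randomly drawn image of that label."""
    return [[rng.choice(pool[lab]) for lab in row] for row in label_grid]
        </preformat>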
        <p>To create negative puzzle examples (incorrectly solved puzzles), existing correct puzzles are
corrupted. Our method of corruption allows ViSudo-PC to contain incorrect puzzles that are
only slightly incorrect, instead of incorrect puzzles that are constructed at random and would
likely contain many mistakes. Puzzles are corrupted in one of two ways: via replacement or via
substitution. Replacement corruptions randomly choose a location in a puzzle and an alternate
label, and then replace that cell with an image uniformly sampled from the split’s pool of
images for that label. Substitution corruptions swap two random cells in the same puzzle. After
each corruption is made, a coin with a configurable bias is flipped to see if another corruption
of the same type is performed. Finally, each corrupted puzzle is checked to ensure that multiple
corruptions did not create a correct puzzle.</p>
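        <p>A sketch of the corruption procedure follows, matching the description above (helper names
are hypothetical, and the released generator may differ in detail):</p>
        <preformat>
import random

def corrupt(puzzle, labels, pool, label_set, p_continue=0.5, rng=random):
    """Corrupt a correct puzzle in place; after each corruption, flip a
    biased coin to decide whether to corrupt again (same type each time)."""
    d = len(puzzle)
    method = rng.choice(['replacement', 'substitution'])
    while True:
        if method == 'replacement':
            # Pick a random cell and an alternate label, then replace the
            # cell with an image drawn from this split's pool for that label.
            r, c = rng.randrange(d), rng.randrange(d)
            alt = rng.choice([l for l in label_set if l != labels[r][c]])
            labels[r][c] = alt
            puzzle[r][c] = rng.choice(pool[alt])
        else:
            # Swap two random cells within the same puzzle.
            r1, c1 = rng.randrange(d), rng.randrange(d)
            r2, c2 = rng.randrange(d), rng.randrange(d)
            puzzle[r1][c1], puzzle[r2][c2] = puzzle[r2][c2], puzzle[r1][c1]
            labels[r1][c1], labels[r2][c2] = labels[r2][c2], labels[r1][c1]
        if rng.random() >= p_continue:
            break
    # The caller must still verify the result is not accidentally correct,
    # e.g.: assert not is_correct(labels)
    return puzzle, labels
        </preformat>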
        <p>To increase the connectivity of puzzles, ViSudo-PC allows for the introduction of overlap
into the data. Overlap is when the same image is used multiple times when generating puzzles.
Adding overlap gives the predictor an opportunity to recognize the same entity being used in
the same or diferent puzzles. A predictor employing joint reasoning can take advantage of this
opportunity to improve performance.</p>
        <p>The degree of overlap is controlled by a parameter λ. From a collection of images A, λ·|A|
images are uniformly sampled with replacement. The sampled images are added to A, which is
then shuffled, forming a new collection of |A| + λ·|A| images.</p>
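        <p>As a minimal sketch (the parameter and collection names follow the reconstruction above):</p>
        <preformat>
import random

def add_overlap(images, lam, rng=random):
    """Duplicate lam * |A| images, sampled uniformly with replacement,
    then shuffle, yielding |A| + lam * |A| images in total."""
    extras = [rng.choice(images) for _ in range(int(lam * len(images)))]
    combined = images + extras
    rng.shuffle(combined)
    return combined
        </preformat>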
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Benchmark Tasks</title>
      <p>Five different tasks are provided as a part of the ViSudo-PC benchmark, each providing
increasingly difficult problems. In all settings, the goal is to classify Sudoku puzzles as correct or
incorrect. Cell labels are provided for debugging, but should never be used in any official task.
Basic In this task, a single data source is specified and the first d cell labels from the data source
are used, where d is the dimension of the puzzles. These are the cell labels for the puzzles, and
then images for the cell labels are chosen randomly. This extends the original Visual-Sudoku-
Classification problem from Pryor et al. (2022) by including alternate data sources.</p>
      <p>Random Label Per Split (PerSplit) This task builds upon the Basic task by randomizing
the cell labels used in each split. PerSplit randomly selects d cell labels from the specified data
sources to use throughout all train, test, and validation puzzles. Any non-empty subset of data
sources can be specified. The challenge posed by PerSplit is that any model/architecture used
must be effective on several types of images, and not specific to one label set. For example, an
architecture specialized to classify MNIST digits may fail to classify the shoes and purses in
FMNIST. Any architecture that performs well on all variations of this task must be able to deal
with the digits, English letters, clothes, and Japanese characters that may appear in a single
puzzle.
Random Label Per Puzzle (PerPuzzle) This task increases the difficulty by re-sampling
the d cell labels for each individual puzzle. So for each puzzle, a set of d cell labels is uniformly
sampled from the specified data sources. Note that this task becomes considerably more difficult
when more data sources are used, since the pool of possible cell labels is larger. All cell labels
present in the test and validation puzzles are guaranteed to be used in train puzzles. For this task
(and the following tasks), methods that rely solely on subsymbolic information (image pixels)
will likely have great difficulty. Because of the larger pool of cell labels, each cell label may be
represented by fewer examples than in previous tasks. To perform well in this task, models
may need to distinguish between cell labels for each puzzle and use the symbolic information
in Sudoku constraints, instead of learning an image classifier on the train split.</p>
      <p>
        Random Label Per Cell (PerCell) Instead of limiting each puzzle to using d cell labels, this
task randomly chooses a cell label for each cell in each puzzle. Thus it can use as many as d²
cell labels (limited by the puzzle size and provided data sources). The number of cell labels used
per puzzle is randomly chosen and that information is not provided to the predictor outside of
the train puzzles. This is the only task that violates the full rules of Sudoku, as more than d cell
labels are potentially used. In this task, a puzzle is considered correct as long as the row, column,
and block constraints of Sudoku are not violated, i.e., no duplicate cell labels appear in any row,
column, or block. Again, we guarantee that all cell labels present in the test and validation
puzzles are also present in train puzzles. The potentially large and unknown number of cell
labels makes this task extremely challenging for any system that relies on an image classifier.
To perform well on this task, a predictor needs to be able to discriminate between cell labels
without seeing many examples of each and without knowledge of the number of cell labels.
Transfer The final task is a transfer learning task. The same process used for Basic is used
here, except that two disjoint sets of cell labels are chosen, one for training and another
for test and validation. For example, when MNIST is used as a data source with d = 4, the cell
labels [0-3] are present in the train set, while [4-7] are present in the test and validation sets.
      </p>
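      <p>The following sketch contrasts how the tasks select cell labels (function names are ours for
illustration):</p>
      <preformat>
import random

def labels_basic(source_labels, d):
    # Basic: the first d labels of a single data source.
    return source_labels[:d]

def labels_per_split(all_labels, d, rng=random):
    # PerSplit: one random d-label set shared by train/test/validation.
    return rng.sample(all_labels, d)

def labels_per_puzzle(all_labels, d, rng=random):
    # PerPuzzle: a fresh d-label set drawn for every individual puzzle.
    return rng.sample(all_labels, d)

def labels_per_cell(all_labels, d, rng=random):
    # PerCell: an independent label per cell, so up to d*d distinct labels;
    # only the row/column/block uniqueness constraints are enforced later.
    return [[rng.choice(all_labels) for _ in range(d)] for _ in range(d)]

def labels_transfer(source_labels, d):
    # Transfer: disjoint label sets for train vs. test/validation.
    return source_labels[:d], source_labels[d:2 * d]
      </preformat>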
    </sec>
    <sec id="sec-4">
      <title>4. Exploratory Evaluation</title>
      <p>We provide an initial exploratory evaluation of ViSudo-PC using the Basic and Random Label
Per Split tasks. We investigate the following questions: 1) Is there a difference in the difficulty
of each data source? 2) How does the use of multiple data sources affect performance? 3) How
does overlap affect performance?</p>
      <sec id="sec-4-1">
        <title>4.1. Models</title>
        <p>We evaluate over three models from Pryor et al. (2022) using hyperparameters specified in the
paper; all unspecified parameters were left at their default values.</p>
        <p>Baseline-Digit This model takes as input the cell labels of a Sudoku puzzle and outputs a
probability of the puzzle being valid. Note that this can be seen as the best possible scenario (or
cheating), where the neural model is able to correctly identify every cell image. This model uses
a feedforward multi-layer perceptron trained to minimize the cross-entropy loss. Formally, this
neural baseline consists of 3 fully connected dense layers of sizes 16, 512, and 256, each with a
ReLU activation, and a final dense output layer of size 1 with a softmax activation.
Baseline-Visual This model takes as input the pixels for each cell in a Sudoku puzzle and
outputs the probability of the puzzle being valid. This model uses a convolutional neural network
followed by a multi-layer perceptron, trained to minimize the cross-entropy loss. Formally, this
neural baseline consists of 3 convolutional layers with a kernel size of 3, where each is followed
by a max pooling layer of size 2 with stride 2. This then feeds into the same model as
Baseline-Digit.</p>
        <p>NeuPSL A NeSy model that has distinct neural perception and symbolic reasoning components.
The NeuPSL neural model takes as input the pixels for each cell in a Sudoku puzzle and outputs
a probability distribution for each class, which it then feeds into a symbolic model that verifies
the Sudoku constraints. Formally, the NeuPSL neural model is a simple image classifier first
mentioned in Manhaeve et al. (2021). This neural model consists of 2 convolutional layers with
a kernel size of 5, where each is followed by a max pooling layer of size 2 with stride 2. This then
feeds into fully connected dense layers of sizes 256, 120, and 84, each with a ReLU activation, and
a final dense output layer of size k with a softmax activation (where k is the number of classes).
The symbolic PSL model implements the rules of Sudoku as described in Pryor et al. (2022),
i.e., no duplicate digits in any row, column, or block.</p>
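        <p>A sketch of the NeuPSL neural component follows; the text fixes the kernel and dense-layer
sizes, while the filter counts (6 and 16, following the classic LeNet-5) are an assumption:</p>
        <preformat>
import tensorflow as tf
from tensorflow.keras import layers

def cell_classifier(k):
    # Input: one 28x28 cell image; output: a distribution over k classes,
    # which the symbolic PSL program consumes to check Sudoku constraints.
    return tf.keras.Sequential([
        tf.keras.Input(shape=(28, 28, 1)),
        layers.Conv2D(6, kernel_size=5, activation='relu'),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Conv2D(16, kernel_size=5, activation='relu'),
        layers.MaxPooling2D(pool_size=2, strides=2),
        layers.Flatten(),
        layers.Dense(256, activation='relu'),
        layers.Dense(120, activation='relu'),
        layers.Dense(84, activation='relu'),
        layers.Dense(k, activation='softmax'),
    ])
        </preformat>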
      </sec>
      <sec id="sec-4-2">
        <title>4.2. Data Source Dificulty</title>
        <p>
          To assess the difficulty of each data source for the tasks presented by ViSudo-PC, we first look
at the difficulty of each task in the simpler context of image classification. Table 2 shows the
state-of-the-art image classification performance for each data source at the time of writing
[
          <xref ref-type="bibr" rid="ref17">17</xref>
          ]. All data sources achieve accuracy in the 90s, with MNIST and KMNIST performing the
best and both achieving more than 99% accuracy, while EMNIST is the hardest with an accuracy
of only 91.59%.
        </p>
        <p>By examining each data source’s image classification performance, we can get an idea of
the best possible performance a naive ViSudo-PC model can achieve, where a naive model
simply attempts to classify each cell in a puzzle independently (assuming cell labels are supplied
to the model). ViSudo-PC provides both 4x4 and 9x9 puzzles (with the ability to generate
larger puzzles); therefore, the expected accuracy of a naive model is (acc)^16 and (acc)^81,
respectively, where acc is the data source’s image classification accuracy. (This expected
accuracy only includes performance on positive examples, and additionally excludes cases
where multiple classification mistakes result in an accidentally correct classification.)</p>
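        <p>As a quick numerical check of this bound, using the per-image accuracies cited in the text:</p>
        <preformat>
# Naive upper bounds: (acc)^16 for 4x4 puzzles, (acc)^81 for 9x9 puzzles.
for name, acc in [('MNIST (~99.9%)', 0.999), ('EMNIST-Merged', 0.9159)]:
    print(name, round(acc ** 16, 3), round(acc ** 81, 3))
# MNIST: ~0.984 and ~0.922; EMNIST-Merged: ~0.245 and ~0.001.
        </preformat>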
        <p>Table 2: State-of-the-art image classification accuracy for each data source: MNIST (CNN
Ensemble [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), EMNIST-Merged (WaveMix [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]), FMNIST (DARTS [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ]), and KMNIST (SpinalNet [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ]). Digits are included in the EMNIST-Merged setting, but excluded for ViSudo-PC.</p>
        <p>Additionally, we assess the performance of the three models on the Basic task using 50
4x4 training puzzles from each data source. Table 3 shows the results of the three models on
each data source. The Baseline-Visual is unable to generalize over any of the data sources.
Baseline-Digit, however, performs approximately the same over all data sources, as it does not
use any perceptual information. Unsurprisingly, NeuPSL performs the best on MNIST, which
has the simplest image cell labels. Despite being the most difficult data source for the
state-of-the-art image classifiers, NeuPSL performed second best on EMNIST-ML.</p>
      </sec>
      <sec id="sec-4-3">
        <title>4.3. Performance across Multiple Data Sources</title>
        <p>To examine the impact of using multiple data sources for puzzle generation, we ran the Random
Label Per Split task using 4x4 puzzles generated with data from one or more data sources.
As shown by Figure 7, NeuPSL performs well in almost all settings, outperforming
Baseline-Digit, but struggling more whenever KMNIST is used. This result is consistent with NeuPSL’s
previous performance with KMNIST. Baseline-Digit has very consistent performance and is
almost agnostic to the diferent data sources. As expected, Baseline-Visual fails to generalize
and produces consistently poor performance.</p>
      </sec>
      <sec id="sec-4-4">
        <title>4.4. Overlap Performance</title>
        <p>To determine the effect of overlap on performance, we ran the three models on the Basic task
using differing amounts of overlapping images in 4x4 puzzles from each data source. As shown in
Figure 8, both the Baseline-Digit and NeuPSL models show a benefit from increasing the amount
of overlap. NeuPSL, which is able to collectively reason, shows much larger improvements as
the amount of overlap is increased than Baseline-Digit, which may just be benefiting from fewer
unique images. Here, Baseline-Visual also fails to generalize and cannot beat random guessing.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Discussion</title>
      <p>There are several interesting ways in which both the data and tasks in ViSudo-PC can be extended
by including additional structural information. Cell labels can be expanded to hierarchies of
labels, e.g., an MNIST zero may be a part of the cell label hierarchy: 0 → Digit → Alphanumeric
→ Glyph. These hierarchies can then be used to create a variety of new tasks at different
abstraction levels. For example, one could define a task where cell labels sharing a common
hierarchical ancestor are considered the same. Another possible new task involves adding
additional constraints on the Sudoku problem, for example, requiring specific blocks to contain
cells from different label classes, e.g., all numbers or all letters. Recall that in the current set of
tasks, no cell labels are given, and results are not evaluated over cell labels, only over puzzle
classification. Another set of new tasks can be introduced by including cell labels in either
the inference or evaluation process. Finally, an additional interesting direction is introducing
confounding information. For example, following [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], confounding information in the form of
color could be explicitly added to the data generation process.</p>
    </sec>
    <sec id="sec-6">
      <title>Acknowledgments</title>
      <p>This work was partially supported by the National Science Foundation grants CCF-1740850 and
CCF-2023495.</p>
    </sec>
    <sec id="sec-7">
      <title>A. Data Sources</title>
      <p>Examples of each data source are provided in Figure 9. MNIST, FMNIST, and KMNIST have all
their provided classes displayed. EMNIST-ML is shown with 10 of its 47 classes.</p>
      <p>Figure 9: Example images from each data source: (a) MNIST, (b) EMNIST-ML, (c) FMNIST,
(d) KMNIST.</p>
    </sec>
    <sec id="sec-8">
      <title>B. Data Generation</title>
      <p>ViSudo-PC provides a data generation tool that allows users to construct their own ViSudo-PC
datasets (code is available at https://github.com/linqs/visual-sudoku-puzzle-classification). The
settings in Table 4 are used to create the data provided with ViSudo-PC (available at
https://linqs-data.soe.ucsc.edu/public/datasets/ViSudo-PC/v01/). Here, we provide a brief
description of each parameter.</p>
      <p>Table 4: Data generation settings.
Dimension: {4, 9}.
Data Sources: all non-empty subsets of {MNIST, EMNIST-ML, FMNIST, KMNIST}.
Train Count: {1, 2, 5, 10, 20, 30, 40, 50, 100}.
Test Count: 100.
Valid Count: 100.
Overlap: {0.0, 0.5, 1.0, 2.0}.
Corrupt Chance: 0.5.</p>
      <p>Dimension Dimension determines the size of the Sudoku puzzles generated. All Sudoku
puzzles in ViSudo-PC are square, and the dimension d is the number of cells on each side of
the puzzle. Additionally, because Sudoku puzzles require square blocks within each puzzle, the
puzzle dimension must have an integer square root.</p>
      <p>Data Sources Data sources determine the cell labels and images. For many of the tasks
discussed in Section 3, data may come from more than one source.</p>
      <p>Train/Test/Valid Counts The number of correct puzzles to generate for each split. During
the corruption process, the same number of incorrect puzzles will also be generated.</p>
      <p>Overlap The constant λ controls the number of examples that are duplicated, as discussed in
Section 2.2.</p>
      <p>Corruption Chance While generating negative instances as described earlier, the chance
of continuing the corruption process after each corruption is made. This controls how many
mistakes are in the negative examples.</p>
      <sec id="sec-8-1">
        <title>5Code is available at https://github.com/linqs/visual-sudoku-puzzle-classification. 6Data is available at https://linqs-data.soe.ucsc.edu/public/datasets/ViSudo-PC/v01/.</title>
        <p>{ MNIST }
{ MNIST }
Images /
Instance
9 × 9
60,000</p>
      </sec>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1] T. R. Besold, A. S. d'Avila Garcez, S. Bader, H. Bowman, P. M. Domingos, P. Hitzler,
          K. Kühnberger, L. C. Lamb, D. Lowd, P. M. V. Lima, L. de Penning, G. Pinkas, H. Poon,
          G. Zaverucha,
          <article-title>Neural-symbolic learning and reasoning: A survey and interpretation</article-title>
          , arXiv (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A. S.</given-names>
            <surname>d'Avila Garcez</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Gori</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L. C.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Spranger</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. N.</given-names>
            <surname>Tran</surname>
          </string-name>
          ,
          <article-title>Neural-symbolic computing: An effective methodology for principled integration of machine learning and reasoning</article-title>
          ,
          <source>Journal of Applied Logics</source>
          <volume>6</volume>
          (
          <year>2019</year>
          )
          <fpage>611</fpage>
          -
          <lpage>632</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>L.</given-names>
            <surname>De Raedt</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumančić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Marra</surname>
          </string-name>
          ,
          <article-title>From statistical relational to neurosymbolic artificial intelligence</article-title>
          ,
          <source>in: IJCAI</source>
          ,
          <year>2020</year>
          , pp.
          <fpage>4943</fpage>
          -
          <lpage>4950</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>C.</given-names>
            <surname>Lu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bernstein</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Fei-Fei</surname>
          </string-name>
          ,
          <article-title>Visual relationship detection with language priors</article-title>
          ,
          <source>in: European Conference on Computer Vision (ECCV)</source>
          ,
          <year>2016</year>
          , pp.
          <fpage>852</fpage>
          -
          <lpage>869</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>I.</given-names>
            <surname>Donadello</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Serafini</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A. D.</given-names>
            <surname>Garcez</surname>
          </string-name>
          ,
          <article-title>Logic tensor networks for semantic image interpretation</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>R.</given-names>
            <surname>Krishna</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Zhu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Groth</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Johnson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Hata</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Kravitz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Chen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Kalantidis</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.-J.</given-names>
            <surname>Li</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D. A.</given-names>
            <surname>Shamma</surname>
          </string-name>
          , et al.,
          <article-title>Visual genome: Connecting language and vision using crowdsourced dense image annotations</article-title>
          ,
          <source>International Journal of Computer Vision</source>
          (IJCV)
          <volume>123</volume>
          (
          <year>2017</year>
          )
          <fpage>32</fpage>
          -
          <lpage>73</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Bordes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Usunier</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Garcia-Durán</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Weston</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Yakhnenko</surname>
          </string-name>
          ,
          <article-title>Translating embeddings for modeling multi-relational data</article-title>
          ,
          <source>in: International Conference on Neural Information Processing Systems (NeurIPS)</source>
          ,
          <year>2013</year>
          , pp.
          <fpage>2787</fpage>
          -
          <lpage>2795</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumancic</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          , L. De Raedt,
          <article-title>Deepproblog: Neural probabilistic logic programming</article-title>
          ,
          <source>in: NeurIPS</source>
          ,
          <year>2018</year>
          , pp.
          <fpage>3753</fpage>
          -
          <lpage>3763</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>S.</given-names>
            <surname>An</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Lee</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Park</surname>
          </string-name>
          ,
          <string-name>
            <given-names>H.</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>So</surname>
          </string-name>
          ,
          <article-title>An ensemble of simple convolutional neural network models for mnist digit recognition</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <given-names>P.-W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Donti</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Wilder</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Kolter</surname>
          </string-name>
          ,
          <article-title>SATNet: Bridging deep learning and logical reasoning using a differentiable satisfiability solver</article-title>
          ,
          <source>in: International Conference on Machine Learning (ICML)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>6545</fpage>
          -
          <lpage>6554</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>C.</given-names>
            <surname>Pryor</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Dickens</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Augustine</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Albalak</surname>
          </string-name>
          ,
          <string-name>
            <given-names>W.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Getoor</surname>
          </string-name>
          ,
          <article-title>NeuPSL: Neural probabilistic soft logic</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>Y.</given-names>
            <surname>LeCun</surname>
          </string-name>
          ,
          <string-name>
            <given-names>L.</given-names>
            <surname>Bottou</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Bengio</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Haffner</surname>
          </string-name>
          ,
          <article-title>Gradient-based learning applied to document recognition</article-title>
          ,
          <source>Proceedings of the IEEE</source>
          <volume>86</volume>
          (
          <year>1998</year>
          )
          <fpage>2278</fpage>
          -
          <lpage>2324</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G.</given-names>
            <surname>Cohen</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Afshar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Tapson</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>van Schaik</surname>
          </string-name>
          ,
          <article-title>EMNIST: Extending MNIST to handwritten letters</article-title>
          ,
          <source>in: International Joint Conference on Neural Networks (IJCNN)</source>
          ,
          <year>2017</year>
          , pp.
          <fpage>2921</fpage>
          -
          <lpage>2926</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>H.</given-names>
            <surname>Xiao</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Rasul</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Vollgraf</surname>
          </string-name>
          ,
          <article-title>Fashion-mnist: A novel image dataset for benchmarking machine learning algorithms</article-title>
          , arXiv (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <string-name>
            <given-names>T.</given-names>
            <surname>Clanuwat</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Bober-Irizar</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kitamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Lamb</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Yamamoto</surname>
          </string-name>
          ,
          <string-name>
            <given-names>D.</given-names>
            <surname>Ha</surname>
          </string-name>
          ,
          <article-title>Deep learning for classical japanese literature</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2018</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [16]
          <string-name>
            <given-names>R.</given-names>
            <surname>Manhaeve</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Dumančić</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Kimmig</surname>
          </string-name>
          ,
          <string-name>
            <given-names>T.</given-names>
            <surname>Demeester</surname>
          </string-name>
          , L. De Raedt,
          <article-title>Neural probabilistic logic programming in deepproblog</article-title>
          ,
          <source>Artificial Intelligence</source>
          <volume>298</volume>
          (
          <year>2021</year>
          )
          <fpage>103504</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          Papers With Code, Image classification, https://paperswithcode.com/task/image-classification,
          <year>2022</year>
          . Accessed: 2022-05-27.
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>P.</given-names>
            <surname>Jeevan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>A.</given-names>
            <surname>Sethi</surname>
          </string-name>
          ,
          <article-title>WaveMix: Resource-efficient token mixing for images</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19] M. S. Tanveer, M. U. K. Khan, C.-M. Kyung,
          <article-title>Fine-tuning DARTS for image classification</article-title>
          ,
          <source>in: International Conference on Pattern Recognition (ICPR)</source>
          ,
          <year>2021</year>
          , pp.
          <fpage>4789</fpage>
          -
          <lpage>4796</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20] H. Kabir, M. Abdar, S. M. J. Jalali, A. Khosravi, A. F. Atiya, S. Nahavandi, D. Srinivasan,
          <article-title>SpinalNet: Deep neural network with gradual input</article-title>
          ,
          <source>arXiv</source>
          (
          <year>2020</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21] B. Kim, H. Kim, K. Kim, S. Kim, J. Kim,
          <article-title>Learning not to learn: Training deep neural networks with biased data</article-title>
          ,
          <source>in: IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)</source>
          ,
          <year>2019</year>
          , pp.
          <fpage>9012</fpage>
          -
          <lpage>9020</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>