<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>General Grid Recognition and Logic-Driven Gameplay for Intelligent Robots</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Tayyab Ateeq</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Department of Mathematics and Computer Science, University of Calabria</institution>
          ,
          <addr-line>Rende (CS)</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>The recognition of complex visual patterns, such as grid structures, is a trivial task for humans. In contrast, enabling a robot to reliably detect and interpret such patterns remains a significant challenge in the field of autonomous perception. We encounter grid-like patterns in daily life, such as grids of windows in high-rise buildings, grid-like arrangements of products on shelves in hypermarkets and superstores, grid designs in fabrics, and mobile games based on grid arrangements of objects, such as match-3 and chess games. This research aims to develop a general framework to identify a grid-like structure in a given image and then detect and classify objects having different shapes and colours within the cells of the grid. Such a framework could be used, for instance, to auto-detect and count the products placed on store shelves; it could also be used to recognize grids in match-3 or chess games and become part of an iterative decision-making pipeline, allowing a delta robot to play such games. Furthermore, the goal of this research work is to design a general framework, since we expect the architecture to be reusable on many different types of images such as buildings, store shelves and board games. This research currently aims to validate the proposed method on an extensive dataset involving a variety of mobile games of different types and resolutions, such as Candy Crush Saga, Candy Fever, Chess, Checkers and many more.</p>
      </abstract>
      <kwd-group>
        <kwd>Computer Vision</kwd>
        <kwd>Logic Programming</kwd>
        <kwd>Object Recognition</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>The notion of intelligent robotics fundamentally depends on advanced computer vision techniques
that allow robots to perceive their environment and make autonomous decisions through reasoning.
Within this context, this research investigates computer vision methods for the detection of grid-like
structures and the recognition of objects arranged within them — a pattern commonly observed not
only in mobile board games but also in diverse real-world settings such as store shelves, building
façades, and industrial layouts. The core objective is to develop a robust, general-purpose grid detection
framework capable of identifying grid-like structures in various image domains, regardless of specific
screen layouts, resolutions, or visual styles. While current efforts focus on mobile games — including
match-3 puzzles and chess-like games — the envisioned solution remains agnostic to device types and
game-specific designs, making it broadly applicable across domains.</p>
      <p>
        Beyond grid detection, this research aims to integrate the vision system with an Answer Set
Programming (ASP) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] reasoning module. Following grid and object detection, symbolic information
representing game states and object positions can be extracted and translated into ASP facts. Declarative
logic, powered by ASP, enables the system to autonomously infer valid actions, plan strategies, and
adapt gameplay, effectively allowing a robot to participate in visual games without human intervention.
The goal would be to integrate these proposed techniques in BrainyBot [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. This framework enables
robots to play puzzle games by detecting repeated colored shapes and feeding that data into a symbolic
reasoning module. The main limitation can be found in the usage of Template Matching to detect those
shapes: this results in the tedious job of retrieving such templates to be looked for within the image.
      </p>
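      <p>To make the intended ASP integration concrete, the following sketch translates a detected grid into logic facts. Both the function and the cell/3 predicate are hypothetical illustrations, not the actual BrainyBot interface.</p>

```python
def to_asp_facts(grid):
    """Translate a matrix of detected object types into ASP facts.

    The cell/3 predicate is a hypothetical schema, cell(Row, Col, Type);
    a solver such as clingo could then reason over these facts together
    with declarative game rules."""
    facts = []
    for r, row in enumerate(grid):
        for c, obj in enumerate(row):
            facts.append(f"cell({r},{c},{obj}).")
    return "\n".join(facts)
```

      <p>For example, a one-row board [["red", "blue"]] yields the facts cell(0,0,red). and cell(0,1,blue)., ready to be paired with rules describing valid moves.</p>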
      <p>Moreover, this approach is strongly resolution-dependent, requiring different templates for different
resolutions of the same game. Enriching the pipeline with the proposed framework would relieve researchers
from the burden of collecting said templates.</p>
      <p>The synergy between machine vision and symbolic reasoning demonstrates significant potential
for solving complex, real-world problems where structured visual patterns appear — from automated
product counting on store shelves to enabling intelligent agents to interact with grid-based games. This
combination leverages the strengths of both paradigms: computer vision for environmental perception
and ASP for diagnostic reasoning, high-level control, and decision-making.</p>
      <p>A selection of grid-like game scenarios is illustrated in Figure 1, highlighting the broad applicability
of the proposed approach to diverse visual patterns.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>Considerable work has been done on object detection for real-world images, but only a few works
focus on recognition in game scenes, and little attention has been paid to detecting grid-like structures
in games or other images. However, some works on detecting repeated objects in images partially align
with our problem. The reviewed studies also cover a wide range of methodologies for object detection,
pattern recognition, and repetitive structure analysis, applied to domains including computer games,
printed fabrics, biological images, and urban environments.</p>
      <p>
        Yoon et al. [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ] proposed a model focused on an educational competition based on the Angry Birds
game, where the objective was to design AI agents capable of playing both seen and unseen levels.
Though the organizers provided a built-in API for detecting game elements like birds and objects, it had
limitations in accuracy and scope. The paper emphasised student learning rather than technical novelty
and did not delve deeply into the vision techniques or AI methods employed. In contrast, Ge et al. [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]
applied qualitative stability analysis to detect unknown objects based on interactions between gravity
and stability of known objects, primarily within Angry Birds. Techniques such as Canny edge detection
and modified K-means clustering were used. Their results achieved a recall of 0.79 and precision of 0.68
and were also tested on Candy Crush and Super Stack games. Another work [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] proposed a hybrid
approach of Canny edge detection, Contour Detection, and Circular Hough Transform (CHT) to deal
with counting overlapped circular objects. This method overcame the problems of overlapping circular
shapes commonly present in domains such as medical or biological imagery.
      </p>
      <p>
        Liu et al. [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] developed a method for detecting near-regular textures (NRT) using Generalized
PatchMatch (GPM) and Markov Random Fields (MRF). Their pipeline rectified image geometry via
homography, identified repeating patches, and expanded the lattice structure iteratively. Tested on
datasets from the 2013 Symmetry Detection competition and NRT images, they reached a detection
accuracy of 91.1%, outperforming previous methods. Grant et al. [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ] presented a lattice detection
approach for automatic geotagging. They used SIFT features and RANSAC to detect repeated patterns
in images of man-made environments and then matched them with a 3D image database. While the
method did not prioritise exhaustive lattice detection, it provided enough pattern recognition to support
geolocation. Canada et al. [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] presented a method to detect 2D lattice structures in zebrafish larva
arrays. Using image alignment, morphological closing, and symmetry analysis, they achieved 100%
accuracy on 19 out of 20 test images, though the method required 5 minutes per image. Doubek et
al. [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] designed a method for detecting multiple repetitive patterns, involving blob detection, and
SIFT descriptors. A cluster-based approach, combined with Hough Transform and cross-correlation,
allowed the detection and completion of partially visible patterns. Final tiles were transformed into a
shift-invariant representation using the Fourier Transform, enabling image retrieval.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3. Vision-Based Agnostic Detection of Grid</title>
      <p>Our proposed method consists of a sequential pipeline of classical image processing and computer vision
techniques. The different phases of the proposed method are described below.
Isolating the Main Game Area from the Screenshot: Given an input image I, which is a mobile
game screenshot—such as one from Candy Crush Saga, Candy Fever, Chess, Checkers, or any other
grid-based board game—the goal of the first processing stage is to isolate the main game region from the
rest of the image, which may contain UI components and background elements. The image I is first
converted to grayscale to reduce complexity and eliminate color information, which is not relevant at
this stage. This grayscale image is then smoothed using a Gaussian blur filter to reduce noise and make
contour detection more robust. Next, adaptive thresholding is applied to generate a binary image, which
highlights potential structures such as grid lines by separating foreground content from the background
based on local intensity variations. Edge detection is then performed using the Canny method, helping
to identify clear boundaries in the binary image. These edges are further refined using morphological
operations: first dilated to close small gaps, then eroded to clean up noise. From the resulting image,
external contours are extracted, and the largest contour c_max is selected under the assumption that it
corresponds to the visible game grid. A binary mask is then created from this contour and applied to
the original image I, producing a new image I₁, which contains only the region inside the main
game grid, with the rest of the image set to black.</p>
      <sec id="sec-3-1">
        <title>Cropping Extrusions and Noise from the Bottom and Sides</title>
        <p>In the second part of the algorithm, the output image I₁ is further processed to remove noise and irrelevant content near the horizontal
and vertical borders of the extracted region. This step begins by analyzing the image row-wise to count
the number of non-black pixels in each row, scanning both from the bottom up and from the top down.
The algorithm checks whether each row contains a sufficient amount of meaningful content. If a row
contains fewer non-black pixels than a specified threshold, it is considered background or noise and is
replaced entirely with black pixels. This process is repeated iteratively until a row with a significant
number of non-black pixels is found, indicating the presence of relevant information in that row and its
consecutive neighbors. The same procedure is then applied column-wise to clean up the vertical edges.
This function produces the output image I₂, which offers a cleaner and more focused representation of
the game grid for the next analysis phase.</p>
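        <p>A minimal sketch of this border cleanup, assuming a BGR image whose background is already black and a hypothetical content threshold min_content, could look as follows.</p>

```python
import numpy as np

def crop_border_noise(img, min_content=20):
    """Blank out sparse rows and columns near the borders of I1.

    A row or column with fewer non-black pixels than min_content
    (an illustrative threshold) is treated as noise."""
    out = img.copy()
    row_counts = (out.max(axis=2) > 0).sum(axis=1)   # non-black pixels per row
    for r in range(out.shape[0]):                    # scan from the top down
        if row_counts[r] >= min_content:
            break
        out[r, :] = 0
    for r in range(out.shape[0] - 1, -1, -1):        # then from the bottom up
        if row_counts[r] >= min_content:
            break
        out[r, :] = 0
    col_counts = (out.max(axis=2) > 0).sum(axis=0)   # same procedure column-wise
    for c in range(out.shape[1]):
        if col_counts[c] >= min_content:
            break
        out[:, c] = 0
    for c in range(out.shape[1] - 1, -1, -1):
        if col_counts[c] >= min_content:
            break
        out[:, c] = 0
    return out                                       # I2
```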
        <p>Grid Construction: After obtaining I₂, we now have to find the grid lines underlying the board. This
is achieved by detecting horizontal and vertical lines (H, V) using the probabilistic Hough Transform
and filtering them by angle and continuity. These lines are clustered according to the distance between
them using a tolerance value. By analyzing the extracted grid, missing lines are inserted to fill gaps
left by undetected lines. Using the completed sets of lines (H, V), the algorithm constructs the
full grid by pairing intersecting lines to form rectangular boxes, producing I₃. Let ℬ be the set of
all the grid cells in the game board; each box b ∈ ℬ is annotated with its center (x, y), width w, and
height h.</p>
        <p>False Positives Removal: To eliminate empty or invalid boxes that may be black or irrelevant, each
box in ℬ is evaluated based on pixel intensity as depicted below. If the ratio of black pixels in a box
exceeds a threshold, it is removed.</p>
        <p>ℬ_valid = { b ∈ ℬ | r_b &lt; r_max }    (1)
where r_b is the black pixel ratio in box b, and r_max is a predefined threshold.</p>
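        <p>The filter above admits a direct implementation; boxes are assumed to be (x, y, w, h) tuples over a grayscale image, and the default r_max is an illustrative value.</p>

```python
import numpy as np

def remove_false_positives(gray, boxes, r_max=0.8):
    """Keep only boxes whose black-pixel ratio r_b stays below r_max.

    Boxes are (x, y, w, h) tuples over a grayscale image; the default
    r_max is an illustrative threshold."""
    valid = []
    for (x, y, w, h) in boxes:
        cell = gray[y:y + h, x:x + w]
        r_b = np.count_nonzero(cell == 0) / cell.size
        if r_max > r_b:                 # box survives the black-ratio test
            valid.append((x, y, w, h))
    return valid
```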
      </sec>
      <sec id="sec-3-2">
        <title>Grid Validation by Identifying Boxes with Size Discrepancies:</title>
        <p>Next, the algorithm identifies boxes with inappropriate dimensions, which are either significantly larger
or smaller than the average cell size. This step is primarily related to the auto-tuning part of the algorithm,
which will be discussed in the subsequent section. The area A_b of each box is calculated, and boxes
outside a certain range around the mean area Ā are considered invalid as follows.</p>
        <p>Ā = (1 / |ℬ_valid|) ∑_{b ∈ ℬ_valid} A_b,   Invalid boxes = { b ∈ ℬ_valid | A_b &gt; Ā · k or A_b &lt; Ā / k }    (2)
where k is a scaling factor.</p>
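        <p>A sketch of this size check, with boxes again as (x, y, w, h) tuples and an illustrative value for the scaling factor k, follows.</p>

```python
import numpy as np

def size_outliers(boxes, k=1.5):
    """Flag boxes whose area falls outside [mean/k, mean*k].

    Boxes are (x, y, w, h) tuples; k is the scaling factor (the value
    1.5 is illustrative)."""
    areas = np.array([w * h for (_, _, w, h) in boxes], dtype=float)
    mean_area = areas.mean()
    return [box for box, a in zip(boxes, areas)
            if a > mean_area * k or mean_area / k > a]
```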
        <p>Clustering of Similar Boxes: To identify similar objects within grid cells, a template matching
technique is used. A small template is extracted from each unassigned box and matched against other
boxes using normalized cross-correlation. If a match exceeds the threshold τ, the box is assigned to the
current template’s group. The total number of unique object types T is equal to the number of unique
groups formed.</p>
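        <p>The grouping step can be sketched directly on equal-sized cell patches; here the normalized cross-correlation is written out in NumPy rather than through a library template-matching call, and the default threshold is illustrative.</p>

```python
import numpy as np

def group_boxes(cells, tau=0.9):
    """Group equal-sized grayscale cell patches by visual similarity.

    Normalized cross-correlation between mean-centered patches plays the
    role of template matching; tau is an illustrative threshold. The
    number of distinct group ids equals the number of object types T."""
    def ncc(a, b):
        a = a.astype(float) - a.mean()
        b = b.astype(float) - b.mean()
        denom = np.sqrt((a * a).sum() * (b * b).sum())
        return (a * b).sum() / denom if denom else 1.0
    groups = [-1] * len(cells)          # -1 marks an unassigned cell
    next_id = 0
    for i, template in enumerate(cells):
        if groups[i] != -1:
            continue
        groups[i] = next_id             # this cell seeds a new group
        for j in range(i + 1, len(cells)):
            if groups[j] == -1 and ncc(template, cells[j]) >= tau:
                groups[j] = next_id
        next_id += 1
    return groups
```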
        <p>Genetic Algorithm for Parameter Tuning: The techniques discussed above rely on multiple
parameters whose values vary across games. As already stated, the aim of the research is to be
as agnostic as possible: in order to enable parameter optimization for each game without prior
knowledge of the game itself, a genetic algorithm is proposed. As the name suggests, it uses a genetic
framework, a population-based search heuristic inspired by the principles of natural selection and
evolution. The core objective is to tune parameters such as edge detection thresholds and line detection
constraints that yield the best performance in grid construction and object grouping, based on an
evaluation function. The algorithm begins by identifying the set of parameter names to be optimized
and creating a bounded range for each parameter based on the lower and upper limits provided in the
input. A population of candidate parameter sets, also called individuals, is then randomly initialized.
Each individual is a vector representing one possible configuration. The discussed algorithm is executed
with each of these configurations, and each individual is assigned a fitness evaluation. The
fitness function relies on: the total number of detected objects or boxes, the number of distinct object
types, the number of vertical and horizontal lines added or removed, and the number of
inappropriately sized boxes. The fitness of a candidate is computed primarily by maximizing the ratio of
detected objects to object types while minimizing invalid boxes, added lines, and removed lines. These metrics are
prioritized hierarchically to compare fitness between candidates. The selection method used is
tournament selection, where a random sample of individuals is compared and the
best of each group is chosen as a parent for reproduction. From the selected parents, crossover
is performed, swapping segments of two parent solutions to create new offspring. The algorithm keeps the best
solution found so far and keeps track of how many generations have passed without improvement. After
a given number of generations with no improvement, the loop terminates, implying that no parameters
are being adjusted anymore. All candidate configurations are then scored, and the one
with the highest score is returned as the final result. Such auto-tuning allows the system to be scalable,
robust and adaptable to new game environments or image styles without the need for manual
calibration.</p>
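        <p>The tuning loop can be sketched generically as follows; the hierarchical fitness comparison described above is simplified here to a single comparable score, and all population sizes and rates are illustrative defaults.</p>

```python
import random

def genetic_tune(bounds, fitness, pop_size=20, tourney=3,
                 cx_rate=0.7, mut_rate=0.2, patience=10, seed=0):
    """Population-based parameter search with tournament selection,
    one-point crossover, bounded mutation, and a stagnation stop.

    A generic sketch: bounds maps parameter names to (low, high) ranges,
    and fitness scores one configuration (the paper ranks several
    criteria hierarchically; any comparable value works here)."""
    rng = random.Random(seed)
    names = list(bounds)

    def random_individual():
        return [rng.uniform(*bounds[n]) for n in names]

    def tournament(pop, scores):
        picks = rng.sample(range(len(pop)), tourney)
        return pop[max(picks, key=lambda i: scores[i])]

    pop = [random_individual() for _ in range(pop_size)]
    best, best_score, stale = None, float("-inf"), 0
    while patience > stale:
        scores = [fitness(dict(zip(names, ind))) for ind in pop]
        gen_best = max(range(pop_size), key=lambda i: scores[i])
        if scores[gen_best] > best_score:
            best, best_score, stale = pop[gen_best][:], scores[gen_best], 0
        else:
            stale += 1                          # a generation with no improvement
        nxt = [best[:]]                         # elitism: keep best-so-far
        while pop_size > len(nxt):
            p1, p2 = tournament(pop, scores), tournament(pop, scores)
            child = p1[:]
            if cx_rate > rng.random():          # one-point crossover
                cut = rng.randrange(1, len(names)) if len(names) > 1 else 0
                child = p1[:cut] + p2[cut:]
            for k, n in enumerate(names):       # bounded mutation
                if mut_rate > rng.random():
                    child[k] = rng.uniform(*bounds[n])
            nxt.append(child)
        pop = nxt
    return dict(zip(names, best)), best_score
```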
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>This research proposes a method to detect grid layouts in given images using computer vision methods.
In the simplest terms, this is a visual perception layer in which the raw screenshots are converted into
structured, machine-readable data. In the case of grid-based games, such as Candy Crush Saga, Chess,
or Checkers, the structured output produced by our pipeline, such as a matrix of objects with associated
types and coordinates, is fed into the next logical step of logic and reasoning in which possible moves
in the game are predicted. This vision system, structured yet adaptable, connects passive recognition
to active decisions, allowing a robot to play games on its own by perceiving, abstracting, planning,
and acting on its own, based entirely on visual information. After implementing and performing the
validation on an extensive dataset of board-based mobile games, we intend to extend the work to other
domains such as the recognition of grid arrangements of store shelf products, grids formed by repeated
windows in buildings, printed grid patterns on fabrics, and many more.</p>
    </sec>
    <sec id="sec-5">
      <title>Declaration on Generative AI</title>
      <p>The author(s) have not employed any Generative AI tools.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>T.</given-names>
            <surname>Eiter</surname>
          </string-name>
          , G. Ianni, T. Krennwallner,
          <article-title>Answer set programming: A primer</article-title>
          ,
          <source>in: Reasoning Web</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>D.</given-names>
            <surname>Angilica</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Avolio</surname>
          </string-name>
          , G. Beraldi,
          <string-name>
            <given-names>G.</given-names>
            <surname>Ianni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Pacenza</surname>
          </string-name>
          ,
          <article-title>From vision to execution: Enabling knowledge representation and reasoning in hybrid intelligent robots playing mobile games</article-title>
          ,
          <source>in: KR</source>
          ,
          <year>2023</year>
          , pp.
          <fpage>44</fpage>
          -
          <lpage>54</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <given-names>D.-M.</given-names>
            <surname>Yoon</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.-J.</given-names>
            <surname>Kim</surname>
          </string-name>
          ,
          <article-title>Challenges and opportunities in game artificial intelligence education using angry birds</article-title>
          ,
          <source>IEEE Access</source>
          <volume>3</volume>
          (
          <year>2015</year>
          )
          <fpage>793</fpage>
          -
          <lpage>804</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>X.</given-names>
            <surname>Ge</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Renz</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <article-title>Visual detection of unknown objects in video games using qualitative stability analysis</article-title>
          ,
          <source>IEEE Transactions on Computational Intelligence and AI in Games</source>
          <volume>8</volume>
          (
          <year>2015</year>
          )
          <fpage>166</fpage>
          -
          <lpage>177</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>J.</given-names>
            <surname>Ni</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Khan</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>K.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>S. K.</given-names>
            <surname>Haider</surname>
          </string-name>
          ,
          <article-title>Automatic detection and counting of circular shaped overlapped objects using circular hough transform and contour detection</article-title>
          ,
          <source>in: 2016 12th World Congress on Intelligent Control and Automation (WCICA)</source>
          , IEEE,
          <year>2016</year>
          , pp.
          <fpage>2902</fpage>
          -
          <lpage>2906</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>S.</given-names>
            <surname>Liu</surname>
          </string-name>
          , T.-T. Ng,
          <string-name>
            <given-names>K.</given-names>
            <surname>Sunkavalli</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M. N.</given-names>
            <surname>Do</surname>
          </string-name>
          ,
          <string-name>
            <given-names>E.</given-names>
            <surname>Shechtman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>N.</given-names>
            <surname>Carr</surname>
          </string-name>
          ,
          <article-title>Patchmatch-based automatic lattice detection for near-regular textures</article-title>
          ,
          <source>in: Proceedings of the IEEE international conference on computer vision</source>
          ,
          <year>2015</year>
          , pp.
          <fpage>181</fpage>
          -
          <lpage>189</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>G.</given-names>
            <surname>Schindler</surname>
          </string-name>
          ,
          <string-name>
            <given-names>P.</given-names>
            <surname>Krishnamurthy</surname>
          </string-name>
          ,
          <string-name>
            <given-names>R.</given-names>
            <surname>Lublinerman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>F.</given-names>
            <surname>Dellaert</surname>
          </string-name>
          ,
          <article-title>Detecting and matching repeated patterns for automatic geo-tagging in urban environments</article-title>
          ,
          <source>in: 2008 IEEE Conference on Computer Vision and Pattern Recognition</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>1</fpage>
          -
          <lpage>7</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>B. A.</given-names>
            <surname>Canada</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G. K.</given-names>
            <surname>Thomas</surname>
          </string-name>
          , K. C. Cheng,
          <string-name>
            <given-names>J. Z.</given-names>
            <surname>Wang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Y.</given-names>
            <surname>Liu</surname>
          </string-name>
          ,
          <article-title>Automatic lattice detection in near-regular histology array images</article-title>
          ,
          <source>in: 2008 15th IEEE International Conference on Image Processing</source>
          , IEEE,
          <year>2008</year>
          , pp.
          <fpage>1452</fpage>
          -
          <lpage>1455</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>P.</given-names>
            <surname>Doubek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>J.</given-names>
            <surname>Matas</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Perdoch</surname>
          </string-name>
          ,
          <string-name>
            <given-names>O.</given-names>
            <surname>Chum</surname>
          </string-name>
          ,
          <article-title>Image matching and retrieval by repetitive patterns</article-title>
          ,
          <source>in: 2010 20th International Conference on Pattern Recognition</source>
          , IEEE,
          <year>2010</year>
          , pp.
          <fpage>3195</fpage>
          -
          <lpage>3198</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>