<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Decompiling Language Models into Logic Programs</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jacinto Dávila Quintero</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Universidad de Los Andes, CESIMO</institution>
          ,
          <addr-line>Mérida</addr-line>
          ,
          <country country="VE">Venezuela</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2025</year>
      </pub-date>
      <abstract>
        <p>In this paper, we sketch a strategy to take advantage of LLMs' capabilities with natural language processing while avoiding the risk of incorrect outputs. We aim to extract knowledge from LLMs into logic programs and queries and, therefore, add LLMs to the materials and tools used to teach logic programming. Large Language Models (LLMs) are a compiled, sub-symbolic form of knowledge representation that can achieve human-level performance in natural language processing [1]. The transformer architecture [2], which creates and processes LLMs, is a stochastic machine able to generate new, original, human-like outputs from natural language inputs (prompts), after having been trained with massive amounts of data [3]. This relatively new and increasingly popular brand of Artificial Intelligence application has been termed Generative AI [4]. Unfortunately, the stochastic machine is bound to produce some outputs that cannot be logically justified from the inputs or the training data. These "original" outputs, sometimes called hallucinations, cannot be avoided [5] and can be regarded as serious errors in many domains of use.</p>
      </abstract>
      <kwd-group>
        <kwd>Decompiling</kwd>
        <kwd>LLM</kwd>
        <kwd>Logic Programming</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
    </sec>
    <sec id="sec-2">
      <title>2. Related Work</title>
      <p>
        This type of hybrid system, combining Generative AI with a form of Logical AI, sometimes also called a neuro-symbolic system, is being researched and tested intensively [<xref ref-type="bibr" rid="ref5">6</xref>], [<xref ref-type="bibr" rid="ref6">7</xref>], [<xref ref-type="bibr" rid="ref7">8</xref>]. These are the latest developments in a trend to improve the reasoning capabilities of LLMs that started with "internal" solutions such as Chain-of-Thought [<xref ref-type="bibr" rid="ref8">9</xref>], CoT, which prompts for a series of intermediate reasoning steps to improve the ability of the LLM to perform complex reasoning. CoT, however, does not prevent the aforementioned hallucination errors, and it cannot even be established that the stated reasoning is what the LLM is actually performing.
      </p>
      <sec id="sec-2-1">
        <title>We are following the lead of solutions that integrate some well-known logical reasoner with the</title>
      </sec>
      <sec id="sec-2-2">
        <title>LLM so that the logic that it is been followed when a conclusion is reached or a question is answered is perfectly well accounted for [10], even when the reasoning must be adapted to provide a fit to the LLM, like in soft-unification on noisy facts[11]. A more extreme approach is to actually</title>
        <p>
          intervene the Transformer architecture with a representation closer connected with logical
formulae and a process that matches a reasoning system, as done by Thuy &amp; Yamamoto [
          <xref ref-type="bibr" rid="ref11">12</xref>
          ].
        </p>
      </sec>
      <sec id="sec-2-3">
        <title>Unfortunately, this requires a different neural system and may compromise the efficiency of the resulting models for natural language processing.</title>
      </sec>
      <sec id="sec-2-4">
        <title>There is increasing evidence that to guarantee trustworthiness of sub-symbolic models, they must be integrated with some robust, external reasoner (a hybrid system [13]). A latter study suggest that language reasoning models, LRMs, versions of LLMs optimized for reasoning, actually collapse while solving reasoning problems at a certain scale [14].</title>
        <p>In this paper, as a teaching exercise, we want to step back before moving onto the integration of
an LLM with a reasoner, to revisit the basic structures and practical elements that underlie
language models as they are understood in Generative AI. After, we will relate some experiments
to decompile knowledge from an LLM into logic programs that provide explainable answers to
users’ questions.</p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>3. Back to first principles</title>
      <sec id="sec-3-1">
        <title>3.1. Compiling and de-compiling</title>
        <p>
          A compiler is a basic concept in Computer Science: "A compiler is a program that can read a program in one language, the source language, and translate it into an equivalent program in another language, the target language" [<xref ref-type="bibr" rid="ref14">15</xref>]. In Computing, the code in the target language is known as a compiled set of instructions that a machine can take as input and then follow to perform some computation, generating some outputs in the process. Another related concept is an interpreter, "another common kind of language processor. Instead of producing a target program as a translation, an interpreter appears to directly execute the operations specified in the source program on inputs supplied by the user" (ibid.). In both cases, it is all about translating human instructions into machine actions. Figure 1 depicts both processes:
        </p>
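        <p>In logic-programming terms, the interpreter side is easy to make concrete. The classic "vanilla" meta-interpreter, a standard textbook sketch rather than anything specific to this paper, directly executes the clauses of a source program:</p>
        <p>% Vanilla Prolog meta-interpreter (standard textbook sketch).
% It "appears to directly execute the operations specified in the
% source program": clauses are fetched and solved at run time.
solve(true) :- !.
solve((A, B)) :- !, solve(A), solve(B).
solve(Goal) :- clause(Goal, Body), solve(Body).

% Example, assuming the clauses  b :- a.  and  a.  are loaded and
% accessible to clause/2 (e.g., declared dynamic): ?- solve(b). succeeds.</p>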
        <sec id="sec-3-1-1">
          <title>De-compiling would be the reverse process of obtaining the source program given the target</title>
          <p>program. When the target is a very basic code, like a binary representation, the process faces an
enormous combinatorial complexity and it is very difficult to obtain the source. In some cases,
however, with the use of inputs and corresponding outputs, and some other general information,
like blue-prints or frameworks associated with the whole system, de-compilation is practically
possible, as depicted in figure 2.</p>
          <p>A neural network can be regarded as a compiled representation of knowledge, as will be
illustrated below. It is normally produced by a training process that short-circuits inputs and
outputs and computes the weights of the network that generalize the function that associates those
inputs and outputs and, hopefully, variants of them. Once a neural network is trained, its weights
and biases represent the learned knowledge in an executable form. In this operational sense, the
net is like a compiled program: an executable form of the original source code. We do not see the
‘source code’ (the training data and learning process) directly, but we have the ‘executable’ (the
trained model) that embodies the learned relationships. Trusting these relationships implies
trusting there is a meaningful connection between the ‘source code’ and the ‘executable’.</p>
        </sec>
        <sec id="sec-3-1-2">
          <title>Thus, after training, the network can be used as a compiled program to produce outputs given the corresponding inputs. Figure 3 (top) represents this possibility and (middle) the training process.</title>
          <p>language model trained on logic programs could be used to reproduce them by appropriate
prompting.</p>
          <p>We envisage the possibility depicted at the bottom of Figure 3. An LLM, which is a specialized
form of neural network coupled with a processing device (e. g. a transformer). Having being
trained to represent logic programs, the transformer could be prompted to reproduce correct
sources by providing it with some samples. It would be de-compiling logic programs from the
subsymbolic representation. In each case, we aim to extract a logic program whose inherent structure
and rules represent meaningful, correct, and independently verifiable knowledge about the domain,
irrespective of the neural network's specific input-output mappings.</p>
        </sec>
      </sec>
      <sec id="sec-3-2">
        <title>3.2. Neural Networks and Logic</title>
        <sec id="sec-3-2-1">
          <title>Let us complete this review of basic principles by looking at the relationship between neural</title>
          <p>
            networks and logic programs. It is no longer as simple as it will be presented below, because the
technology of the connectionist systems, as they are also called, has evolved. Transformers[
            <xref ref-type="bibr" rid="ref2">2</xref>
            ], in
particular, operates in a very different manner. But the essence of a neural network as an array of
weights still holds.
          </p>
        </sec>
        <sec id="sec-3-2-2">
          <title>The neural network shown in figure 4 is very simple and has a direct correspondence with a rule in logic:</title>
          <p>A communicative act is effective with a strength W
if the act conveys information with a strength W1
and the act is easy to understand with a strength W2
and W = f(W1, W2).</p>
        </sec>
        <sec id="sec-3-2-3">
          <title>Which can be rendered in even simpler human, natural language, as:</title>
          <p>A communicative act is more effective, the more it conveys information and the easier it is to
understand.</p>
        </sec>
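        <p>For concreteness, the rule could be written as a Prolog clause along the following lines. This is only a sketch: the predicate names and the combination function f are our illustrative assumptions, not part of the original figure.</p>
        <p>% Sketch only: hypothetical predicate names for the rule in figure 4.
% f/3 stands for some combination of the strengths W1 and W2.
effective(Act, W) :-
    conveys_information(Act, W1),
    easy_to_understand(Act, W2),
    f(W1, W2, W).

% Illustrative facts and one possible choice of f:
conveys_information(announcement, 0.8).
easy_to_understand(announcement, 0.9).
f(W1, W2, W) :- W is W1 * W2.

% ?- effective(announcement, W).  gives W = 0.72.</p>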
        <sec id="sec-3-2-4">
          <title>This goes to show that neural networks can have some human meaning associated with them.</title>
        </sec>
        <sec id="sec-3-2-5">
          <title>That connection is, of course, harder to elicit as the network grows. Moreover, in the Transformer</title>
        </sec>
        <sec id="sec-3-2-6">
          <title>Architecture [1], that connection is mediated by a process computing a prediction for the next token in a sequence. One can say, however, that the success of this process logically depends on how well the connection between that token and its context to the previous ones is enshrined in the network or its input.</title>
        </sec>
        <sec id="sec-3-2-7">
          <title>To first test that idea, consider the following collection of examples in table 1:</title>
        </sec>
        <sec id="sec-3-2-8">
          <title>This small set of examples could be used to train a network like the one shown in figure 5:</title>
          <p>
            to read or skip (From Kowalski [
            <xref ref-type="bibr" rid="ref15">16</xref>
            ]).
          </p>
        </sec>
        <sec id="sec-3-2-9">
          <title>We posed the following prompt to Claude Sonnet 4:</title>
          <p>Consider this set of examples that correspond, respectively, to Author,
Thread, Length, Location, Action:
known, new, long, home, skip
unknown, new, short, work, read
unknown, follow-up, long, work, skip
known, new, short, home, read
Write rules to decide when to read and when to skip. Write the rules as
logical clauses like skip if known and new. Explain the rules.</p>
          <p>and obtained this answer:
&lt;Start Content enhancement by Claude Sonnet 4&gt;1</p>
        </sec>
        <sec id="sec-3-2-10">
          <title>Looking at the examples, I can derive these rules:</title>
          <p>1Labeled as suggested here: https://ceur-ws.org/GenAI/Taxonomy.html
Rules:</p>
        </sec>
        <sec id="sec-3-2-11">
          <title>1. Skip if long (regardless of other factors)</title>
        </sec>
        <sec id="sec-3-2-12">
          <title>2. Read if short (regardless of other factors)</title>
          <p>Explanation:</p>
        </sec>
        <sec id="sec-3-2-13">
          <title>The key pattern in the data is that length is the decisive factor:</title>
          <p>• All long emails are skipped (examples 1 and 3)
• All short emails are read (examples 2 and 4)</p>
        </sec>
        <sec id="sec-3-2-14">
          <title>The other attributes (Author, Thread, Location) don't appear to influence the decision. For</title>
          <p>instance:
• Both known and unknown authors can be read or skipped
• Both new threads and follow-ups can be read or skipped
• Both home and work locations can be read or skipped</p>
        </sec>
        <sec id="sec-3-2-15">
          <title>The decision rule is elegantly simple: length determines action. This suggests a practical email</title>
          <p>management strategy where you prioritize shorter emails (quick to process) over longer ones
(time-consuming), regardless of who sent them, whether they're new conversations, or where
you're reading them.</p>
        </sec>
        <sec id="sec-3-2-16">
          <title>This approach maximizes email processing efficiency by tackling quick wins first, which is a common productivity strategy.</title>
          <p>&lt;End Content enhancement by Claude Sonnet 4&gt;</p>
          <p>This exercise illustrates how an LLM can take an input with a set of data and some instructions
about how to produce an output and, not only carrying on the instructions regarding the data, but
producing useful rules to model it. It even goes on to also hypothesize an explanation. At this
point, one could test the rules to check on their validity independently from the LLM. We end up
with a hybrid system that is very effective at parsing natural language and can be fully accounted
for while reasoning.</p>
        </sec>
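        <p>The point about independent validation can be made concrete in Prolog. The following sketch (our rendering, with the table's rows as facts) checks Claude's two rules against all four examples:</p>
        <p>% The four examples as facts: example(Author, Thread, Length, Location, Action).
example(known,   new,         long,  home, skip).
example(unknown, new,         short, work, read).
example(unknown, 'follow-up', long,  work, skip).
example(known,   new,         short, home, read).

% The derived rules: length alone determines the action.
action(long,  skip).
action(short, read).

% Validation, independent of the LLM: every example must agree with the rules.
valid :- forall(example(_, _, Length, _, Action), action(Length, Action)).

% ?- valid.  succeeds, so the rules cover all four examples.</p>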
      </sec>
      <sec id="sec-3-3">
        <title>3.3. Small Language Models</title>
        <sec id="sec-3-3-1">
          <title>Examples as the one in the previous section are, of course, encouraging but tend to drive the hype around Generative AI. Users develop a form of trust not just on the parsing abilities of LLMs but on their reasoning capabilities, which can be justified by instances such as that in which they appear to be reasoning correctly.</title>
          <p>
            To help us recover perspective, we have developed a very simple example of a language model
on one of software platform used for the large ones [
            <xref ref-type="bibr" rid="ref16">17</xref>
            ]. The code is shown in Appendix A. We
have trained this little model with the following three sequences of proofs in Prolog notation, to
indicate how those elementary rules and facts could be used to deduce another fact in a very
straightforward exercise of forward reasoning.
patterns = [
"a.b:-a.b.",
"b.c:-b.c.",
"a.b.d:-a,b.d."
]
          </p>
        </sec>
        <sec id="sec-3-3-2">
          <title>To test the model, we provide strings of characters and expect to recover the following character on one of the training patterns. These are the outputs:</title>
          <p>Input: 'a.' -&gt; Predicted: '&lt;eos&gt;' (Expected: 'b')
Input: 'a.b' -&gt; Predicted: '.' (Expected: ':')
Input: 'a.b:' -&gt; Predicted: '-' (Expected: '-')
Input: 'a.b:-' -&gt; Predicted: 'a' (Expected: 'a')
Input: 'a.b:-a' -&gt; Predicted: '&lt;eos&gt;' (Expected: '.')
Input: 'a.b:-a.' -&gt; Predicted: '&lt;eos&gt;' (Expected: 'b')
Input: 'a.b:-a.b' -&gt; Predicted: '.' (Expected: '.')
Input: 'b.c:-b.' -&gt; Predicted: '&lt;eos&gt;' (Expected: 'c')
Input: 'a.b.d:-a,b.' -&gt; Predicted: 'd' (Expected: 'd')</p>
          <p>This tiny model correctly predicts a correct continuation of 5 out of 9 cases (as marked in
yellow and green highlighting). One of them (last line in red) could be (mis)interpreted as the
correct application of forward reasoning. That output lists what comes from the model (the
prediction) and what one can expect given the inputs. For example, in the last line, we expected d
because of the input pattern: "a.b.d:-a,b.d.". And the model correctly predicts that d follows
a.b.d:a,b. That prediction also coincides with the results of inferring d from a and b through the rule
d:a,b., a coincidence that could have people believing that the model does modus ponens.</p>
        </sec>
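        <p>To make the coincidence explicit, the same inference can be run in ordinary Prolog. A minimal sketch of our reading of the last pattern:</p>
        <p>% Facts and rule encoded by the pattern "a.b.d:-a,b.d."
a.
b.
d :- a, b.

% ?- d.  succeeds: from a and b, modus ponens yields d, which is the
% continuation the model happened to predict.</p>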
        <sec id="sec-3-3-3">
          <title>This experiment, of course, has no statistical significance. But it does suggest that Transformers’ language models, while simply predicting the next token, could be seen as doing some form of reasoning. That they can produce outputs as elaborated as the one shown in the previous section, just by scaling, is, of course, extraordinary and requires further investigations.</title>
        </sec>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>4. De-compiling an LLM into Logic Programs</title>
      <sec id="sec-4-1">
        <title>We have performed some more practical experiments to generate logic programs from LLMs.</title>
      </sec>
      <sec id="sec-4-2">
        <title>In a first experiment, we[18] prompted Google’s LLM, Gemini (1.5), with a set of instructions to</title>
        <p>list the most common ways to ask the same question and then map the corresponding alternatives
as new entries to a Prolog dictionary2 (.ibid). That dictionary is used by the control natural
language Logical English, LE. We ended with a series of descriptions of relations in natural
language, called templates in LE, that can be used to query a document processed by the regular LE
engine in Prolog.</p>
      </sec>
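      <p>As a rough illustration of such a mapping (the predicate name and entries here are hypothetical, not the actual format of the entries in le_input.pl), alternative phrasings of a question can be mapped to one canonical relation:</p>
      <p>% Hypothetical sketch only; see footnote 2 for where the real entries live.
asks_about("who owns Asset",            owns(_Owner, _Asset)).
asks_about("to whom does Asset belong", owns(_Owner, _Asset)).
asks_about("who is the owner of Asset", owns(_Owner, _Asset)).</p>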
      <sec id="sec-4-3">
        <title>In a recent experiment, we prompted an LLM (DeepSeek V3) to produce alternative texts to extend and improved a logic program written en Logical English. We started with the prompt:</title>
        <p>This is a set of rules, an scenario for the rules and a list of queries:
and then included the content of a Logical English document describing an Australian Tax Law3,
but without including templates. Then, we asked:.</p>
        <p>Can you produce some interesting scenarios to test those rules
and it produced a list of 6 scenarios to test the given rules, described in plain English but not in</p>
      </sec>
      <sec id="sec-4-4">
        <title>Logical English. Following the same chat thread, we asked:</title>
        <p>Can you restate those scenarios using these relations:</p>
        <p>and included all the templates that accompany that document. The answer is a partial but
computable list of examples that could be added to the document in question, such as this one:
&lt;Start Content enhancement by DeepSeek V3&gt;
2https://github.com/LogicalContracts/LogicalEnglish/blob/main/kb/4_affiliates_3.pl
https://github.com/LogicalContracts/LogicalEnglish/blob/8ee7ff2740cb1e7231d115572a5a3dad7c673535/le_input.pl#L2478
(where the particular entries start)
3https://github.com/LogicalContracts/LogicalEnglish/blob/main/moreExamples/1_cgt_assets_and_exemptions_3.le
Scenario 1: The Crypto Collector</p>
      </sec>
      <sec id="sec-4-5">
        <title>Some extra processing is still required, but the content is already in the correct syntax so that it</title>
        <p>could be added to a runnable document in Logical English, with the following content4
scenario Crypto is:</p>
        <p>Alice is a taxpayer.
asset bitcoin is a cryptocurrency.
asset bitcoin belongs to Alice.
asset bitcoin was acquired at 2020-01-01.
asset bitcoin is used for the purchase of items for personal consumption.
asset nft_rare belongs to Alice.
asset nft_rare costed 15000 to acquire.
asset nft_gift is a gift.
asset nft_gift belongs to Alice.</p>
        <p>asset nft_gift was gifted through a will to a deductible gift recipient
beneficiary.</p>
      </sec>
      <sec id="sec-4-6">
        <title>The produced facts approximately follow the logic in the document and cannot be trusted to</title>
        <p>
          cover it all. Some adjustments and extra information are still required to make it work. But there
seems no need to use a particular syntax, like Prolog’s, and the generated outputs are very useful to
extend the codification of testing cases. It can be argued that the difficult part is to generate the
rules. Some preliminary results show that it is also feasible with few shots prompting [
          <xref ref-type="bibr" rid="ref18">19</xref>
          ].
        </p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>5. Conclusion</title>
      <p>We have made an argument for the use of LLMs to generate logic programs. We have shown a few experiments in which proper prompting of an LLM produces fragments of logic programs and queries with immediate utility. The read/skip example produces a small logic program, a previous paper shows how we could generate the queries (in Prolog) from English, and the last example is about producing facts and queries for scenarios that test the rules in a given document. This has led us to believe that we could exploit the extraordinary capacity of Transformers for natural language parsing, without having to rely on them for reasoning. By de-compiling logic programs from LLMs, we could scale the production of systems integrating generative AI and Logical AI, which could provide a flexible interaction with humans in natural language while offering well-verified capabilities for reasoning. We aim to extract knowledge from LLMs into logic programs and queries and, therefore, add LLMs to the materials and tools used to teach logic programming.</p>
    </sec>
    <sec id="sec-6">
      <title>6. Future work</title>
      <p>We envisage a road map to develop these ideas, bridging LLMs and symbolic logic, and emphasizing the goal of extracting independently meaningful and correct logic. First, to establish a robust communication interface between LLMs and Prolog. Second, to define and implement methods to evaluate the semantic correctness of extracted logic, independently of the LLM's original output, which will probably bring us back to the discussion about the semantics of LP. This will include developing an iterative agentic toolchain, where the LLM proposes and refines Prolog rules based on feedback from the Prolog interpreter, and creating benchmarks for evaluating this deep logic extraction. Third, to explore the scalability of the approach and identify real-world applications for independently correct logic extraction. Some related research questions that we could discuss at the workshop are: What constitutes "independently meaningful and correct logic"? (A precise definition is key.) What to do if the LLM "hallucinates" plausible-looking but incorrect logic? (The role of the Prolog interpreter as a strict validator could be crucial.) What are the theoretical limits of extracting symbolic logic from sub-symbolic models? What specific architectures for LLMs would best support logic extraction or LP decompilation? (E.g., models explicitly trained on symbolic reasoning tasks, or those with more transparent internal representations.) And, how can we scale this beyond toy examples to real-world complexity? (Modularization, hierarchical reasoning, combining with other AI paradigms.)</p>
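      <p>As a minimal sketch of that validator role (assuming SWI-Prolog; the predicate is our own illustration, not an existing tool), the interpreter can load LLM-proposed clauses and report which expected queries fail, as feedback for the next refinement round:</p>
      <p>% Sketch: assert the proposed clauses, then collect the tests that fail.
validate(ProposedClauses, Tests, Failures) :-
    forall(member(Clause, ProposedClauses), assertz(Clause)),
    findall(Test, (member(Test, Tests), \+ catch(Test, _, fail)), Failures).

% ?- validate([(d :- a, b), a, b], [d], F).  yields F = [] : all tests pass.</p>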
    </sec>
    <sec id="sec-7">
      <title>Acknowledgements</title>
      <sec id="sec-7-1">
        <title>We are grateful to LodgeIT for their encouragement and support to encode Australian Tax Law in</title>
      </sec>
      <sec id="sec-7-2">
        <title>Logical English.</title>
      </sec>
    </sec>
    <sec id="sec-8">
      <title>Declaration on Generative AI</title>
      <sec id="sec-8-1">
        <title>During the preparation of this work, the author used Claude Sonnet 4 for Content enhancement to</title>
        <p>produce the example of guided-rule generation, in section 3.2. We also used DeepSeek V3 for</p>
      </sec>
      <sec id="sec-8-2">
        <title>Content enhancement to improve Logical English documents, as described in section 4. After using</title>
        <p>these tools/services, the author reviewed and edited the content as needed and takes full
responsibility for the publication’s content.
5https://github.com/google/BIG-bench/graphs/contributors https://arxiv.org/abs/2206.04615</p>
      </sec>
    </sec>
    <sec id="sec-9">
      <title>A. Appendix</title>
      <p>import torch
import torch.nn as nn
import torch.optim as optim

# Special tokens
PAD_ID = 0
BOS_ID = 1
EOS_ID = 2

class PatternTransformer(nn.Module):
    def __init__(self, vocab_size, embed_size=16, nhead=1):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_size)
        self.attention = nn.MultiheadAttention(embed_size, nhead, batch_first=True)
        self.out = nn.Linear(embed_size, vocab_size)

    def forward(self, x):
        x = self.embed(x)
        seq_len = x.size(1)
        # Causal mask: each position only attends to itself and earlier tokens
        attn_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool()
        x, _ = self.attention(x, x, x, attn_mask=attn_mask.to(x.device))
        return self.out(x)

# The training patterns: sequences of proofs in Prolog notation
patterns = [
    "a.b:-a.b.",
    "b.c:-b.c.",
    "a.b.d:-a,b.d."
]

# Create vocabulary from the patterns (reconstructed: special tokens first,
# then one id per distinct character, in order of appearance)
vocab = {"&lt;pad&gt;": PAD_ID, "&lt;bos&gt;": BOS_ID, "&lt;eos&gt;": EOS_ID}
for p in patterns:
    for ch in p:
        if ch not in vocab:
            vocab[ch] = len(vocab)

# Pre-process patterns: map characters to ids and append the end marker
def preprocess(text):
    return [vocab[ch] for ch in text] + [EOS_ID]

pattern_ids = [preprocess(p) for p in patterns]

# Create training samples: predict each token from its BOS-prefixed prefix
def create_samples():
    samples = []
    for seq in pattern_ids:
        for i in range(2, len(seq)):  # Need at least 2 tokens for prediction
            input_seq = [BOS_ID] + seq[:i]
            target = seq[i]
            samples.append((torch.tensor(input_seq), torch.tensor(target)))
    return samples

# Initialize model
model = PatternTransformer(len(vocab))
opt = optim.Adam(model.parameters(), lr=0.01)
criterion = nn.CrossEntropyLoss()

# Training
samples = create_samples()
for epoch in range(500):
    total_loss = 0
    for input_seq, target in samples:
        opt.zero_grad()
        logits = model(input_seq.unsqueeze(0))
        loss = criterion(logits[0, -1, :], target)
        loss.backward()
        opt.step()
        total_loss += loss.item()
    if epoch % 50 == 0:
        print(f"Epoch {epoch}, Loss: {total_loss/len(samples):.4f}")

# Testing
test_cases = [
    ("a.", "b"),
    ("a.b", ":"),
    ("a.b:", "-"),
    ("a.b:-", "a"),
    ("a.b:-a", "."),
    ("a.b:-a.", "b"),
    ("a.b:-a.b", "."),
    ("b.c:-b.", "c"),
    ("a.b.d:-a,b.", "d")
]
print("\nTesting:")
for test_input, expected in test_cases:
    input_ids = torch.tensor([BOS_ID] + [vocab[ch] for ch in test_input])
    with torch.no_grad():
        logits = model(input_ids.unsqueeze(0))
        pred = logits.argmax(-1)[0, -1].item()
    print(f"Input: '{test_input}' -&gt; Predicted: '{list(vocab.keys())[pred]}' (Expected: '{expected}')")</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>Aarohi</given-names>
            <surname>Srivastava</surname>
          </string-name>
          and more than
          <volume>400</volume>
          <fpage>contributors5</fpage>
          .
          <article-title>"Beyond the imitation game: Quantifying and extrapolating the capabilities of language models</article-title>
          .
          <source>" arXiv preprint arXiv:2206.04615</source>
          (
          <year>2022</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Vaswani</surname>
          </string-name>
          ,
          <string-name>
            <surname>Ashish</surname>
          </string-name>
          , et al.
          <article-title>"Attention is all you need</article-title>
          .
          <source>" Advances in neural information processing systems</source>
          <volume>30</volume>
          (
          <year>2017</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Emily</surname>
            <given-names>M.</given-names>
          </string-name>
          <string-name>
            <surname>Bender</surname>
          </string-name>
          , Timnit Gebru,
          <string-name>
            <surname>Angelina McMillan-Major</surname>
            , and
            <given-names>Shmargaret</given-names>
          </string-name>
          <string-name>
            <surname>Shmitchell</surname>
          </string-name>
          .
          <year>2021</year>
          .
          <article-title>On the Dangers of Stochastic Parrots: Can Language Models Be Too Big?</article-title>
          .
          <source>In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency (FAccT '21)</source>
          .
          <article-title>Association for Computing Machinery</article-title>
          , New York, NY, USA,
          <fpage>610</fpage>
          -
          <lpage>623</lpage>
          . https://doi.org/10.1145/3442188.3445922
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <given-names>Mohammadamin</given-names>
            <surname>Barektain</surname>
          </string-name>
          , Anant Nawalgaria, Daniel J. Mankowitz, Majd Al Merey,
          <string-name>
            <given-names>Yaniv</given-names>
            <surname>Leviathan</surname>
          </string-name>
          , Massimo Mascaro, Matan Kalman, Elena Buchatskaya, Aliaksei Severyn, Irina Sigler, and Antonio Gulli.
          <source>Foundational Large Language Models &amp; Text Generation. Google</source>
          (
          <year>2025</year>
          ) https://www.kaggle.
          <article-title>com/whitepaper-foundational-llm-and-text-generation Xu, Ziwei</article-title>
          ,
          <string-name>
            <given-names>Sanjay</given-names>
            <surname>Jain</surname>
          </string-name>
          , and
          <string-name>
            <given-names>Mohan</given-names>
            <surname>Kankanhalli</surname>
          </string-name>
          .
          <article-title>"Hallucination is inevitable: An innate limitation of large language models</article-title>
          .
          <source>" arXiv preprint arXiv:2401.11817</source>
          (
          <year>2024</year>
          ).
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>Xiaocheng</given-names>
            <surname>Yang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Bingsen</given-names>
            <surname>Chen</surname>
          </string-name>
          , and
          <string-name>
            <surname>Yik-Cheung Tam</surname>
          </string-name>
          .
          <year>2024</year>
          .
          <article-title>Arithmetic Reasoning with LLM: Prolog Generation &amp; Permutation</article-title>
          .
          <source>In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers)</source>
          , pages
          <fpage>699</fpage>
          -
          <lpage>710</lpage>
          ,
          <string-name>
            <surname>Mexico</surname>
            <given-names>City</given-names>
          </string-name>
          , Mexico. Association for Computational Linguistics.
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Borazjanizadeh</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Piantadosi</surname>
            ,
            <given-names>S. T.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Reliable reasoning beyond natural language</article-title>
          .
          <source>arXiv preprint arXiv:2407</source>
          .
          <fpage>11373</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Tan</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Deng</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qiu</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Qu</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chu</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xu</surname>
            ,
            <given-names>Y</given-names>
          </string-name>
          &amp; Qi,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>THOUGHT-LIKEPRO: Enhancing Reasoning of Large Language Models through Self-Driven Prolog-based Chain-of-Thought</article-title>
          .
          <source>arXiv preprint arXiv:2407</source>
          .
          <fpage>14562</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Wei</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Wang</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Schuurmans</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bosma</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xia</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chi</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Zhou</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          (
          <year>2022</year>
          ).
          <article-title>Chain-ofthought prompting elicits reasoning in large language models</article-title>
          .
          <source>Advances in neural information processing systems</source>
          ,
          <volume>35</volume>
          ,
          <fpage>24824</fpage>
          -
          <lpage>24837</lpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Tarau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2025</year>
          , January).
          <article-title>Leveraging LLM Reasoning with Dual Horn Programs</article-title>
          .
          <source>In International Symposium on Practical Aspects of Declarative Languages</source>
          (pp.
          <fpage>163</fpage>
          -
          <lpage>178</lpage>
          ). Cham: Springer Nature Switzerland.
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [11]
          <string-name>
            <surname>Tarau</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>On LLM-generated Logic Programs and their Inference Execution Methods</article-title>
          .
          <source>arXiv preprint arXiv:2502</source>
          .
          <fpage>09209</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [12]
          <string-name>
            <surname>Thuy</surname>
            ,
            <given-names>P. T. T.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Yamamoto</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          (
          <year>2024</year>
          ).
          <article-title>Implementing Derivations of Definite Logic Programs with Self-Attention Networks</article-title>
          .
          <source>arXiv preprint arXiv:2410</source>
          .
          <fpage>11396</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [13]
          <string-name>
            <surname>Liu</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Fu</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ding</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ning</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          , Zhang,
          <string-name>
            <given-names>C.</given-names>
            ,
            <surname>Liu</surname>
          </string-name>
          ,
          <string-name>
            <given-names>X.</given-names>
            , &amp;
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <surname>Y.</surname>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Logical Reasoning in Large Language Models: A Survey</article-title>
          .
          <source>arXiv preprint arXiv:2502</source>
          .
          <fpage>09100</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [14]
          <string-name>
            <surname>Shojaee</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Mirzadeh</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Alizadeh</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Horton</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Bengio</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          , &amp;
          <string-name>
            <surname>Farajtabar</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>The illusion of thinking: Understanding the strengths and limitations of reasoning models via the lens of problem complexity</article-title>
          .
          <source>arXiv preprint arXiv:2506</source>
          .
          <fpage>06941</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [15]
          <string-name>
            <surname>Aho</surname>
            ,
            <given-names>Alfred V.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lam</surname>
          </string-name>
          , Monica S.,
          <string-name>
            <surname>Sethi</surname>
          </string-name>
          , Ravi, &amp;
          <string-name>
            <surname>Ullman</surname>
            ,
            <given-names>Jeffrey E.</given-names>
          </string-name>
          (
          <year>2007</year>
          )
          <article-title>Compilers: Principles, Techniques and</article-title>
          <string-name>
            <given-names>Tools. Second</given-names>
            <surname>Edition</surname>
          </string-name>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [16]
          <string-name>
            <surname>Kowalski</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          (
          <year>2011</year>
          ).
          <article-title>Computational Logic for Human Communication</article-title>
          . Imperial College. London, UK.
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          [17]
          <string-name>
            <surname>Dávila</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Decompiling LM into LP</article-title>
          . Kaggle Notebook. https://www.kaggle.com/code/jacintodavila/decompiling-lm
          <string-name>
            <surname>-</surname>
          </string-name>
          into-lp
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [18]
          <string-name>
            <surname>Dávila</surname>
            ,
            <given-names>J. Controlled</given-names>
          </string-name>
          <string-name>
            <surname>Natural Language Models</surname>
          </string-name>
          .
          <source>Prolog Education Workshop 2024, Workshop Proceedings of the 40th International Conference on Logic Programming (ICLP-WS</source>
          <year>2024</year>
          ). https://ceur-ws.
          <source>org/</source>
          Vol-
          <volume>3799</volume>
          /
          <year>paper10PEG2</year>
          .0.pdf
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [19]
          <string-name>
            <surname>Zin</surname>
          </string-name>
          , May Myo; Borges, Georg; Satoh, Ken; FUNGWACHARAKORN,
          <string-name>
            <surname>Wachara</surname>
          </string-name>
          (
          <year>2025</year>
          ).
          <article-title>Towards Machine-Readable Traffic Laws: Formalizing Traffic Rules into PROLOG Using LLMs</article-title>
          .
          <source>The 20th International Conference on Artificial Intelligence and Law</source>
          ,
          <string-name>
            <surname>ICAIL</surname>
          </string-name>
          <year>2025</year>
          . Chicago. June 16 to 20,
          <year>2025</year>
          <article-title>def forward(self, x): x = self.embed(x) seq_len = x.size(1) attn_mask = torch.triu(torch.ones(seq_len, seq_len), diagonal=1).bool() x, _ = self.attention(x, x, x, attn_mask=attn_mask.to(x.device)) return self.out(x) # Create vocabulary from multiple patterns patterns = [ "a.b:-a.b.", "b</article-title>
          .c:
          <article-title>-b.c.", "a.b.d:-a,b.d." print("\nTesting:") for test_input, expected in test_cases: input_ids = torch.tensor([BOS_ID] + [vocab[ch] for ch in test_input]) with torch.no_grad(): logits = model(input_ids.unsqueeze(0)) #print(logits) pred = logits.argmax(-1)[0,-1].item() print(f"Input: '{test_input}' -&gt; Predicted: '{list(vocab.keys())[pred]}' (Expected: '{expected}')")</article-title>
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>