<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Consistent Language Models Using Declarative Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jasmin Mousavi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arash Termehchy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oregon State University</institution>
          ,
          <addr-line>1500 SW Jefferson Ave, Corvallis, OR 97331</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the complexity and uninterpretability of the internal learned representations, it is challenging to modify language models such that they provide correct and consistent results. The data management community has developed various methods and tools for repairing inconsistent datasets. In these methods, users specify the desired properties of data in a domain in the form of high-level declarative constraints. This approach has provided usable and scalable methods for delivering consistent information from inconsistent datasets. We propose to build upon this success and leverage these methods to modify language models such that they deliver consistent and accurate results. We investigate the challenges of using these ideas to obtain consistent and accurate language models.</p>
      </abstract>
      <kwd-group>
        <kwd>large language models</kwd>
        <kwd>declarative constraints</kwd>
        <kwd>consistent modeling</kwd>
        <kwd>model repair</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Large language models (LLMs) have shown unprecedented abilities in processing natural languages [1, 2]. They effectively generalize to perform various tasks with few or no training examples. Thus, there is a rapidly growing interest in using them to solve data-driven problems, such as interactive question answering.</p>
      <p>Nonetheless, LLMs often provide incorrect answers to input queries and perform inaccurate inferences [3, 2]. Some studies indicate that recent LLMs provide up to 40% erroneous answers to factual questions [2]. These erroneous results are an important obstacle to the widespread use of LLMs in real-world applications.</p>
      <p>To address the problem of inaccurate answers returned by LLMs, we should recognize that LLMs are not knowledge bases, but rather probabilistic or approximate models of factual information. LLMs may over-generalize patterns and relationships observed in the sub-sequences of pretraining documents, which might lead to returning spurious relationships and inaccurate results. The uninterpretable mixture of linguistic patterns and factual information has made it challenging to eliminate incorrect information. This is in sharp contrast to traditional approaches to database querying, in which the user interface, e.g., the query language, is clearly separated from the source of the information, e.g., databases.</p>
      <p>One approach is to fine-tune the LLM on a set of domain-specific data sources to improve the quality of its answers for questions in a given domain [4]. Nonetheless, it has been shown that these methods may also lead to many inaccurate answers [5]. This is, in part, due to the fact that fine-tuning is inherently under-specified and may not sufficiently modify the model to eliminate its already learned spurious information. Another approach is to augment LLMs with additional and potentially relevant information from external data sources [6, 7, 8]. These methods often add extra information to the context considered during pretraining. This line of research has improved the accuracy of LLMs to a limited degree, as it does not address the core issue of having spurious and incorrect information in LLMs. It is not clear whether adding more relevant information eliminates inaccurate information stored in the model. Moreover, it is often challenging to find sufficiently many relevant data sources, particularly for long-tail entities.</p>
      <p>It is challenging to ensure that an LLM learns accurate generalizations and returns correct answers, as it may require perfect knowledge of unobserved data. Nonetheless, we may be able to restrict its pretrained representation to adhere to semantic constraints in the domain to avoid generating incorrect results. This is akin to the problem of cleaning databases to satisfy a set of declarative semantic constraints [9]. Databases often contain data that does not comply with the semantic constraints in their domains. For example, a person might not have any social security number or might have more than one in a human resources database. The usual query processing methods might return inaccurate results over incomplete or inconsistent databases. The data management community has developed a unified, usable, and scalable approach to repairing inconsistent data to comply with declarative semantic constraints [9].</p>
      <p>Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23), Workshop on LLMs and Databases (LLMDB’23), August 28 - September 1, 2023, Vancouver, Canada. mousavij@oregonstate.edu (J. Mousavi); termehca@oregonstate.edu (A. Termehchy); https://web.engr.oregonstate.edu/~termehca/ (A. Termehchy). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>Instead of writing long and complex imperative programs to check inconsistencies and repair the data, users specify the properties of the consistent dataset succinctly in a high-level declarative language. There are several types of constraints based on the model of the data, e.g., functional dependencies for relational data or description logic rules for RDF data. They are usually subsets of first-order logic that are sufficiently expressive to capture important knowledge in the domain yet not too expressive to make reasoning intractable. Hence, data systems may check incompatibilities or redundancies in constraints efficiently. These constraints may also be learned from high-quality datasets in the domain.</p>
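As a concrete illustration of such declarative constraints, consider a functional dependency over a small relational table. The checker below is a minimal sketch; the table, attribute names, and helper function are hypothetical and not from the paper:

```python
# A functional dependency (FD) X -> Y declares: rows that agree on the
# attributes X must also agree on the attributes Y. Detecting violations
# of such a declarative constraint takes only a few lines.

def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that violate the FD lhs -> rhs."""
    seen = {}  # maps lhs-values to the first (rhs-values, row) observed
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key][0] != val:
            violations.append((seen[key][1], row))
        else:
            seen.setdefault(key, (val, row))
    return violations

# An HR table where one person has two social security numbers,
# violating the FD name -> ssn:
employees = [
    {"name": "Alice", "ssn": "111-11-1111"},
    {"name": "Alice", "ssn": "222-22-2222"},
    {"name": "Bob",   "ssn": "333-33-3333"},
]
print(fd_violations(employees, ["name"], ["ssn"]))  # one violating pair
```

A repair system would then update one of the violating rows so the dataset satisfies the declared FD.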
      <p>In this paper, we propose a novel approach to reduce
inconsistencies in LLMs using high-level declarative
constraints. We believe that the success of using declarative
constraints to provide reliable information in data
management indicates that our proposed approach has the
potential to deliver a usable and scalable method for
creating and maintaining reliable and consistent LLMs. This,
in turn, enables users to leverage LLMs in real-world
applications with high confidence and accuracy.</p>
      <p>We also discuss the challenges of using high-level declarative constraints to reduce inconsistencies in LLMs. Specifically, it is not clear how to enforce declarative constraints in an LLM efficiently. It might be challenging to find a correspondence between the symbolic declarative constraints and the information in the continuous representation learned by LLMs. We investigate how to leverage existing ideas in data cleaning and management [9] and current methods for embedding structured information [10, 11, 12] to address this problem. Since pretraining and fine-tuning are often time-consuming and computationally expensive, we also investigate methods of updating a pretrained LLM that ensure it follows a set of constraints.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Creating Consistent Models Using Pretraining &amp; Fine-tuning</title>
      <p>Since LLMs are created using pretraining, it is natural to consider methods that incorporate semantic constraints during pretraining to create consistent LLMs. Nonetheless, pretraining usually takes a long time and substantial computational resources. Researchers often use a relatively fast process called fine-tuning to modify a pretrained LLM [8]. During fine-tuning, the LLM is trained with additional information using its pretrained weights as initial values. In this section, we explore methods for creating or modifying an LLM so that it complies with a set of constraints using pretraining and fine-tuning.</p>
      <p>The semantic properties and constraints in a domain are often represented in the form of ontologies [13]. In a nutshell, an ontology consists of a set of facts, where each fact is a triple in the form of (subject, relationship, object), and a set of constraints on these facts. The triples in an ontology introduce concepts, e.g., Person, and their instances, e.g., Obama. They also represent relationships between different concepts in the domain, e.g., President is-a Person. Constraints in an ontology lay out the conditions that concepts and relationships must follow, e.g., is-a has the transitive property. Constraints are usually expressed in a subset of first-order logic, e.g., description logic. Generally speaking, each constraint establishes that if some concepts satisfy certain conditions, i.e., the premise, they must satisfy other conditions, i.e., the conclusion. For instance, for the is-a relation, we have: for all concepts X, Y, and Z, if (X, is-a, Y) and (Y, is-a, Z), then (X, is-a, Z).</p>
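The transitivity constraint can be made concrete with a small sketch. The helper and the tiny ontology below are illustrative assumptions, not part of the paper:

```python
# Enforce the constraint "for all X, Y, Z: (X, is-a, Y) and (Y, is-a, Z)
# implies (X, is-a, Z)" by saturating the is-a triples of a toy ontology
# under transitivity (a naive fixed-point computation).

def transitive_closure(triples):
    """Return all is-a triples entailed by transitivity."""
    edges = {(s, o) for (s, r, o) in triples if r == "is-a"}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return {(s, "is-a", o) for (s, o) in edges}

ontology = [
    ("President", "is-a", "Person"),
    ("Obama", "is-a", "President"),
]
closure = transitive_closure(ontology)
# The constraint entails the fact (Obama, is-a, Person) even though it
# never appears explicitly in the ontology.
print(("Obama", "is-a", "Person") in closure)  # True
```

A model that stores the two explicit triples but not the entailed one would violate this constraint.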
      <p>It is important for an LLM to encapsulate both the facts and the constraints on those facts in a domain to provide consistent results. An LLM might not learn the facts from the textual data over which it is pretrained. This could be because some facts are not in the text or do not appear in closely related text spans and contexts. Constraints in an ontology represent the semantic meaning of concepts and relationships in the domain. This information does not often appear explicitly in the data used to pretrain LLMs; therefore, LLMs might not learn it during pretraining.</p>
      <p>Thus, our goal is to create LLMs that contain and follow both facts and constraints in a given ontology. To simplify our exposition, and because each fact can also be represented as a special type of constraint, unless otherwise noted, we refer to both facts and constraints in an ontology as constraints.</p>
      <sec id="sec-3-1">
        <title>2.2. Mixing Constraints with Training Data</title>
        <p>Incorporating this structured information into LLMs poses challenges since LLMs are trained on unstructured data. One may supplement the training data with textual ontology information, e.g., Obama is a President. However, translating facts and constraints into text introduces two problems. First, in domains containing numerous semantic constraints, the augmented training data may exceed the maximum sequence length (commonly restricted to 512 tokens in most models). Second, converting structured data into unstructured text may cause the model to view this information merely as additional context, without preserving the higher-order constraints vital for comprehending the semantics of concepts in the domain.</p>
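A toy sketch of the first problem: naively verbalizing triples and prepending them to the input quickly exceeds a 512-token budget. Whitespace "tokens" stand in for a real subword tokenizer here, and all names are illustrative:

```python
# Verbalize each triple as a sentence and check whether the augmented
# input still fits a fixed context budget.

MAX_TOKENS = 512

def verbalize(triples):
    """Turn (subject, relation, object) triples into plain sentences."""
    return " ".join(f"{s} {r.replace('-', ' ')} {o}." for (s, r, o) in triples)

def fits_in_context(ontology_text, question):
    """True if ontology text plus question stays within the budget."""
    n_tokens = len((ontology_text + " " + question).split())
    return MAX_TOKENS - n_tokens >= 0

# 300 hypothetical triples at roughly 4 tokens each already blow the budget.
triples = [(f"Entity{i}", "is-a", "Thing") for i in range(300)]
text = verbalize(triples)
print(fits_in_context(text, "Is Entity7 a Thing?"))  # False
```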
        <p>To overcome these issues, constraint reduction techniques can be applied. One method involves reasoning over the constraints to find a minimal set [14], but this does not guarantee that the augmented input will conform to the maximum sequence length. Another approach is to encode the ontology information into an embedded representation using an LSTM [6], integrated via a gating function. This allows the LLM to control what information augments the input, successfully limiting the sequence length. However, it may not be optimal for incorporating constraints, as it may cause information loss and is more apt for enhancing the input with extra facts rather than filtering incorrect information.</p>
        <p>These methods fall short of incorporating the ontology in a way that preserves its semantic information, highlighting the difficulty of integrating higher-order constraints into LLMs.</p>
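The gating idea can be sketched as follows. The dimensions, random parameters, and sigmoid gate are illustrative assumptions, not the cited architecture:

```python
import numpy as np

# Fuse an ontology embedding into a text representation via a gate: the
# gate decides, per dimension, how much ontology information augments the
# input. In a real system the gate parameters would be learned.

rng = np.random.default_rng(0)
d = 8
W_g = rng.normal(size=(d, 2 * d))  # hypothetical gate parameters
b_g = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_text, h_onto):
    """Blend text and ontology embeddings with an elementwise gate."""
    g = sigmoid(W_g @ np.concatenate([h_text, h_onto]) + b_g)
    return g * h_text + (1.0 - g) * h_onto

h_text = rng.normal(size=d)  # contextual embedding of the input text
h_onto = rng.normal(size=d)  # LSTM-style encoding of relevant triples
fused = gated_fusion(h_text, h_onto)
print(fused.shape)  # (8,)
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the text and ontology values, which is what lets the model down-weight ontology input rather than being forced to consume it.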
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Retaining Constraint Information</title>
        <p>To ensure that a database complies with a constraint, we often find the information in the database that does not follow the constraint and update it so the database satisfies the constraint. One may adopt this approach to repair a pretrained model so it satisfies a set of given constraints. In other words, one may find the portion of the model responsible for representing a constraint, or the lack thereof, and update it if necessary so that the resulting model satisfies the constraint. As opposed to information in a database, factual information is stored in an LLM implicitly, through some pretrained weights in the model. Hence, it is difficult to find and revise the factual data that violates a set of given constraints in an LLM. In this section, we describe two approaches to repairing pretrained models and discuss their challenges.</p>
        <p>Constraint Embedding. Ideally, the representation learned by an LLM should capture the structural information present in the semantic constraints of an ontology. Geometric embeddings (e.g., box, circle, cone) have been widely explored for learning representations of graph structures such as ontologies and knowledge bases [15, 11, 16, 10, 12]. For instance, if an ontology has the constraint that President is-a Person, the geometric embedding for Person should contain the geometric embedding for President, reflecting the transitivity of is-a and the fact that President is a subset of Person. These embeddings preserve the structural properties and relationships in an embedded space, ensuring that the output representations maintain the specified constraints.</p>
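A toy version of box containment illustrates the idea. The boxes below are hand-set for clarity rather than learned, and all values are hypothetical:

```python
import numpy as np

# Each concept is an axis-aligned box (min corner, max corner).
# "President is-a Person" is respected when the President box lies
# entirely inside the Person box.

def contains(outer, inner):
    """True if box `inner` lies inside box `outer`."""
    (o_min, o_max), (i_min, i_max) = outer, inner
    return bool(np.all(i_min >= o_min) and np.all(o_max >= i_max))

person    = (np.array([0.0, 0.0]),  np.array([10.0, 10.0]))
president = (np.array([2.0, 2.0]),  np.array([4.0, 4.0]))
city      = (np.array([20.0, 0.0]), np.array([30.0, 5.0]))

print(contains(person, president))  # True: consistent with President is-a Person
print(contains(person, city))       # False: City is not a Person
```

Note that containment is transitive, so a learned box geometry automatically respects the transitivity constraint of is-a.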
        <p>When training an LLM, one can incorporate geometric or constraint embeddings for unstructured text data in order to retain information from ontologies. If the ontology data is consistent and the model learns a perfect constraint embedding, it should respect the facts and constraints within the domain. However, since this is unlikely, it may be necessary to apply optimization techniques to the objective function. Such techniques can help facilitate LLMs to learn representations that effectively capture higher-order relationships and constraints that extend beyond the training domain.</p>
        <p>Constraint Objective Task. Since the ontology is a source of knowledge, it can also be used to train the LLM directly. External knowledge can be created from the ontology by extracting triples in the form of rich text spans, thereby providing more information about constraints to the model. Using this data, one may construct a word prediction or masking objective that aligns with the external knowledge of semantic constraints. One strategy is type modeling [17], where entities are replaced with their type, and the model predicts the entity type for the next word or word span. This idea can be extended to a masking objective, where the model predicts masked types in the output.</p>
        <p>Alongside traditional LLM objectives, e.g., masked objective tasks, one can integrate constraint objective tasks and constraint embeddings during pretraining. These methods capture the ontology’s structural information, resulting in a model that is consistent with domain-specific constraints. Given an ontology and text documents, constraint objective tasks and constraint embeddings can also be used for fine-tuning. However, these techniques may prove more effective if implemented during the pretraining process.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Model Repair</title>
      <sec id="sec-4-1">
        <title>3.1. Fact-based Repair</title>
        <p>There has been some recent success in updating facts represented in an LLM [18]. Each update aims at changing the object in a given triple in the form of (subject (s), relation (r), object (o)) to a new object o′. These methods first find the weights responsible for representing s and its relationship to o in the model. They then modify these weights so that the model represents the new object o′ in the fact with high probability.</p>
        <p>Building upon this line of work, one may ensure that an LLM satisfies a set of constraints by finding and modifying the pretrained weights that represent the facts that violate the constraints. An algorithm to check whether an LLM satisfies a given constraint could be as follows. First, the algorithm samples a set of facts that follow the constraint from the ontology. For each instance of the constraint, it prompts/queries the LLM to check whether and how the LLM represents the facts in the instance. If the LLM’s representations of the facts in the instance violate the constraint, the algorithm modifies the representations so they follow the constraint. The larger the set of samples is, the more likely the repaired model satisfies the constraint. Users can change the size of the sample based on their available time and resources as well as their desired confidence for satisfying constraints.</p>
        <p>This algorithm might require a large number of updates to the model, which could be time-consuming. Moreover, since facts are represented implicitly in the model, the aforementioned methods might not always find the updates that modify a fact to its desired form. To address these challenges, one may find a minimal set of facts and their corresponding update operations such that modifying their representations in the model will most likely create a model that follows the constraint. The repair algorithm, then, will update the weights in the model for facts in this minimal set.</p>
        <p>It is known that there are often many possible modifications of an inconsistent dataset to satisfy a set of constraints. It is challenging to maintain and query all these repairs of databases. Hence, researchers have proposed heuristics to choose a few of these repairs, e.g., the ones that differ the least from the original database. The same problem might also happen in repairing models. One may use similar approaches to reduce the number of repaired models.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Constraint-based Repair</title>
        <p>It may take a long time to update a large number of facts in a model [18]. Thus, the approach of fact-based repair may efficiently modify the model to satisfy constraints with relatively few instances, e.g., facts in the ontology, but it might be computationally challenging to do for constraints with many instances. Also, if a constraint has many instances, this approach might deliver many possible model repairs even after applying the aforementioned heuristics to reduce the space of possible repairs. Therefore, it will be challenging to query or train these models for a given task.</p>
        <p>LLMs generalize input data during pretraining. They have also been successfully used to generate data that closely resembles real-world data and to train accurate models using relatively few training examples for various tasks. Hence, we hypothesize that they might represent some constraints in the domain in whole or in part. If this hypothesis is true, an LLM does not satisfy some constraints because the LLM might represent them incompletely or erroneously.</p>
        <p>Hence, to ensure that the model satisfies a constraint, instead of repairing all facts that violate the constraint, one might directly change the portion of the model that represents the constraint. This portion might be significantly smaller than the parts that represent the violating facts. Thus, it might be substantially faster and easier to find the weights in the model responsible for an incomplete or erroneous representation of the constraint than doing the same for all facts that violate that constraint.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Related Works</title>
      <p>Lexical Constraints for Language Models. There has been recent effort on limiting the output of LLMs so they follow given syntactical patterns, e.g., do not contain certain keywords [5, 19, 20]. In these systems, users write (imperative) programs that detect some invalid patterns in the output of LLMs. These systems, then, use constrained optimization or probabilistic inference over the sequences generated by the LLM to reduce the probability of the outputs with invalid patterns. These efforts are steps in the right direction but fall short of providing a usable and scalable method to deliver consistent information over LLMs. First, they do not generally support semantic constraints. Second, users may have to write multiple and possibly long programs to clean up the output of the model. As some domains may have numerous constraints, it is challenging to develop and maintain these programs. Users must check manually whether these programs are consistent with each other and that there is no redundancy across different programs. Third, they are usually applied only during the decoding stage; therefore, the LLM may still learn and represent spurious relationships. As it is challenging to interpret learned representations in LLMs, it is difficult to control all the implications of their learned imprecise information. For instance, a learned spurious relationship about one entity might impact how an LLM answers a question about a different but related entity. As opposed to this line of work, we propose an end-to-end approach that uses declarative semantic constraints to reduce inconsistent information in LLMs.</p>
      <p>Self-Consistency of Language Models. It is known that language models produce contradictory answers to questions that seek the same information but are phrased differently. Researchers have proposed methods to address this issue by prompting the language model to critique and refine its own output during inference [21]. This method prompts the language model with differently phrased questions and builds a (weighted) model over the answers to infer the most likely result. We, however, mainly focus on ensuring that the language model follows semantic constraints.</p>
      <p>Extracting Knowledge from Language Models. Researchers have proposed methods to extract generic statements or factual knowledge from language models using prompt engineering and human supervision [22]. The prompts are constructed in a way that encourages succinct factual statements. They use human-labeled data to detect inaccurate outputs and fine-tune the language model. However, it might be challenging to collect a sufficient amount of training data to extract accurate statements.</p>
      <p>Querying Language Models. There has been some recent effort to design programming languages for prompting large language models, i.e., language model programming [23, 24, 25]. These are generally domain-specific programming languages to extract information from and control the output of a large language model to satisfy the users’ input hard constraints, akin to where conditions in SQL queries. Some of these languages resemble database query languages, e.g., SQL [24]. These languages aim at making it easier to query and prompt and to optimize the number of calls to large language models. However, these languages do not generate consistent results conditioned on domain constraints. Thus, they may return answers that violate semantic constraints in the domain.</p>
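The decode-time filtering these systems perform can be sketched minimally. The vocabulary and scores below are toy values, not any specific system’s API:

```python
import numpy as np

# Zero out the probability of banned tokens in a next-token distribution
# and renormalize, so sampling can never emit them. Real systems apply
# this to the LLM's logits at every decoding step.

VOCAB = ["the", "answer", "is", "banned_word", "42"]

def constrained_distribution(logits, banned):
    """Softmax the logits, then mask banned tokens and renormalize."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    for i, tok in enumerate(VOCAB):
        if tok in banned:
            probs[i] = 0.0
    return probs / probs.sum()

logits = np.array([1.0, 0.5, 0.2, 3.0, 0.1])
probs = constrained_distribution(logits, banned={"banned_word"})
print(probs[VOCAB.index("banned_word")])  # 0.0
```

As the surrounding discussion notes, this only shapes the output distribution at decoding time; the spurious information remains in the model’s weights.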
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref18">
        <mixed-citation>[18] ICLR, 2023.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] H. Zhang, M. Dang, N. Peng, G. Van den Broeck, ation, in: Proceedings of the 40th International Conference on Machine Learning (ICML), 2023. URL: https://arxiv.org/pdf/2304.07438.pdf.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] A. K. Lew, T. Zhi-Xuan, G. Grand, V. K. Mans, language models using probabilistic programs, 2023. arXiv:2306.03081.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, Iterative refinement with self-feedback, 2023. arXiv:2303.17651.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] C. Bhagavatula, J. D. Hwang, D. Downey, R. L., P. West, Y. Choi, I2d2: Inductive knowledge distillation with neurologic and self-imitation, 2023. arXiv:2212.09246.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] L. Beurer-Kellner, M. Fischer, M. Vechev, Prompt, Programming Languages 7 (2023) 1946-1969. URL: https://doi.org/10.1145%2F3591300. doi:10.1145/3591300.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Microsoft, S. Lundberg, Guidance: A guidance, https://github.com/microsoft/guidance, 2023.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] N. Computing, R. Louf, Outlines: Genera, normal-computing/outlines, 2023.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>