<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Towards Consistent Language Models Using Declarative Constraints</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Jasmin Mousavi</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Arash Termehchy</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Oregon State University</institution>
          ,
          <addr-line>1500 SW Jefferson Ave, Corvallis, OR 97331</addr-line>
          ,
          <country country="US">United States</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>Large language models have shown unprecedented abilities in generating linguistically coherent and syntactically correct natural language output. However, they often return incorrect and inconsistent answers to input questions. Due to the complexity and uninterpretability of the internal learned representations, it is challenging to modify language models such that they provide correct and consistent results. The data management community has developed various methods and tools for repairing inconsistent datasets. In these methods, users specify the desired properties of data in a domain in the form of high-level declarative constraints. This approach has provided usable and scalable methods for delivering consistent information from inconsistent datasets. We propose to build upon this success and leverage these methods to modify language models such that they deliver consistent and accurate results. We investigate the challenges of using these ideas to obtain consistent and accurate language models.</p>
      </abstract>
      <kwd-group>
        <kwd>large language models</kwd>
        <kwd>declarative constraints</kwd>
        <kwd>consistent modeling</kwd>
        <kwd>model repair</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>Large language models (LLMs) have shown unprecedented abilities in processing natural languages [1, 2]. They effectively generalize to perform various tasks with few or no training examples. Thus, there is a rapidly growing interest in using them to solve data-driven problems, such as interactive question answering.</p>
      <p>Nonetheless, LLMs often provide incorrect answers to input queries and perform inaccurate inferences [3, 2]. Some studies indicate that recent LLMs provide up to 40% erroneous answers to factual questions [2]. These erroneous results are an important obstacle to the widespread use of LLMs in real-world applications.</p>
      <p>To address the problem of inaccurate answers returned by LLMs, we should recognize that LLMs are not knowledge bases, but rather probabilistic or approximate models of factual information. LLMs may over-generalize patterns and relationships observed in the sub-sequences of pretraining documents, which might lead to returning spurious relationships and inaccurate results. The uninterpretable mixture of linguistic patterns and factual information has made it challenging to eliminate incorrect information. This is in sharp contrast to traditional approaches to database querying, in which the user interface, e.g., the query language, is clearly separated from the source of the information, e.g., databases.</p>
      <p>One approach is to fine-tune the LLM on a set of domain-specific data sources to improve the quality of its answers for questions in a given domain [4]. Nonetheless, it has been shown that these methods may also lead to many inaccurate answers [5]. This is, in part, due to the fact that fine-tuning is inherently under-specified and may not sufficiently modify the model to eliminate its already learned spurious information. Another approach is to augment LLMs with additional and potentially relevant information from external data sources [6, 7, 8]. These methods often add extra information to the context considered during pretraining. This line of research has improved the accuracy of LLMs to a limited degree, as it does not address the core issue of having spurious and incorrect information in LLMs. It is not clear whether adding more relevant information eliminates inaccurate information stored in the model. Moreover, it is often challenging to find sufficiently many relevant data sources, particularly for long-tail entities.</p>
      <p>It is challenging to ensure that an LLM learns accurate generalizations and returns correct answers, as it may require perfect knowledge of unobserved data. Nonetheless, we may be able to restrict its pretrained representation to adhere to semantic constraints in the domain to avoid generating incorrect results. This is akin to the problem of cleaning databases to satisfy a set of declarative semantic constraints [9]. Databases often contain data that does not comply with the semantic constraints in their domains. For example, a person might not have any social security number or might have more than one in a human resources database. The usual query processing methods might return inaccurate results over incomplete or inconsistent databases. The data management community has developed a unified, usable, and scalable approach to repairing inconsistent data to comply with declarative semantic constraints [9].</p>
      <p>Joint Workshops at 49th International Conference on Very Large Data Bases (VLDBW’23), Workshop on LLMs and Databases (LLMDB’23), August 28 - September 1, 2023, Vancouver, Canada. mousavij@oregonstate.edu (J. Mousavi); termehca@oregonstate.edu (A. Termehchy); https://web.engr.oregonstate.edu/~termehca/ (A. Termehchy). © 2023 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings (CEUR-WS.org), ISSN 1613-0073.</p>
      <p>Instead of writing long and complex imperative programs to check inconsistencies and repair the data, users specify the properties of the consistent dataset succinctly in a high-level declarative language. There are several types of constraints based on the model of the data, e.g., functional dependencies for relational data or description logic rules for RDF data. They are usually subsets of first-order logic that are sufficiently expressive to capture important knowledge in the domain yet not too expressive to make reasoning intractable. Hence, data systems may check incompatibilities or redundancies in constraints efficiently. These constraints may also be learned from high-quality datasets in the domain.</p>
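As a concrete illustration of such declarative constraints, consider a functional dependency over a small relational table. The checker below is a minimal sketch; the table, attribute names, and helper function are hypothetical and not from the paper:

```python
# A functional dependency (FD) X -> Y declares: rows that agree on the
# attributes X must also agree on the attributes Y. Detecting violations
# of such a declarative constraint takes only a few lines.

def fd_violations(rows, lhs, rhs):
    """Return pairs of rows that violate the FD lhs -> rhs."""
    seen = {}  # maps lhs-values to the first (rhs-values, row) observed
    violations = []
    for row in rows:
        key = tuple(row[a] for a in lhs)
        val = tuple(row[a] for a in rhs)
        if key in seen and seen[key][0] != val:
            violations.append((seen[key][1], row))
        else:
            seen.setdefault(key, (val, row))
    return violations

# An HR table where one person has two social security numbers,
# violating the FD name -> ssn:
employees = [
    {"name": "Alice", "ssn": "111-11-1111"},
    {"name": "Alice", "ssn": "222-22-2222"},
    {"name": "Bob",   "ssn": "333-33-3333"},
]
print(fd_violations(employees, ["name"], ["ssn"]))  # one violating pair
```

A repair system would then update one of the violating rows so the dataset satisfies the declared FD.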
      <p>In this paper, we propose a novel approach to reduce
inconsistencies in LLMs using high-level declarative
constraints. We believe that the success of using declarative
constraints to provide reliable information in data
management indicates that our proposed approach has the
potential to deliver a usable and scalable method for
creating and maintaining reliable and consistent LLMs. This,
in turn, enables users to leverage LLMs in real-world
applications with high confidence and accuracy.</p>
      <p>We also discuss the challenges of using high-level declarative constraints to reduce inconsistencies in LLMs. Specifically, it is not clear how to enforce declarative constraints in an LLM efficiently. It might be challenging to find a correspondence between the symbolic declarative constraints and the information in the continuous representation learned by LLMs. We investigate how to leverage existing ideas in data cleaning and management [9] and current methods for embedding structured information [10, 11, 12] to address this problem. Since pretraining and fine-tuning are often time-consuming and computationally expensive, we also investigate methods of updating a pretrained LLM that ensure it follows a set of constraints.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Creating Consistent Models Using Pretraining &amp; Fine-tuning</title>
      <p>Since LLMs are created using pretraining, it is natural to consider methods that incorporate semantic constraints during pretraining to create consistent LLMs. Nonetheless, pretraining usually takes a long time and substantial computational resources. Researchers often use a relatively fast process called fine-tuning to modify a pretrained LLM [8]. During fine-tuning, the LLM is trained with additional information using its pretrained weights as initial values. In this section, we explore methods for creating or modifying an LLM so that it complies with a set of constraints using pretraining and fine-tuning.</p>
      <p>The semantic properties and constraints in a domain are often represented in the form of ontologies [13]. In a nutshell, an ontology consists of a set of facts, where each fact is a triple in the form of (subject, relationship, object), and a set of constraints on these facts. The triples in an ontology introduce concepts, e.g., Person, and their instances, e.g., Obama. They also represent relationships between different concepts in the domain, e.g., President is-a Person. Constraints in an ontology lay out the conditions that concepts and relationships must follow, e.g., is-a has the transitive property. Constraints are usually expressed in a subset of first-order logic, e.g., description logic. Generally speaking, each constraint establishes that if some concepts satisfy certain conditions, i.e., the premise, they must satisfy other conditions, i.e., the conclusion. For instance, for the is-a relation, we have: for all concepts X, Y, and Z, if (X, is-a, Y) and (Y, is-a, Z), then (X, is-a, Z).</p>
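The transitivity constraint can be made concrete with a small sketch. The helper and the tiny ontology below are illustrative assumptions, not part of the paper:

```python
# Enforce the constraint "for all X, Y, Z: (X, is-a, Y) and (Y, is-a, Z)
# implies (X, is-a, Z)" by saturating the is-a triples of a toy ontology
# under transitivity (a naive fixed-point computation).

def transitive_closure(triples):
    """Return all is-a triples entailed by transitivity."""
    edges = {(s, o) for (s, r, o) in triples if r == "is-a"}
    changed = True
    while changed:
        changed = False
        for (a, b) in list(edges):
            for (c, d) in list(edges):
                if b == c and (a, d) not in edges:
                    edges.add((a, d))
                    changed = True
    return {(s, "is-a", o) for (s, o) in edges}

ontology = [
    ("President", "is-a", "Person"),
    ("Obama", "is-a", "President"),
]
closure = transitive_closure(ontology)
# The constraint entails the fact (Obama, is-a, Person) even though it
# never appears explicitly in the ontology.
print(("Obama", "is-a", "Person") in closure)  # True
```

A model that stores the two explicit triples but not the entailed one would violate this constraint.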
      <p>It is important for an LLM to encapsulate both the facts and the constraints on those facts in a domain to provide consistent results. An LLM might not learn the facts from the textual data over which it is pretrained. This could be because some facts are not in the text or do not appear in closely related text spans and contexts. Constraints in an ontology represent the semantic meaning of concepts and relationships in the domain. This information does not often appear explicitly in the data used to pretrain LLMs; therefore, LLMs might not learn it during pretraining.</p>
      <p>Thus, our goal is to create LLMs that contain and follow both facts and constraints in a given ontology. To simplify our exposition, and because each fact can also be represented as a special type of constraint, unless otherwise noted, we refer to both facts and constraints in an ontology as constraints.</p>
      <sec id="sec-3-1">
        <title>2.2. Mixing Constraints with Training Data</title>
        <p>Incorporating this structured information into LLMs poses challenges since LLMs are trained on unstructured data. One may supplement the training data with textual ontology information, e.g., Obama is a President. However, translating facts and constraints into text introduces two problems. First, in domains containing numerous semantic constraints, the augmented training data may exceed the maximum sequence length (commonly restricted to 512 tokens in most models). Second, converting structured data into unstructured text may cause the model to view this information merely as additional context, without preserving the higher-order constraints vital for comprehending the semantics of concepts in the domain.</p>
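A toy sketch of the first problem: naively verbalizing triples and prepending them to the input quickly exceeds a 512-token budget. Whitespace "tokens" stand in for a real subword tokenizer here, and all names are illustrative:

```python
# Verbalize each triple as a sentence and check whether the augmented
# input still fits a fixed context budget.

MAX_TOKENS = 512

def verbalize(triples):
    """Turn (subject, relation, object) triples into plain sentences."""
    return " ".join(f"{s} {r.replace('-', ' ')} {o}." for (s, r, o) in triples)

def fits_in_context(ontology_text, question):
    """True if ontology text plus question stays within the budget."""
    n_tokens = len((ontology_text + " " + question).split())
    return MAX_TOKENS - n_tokens >= 0

# 300 hypothetical triples at roughly 4 tokens each already blow the budget.
triples = [(f"Entity{i}", "is-a", "Thing") for i in range(300)]
text = verbalize(triples)
print(fits_in_context(text, "Is Entity7 a Thing?"))  # False
```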
        <p>To overcome these issues, constraint reduction techniques can be applied. One method involves reasoning over the constraints to find a minimal set [14], but this does not guarantee that the augmented input will conform to the maximum sequence length. Another approach is to encode the ontology information into an embedded representation using an LSTM [6], integrated via a gating function. This allows the LLM to control what information augments the input, successfully limiting the sequence length. However, it may not be optimal for incorporating constraints, as it may cause information loss and is more apt for enhancing the input with extra facts rather than filtering incorrect information.</p>
        <p>These methods fall short of incorporating the ontology in a way that preserves its semantic information, highlighting the difficulty of integrating higher-order constraints into LLMs.</p>
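The gating idea can be sketched as follows. The dimensions, random parameters, and sigmoid gate are illustrative assumptions, not the cited architecture:

```python
import numpy as np

# Fuse an ontology embedding into a text representation via a gate: the
# gate decides, per dimension, how much ontology information augments the
# input. In a real system the gate parameters would be learned.

rng = np.random.default_rng(0)
d = 8
W_g = rng.normal(size=(d, 2 * d))  # hypothetical gate parameters
b_g = np.zeros(d)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(h_text, h_onto):
    """Blend text and ontology embeddings with an elementwise gate."""
    g = sigmoid(W_g @ np.concatenate([h_text, h_onto]) + b_g)
    return g * h_text + (1.0 - g) * h_onto

h_text = rng.normal(size=d)  # contextual embedding of the input text
h_onto = rng.normal(size=d)  # LSTM-style encoding of relevant triples
fused = gated_fusion(h_text, h_onto)
print(fused.shape)  # (8,)
```

Because the gate lies in (0, 1), each fused dimension is a convex combination of the text and ontology values, which is what lets the model down-weight ontology input rather than being forced to consume it.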
      </sec>
      <sec id="sec-3-3">
        <title>2.3. Retaining Constraint Information</title>
        <p>To ensure that a database complies with a constraint, we often find the information in the database that does not follow the constraint and update it so the database satisfies the constraint. One may adopt this approach to repair a pretrained model so it satisfies a set of given constraints. In other words, one may find the portion of the model responsible for representing a constraint, or the lack thereof, and update it if necessary so that the resulting model satisfies the constraint. As opposed to information in a database, factual information is stored in an LLM implicitly, through some pretrained weights in the model. Hence, it is difficult to find and revise the factual data that violates a set of given constraints in an LLM. In this section, we describe two approaches to repairing pretrained models and discuss their challenges.</p>
        <p>Constraint Embedding. Ideally, the representation learned by an LLM should capture the structural information present in the semantic constraints of an ontology. Geometric embeddings (e.g., box, circle, cone) have been widely explored for learning representations of graph structures such as ontologies and knowledge bases [15, 11, 16, 10, 12]. For instance, if an ontology has the constraint that President is-a Person, the geometric embedding for Person should contain the geometric embedding for President, reflecting the transitivity of is-a and the fact that President is a subset of Person. These embeddings preserve the structural properties and relationships in an embedded space, ensuring that the output representations maintain the specified constraints.</p>
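A toy version of box containment illustrates the idea. The boxes below are hand-set for clarity rather than learned, and all values are hypothetical:

```python
import numpy as np

# Each concept is an axis-aligned box (min corner, max corner).
# "President is-a Person" is respected when the President box lies
# entirely inside the Person box.

def contains(outer, inner):
    """True if box `inner` lies inside box `outer`."""
    (o_min, o_max), (i_min, i_max) = outer, inner
    return bool(np.all(i_min >= o_min) and np.all(o_max >= i_max))

person    = (np.array([0.0, 0.0]),  np.array([10.0, 10.0]))
president = (np.array([2.0, 2.0]),  np.array([4.0, 4.0]))
city      = (np.array([20.0, 0.0]), np.array([30.0, 5.0]))

print(contains(person, president))  # True: consistent with President is-a Person
print(contains(person, city))       # False: City is not a Person
```

Note that containment is transitive, so a learned box geometry automatically respects the transitivity constraint of is-a.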
        <p>When training an LLM, one can incorporate geometric or constraint embeddings for unstructured text data in order to retain information from ontologies. If the ontology data is consistent and the model learns a perfect constraint embedding, it should respect the facts and constraints within the domain. However, since this is unlikely, it may be necessary to apply optimization techniques to the objective function. Such techniques can help facilitate LLMs to learn representations that effectively capture higher-order relationships and constraints that extend beyond the training domain.</p>
        <p>Constraint Objective Task. Since the ontology is a source of knowledge, it can also be used to train the LLM directly. External knowledge can be created from the ontology by extracting triples in the form of rich text spans, thereby providing more information about constraints to the model. Using this data, one may construct a word prediction or masking objective that aligns with the external knowledge of semantic constraints. One strategy is type modeling [17], where entities are replaced with their type, and the model predicts the entity type for the next word or word span. This idea can be extended to a masking objective, where the model predicts masked types in the output.</p>
        <p>Alongside traditional LLM objectives, e.g., masked objective tasks, one can integrate constraint objective tasks and constraint embeddings during pretraining. These methods capture the ontology’s structural information, resulting in a model that is consistent with domain-specific constraints. Given an ontology and text documents, constraint objective tasks and constraint embeddings can also be used for fine-tuning. However, these techniques may prove more effective if implemented during the pretraining process.</p>
      </sec>
    </sec>
    <sec id="sec-4">
      <title>3. Model Repair</title>
      <sec id="sec-4-1">
        <title>3.1. Fact-based Repair</title>
        <p>There has been some recent success in updating facts represented in an LLM [18]. Each update aims at changing the object in a given triple in the form of (subject (s), relation (r), object (o)) to a new object o′. These methods first find the weights responsible for representing s and its relationship to o in the model. They then modify these weights so that the model represents the new object o′ in the fact with high probability.</p>
        <p>Building upon this line of work, one may ensure that an LLM satisfies a set of constraints by finding and modifying the pretrained weights that represent the facts that violate the constraints. An algorithm to check whether an LLM satisfies a given constraint could be as follows. First, the algorithm samples a set of facts that follow the constraint from the ontology. For each instance of the constraint, it prompts/queries the LLM to check whether and how the LLM represents the facts in the instance. If the LLM’s representations of the facts in the instance violate the constraint, the algorithm modifies the representations so they follow the constraint. The larger the set of samples is, the more likely the repaired model satisfies the constraint. Users can change the size of the sample based on their available time and resources as well as their desired confidence for satisfying constraints.</p>
        <p>This algorithm might require a large number of updates to the model, which could be time-consuming. Moreover, since facts are represented implicitly in the model, the aforementioned methods might not always find the updates that modify a fact to its desired form. To address these challenges, one may find a minimal set of facts and their corresponding update operations such that modifying their representations in the model will most likely create a model that follows the constraint. The repair algorithm, then, will update the weights in the model for facts in this minimal set.</p>
        <p>It is known that there are often many possible modifications of an inconsistent dataset to satisfy a set of constraints. It is challenging to maintain and query all these repairs of databases. Hence, researchers have proposed heuristics to choose a few of these repairs, e.g., the ones that differ the least from the original database. The same problem might also happen in repairing models. One may use similar approaches to reduce the number of repaired models.</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2. Constraint-based Repair</title>
        <p>It may take a long time to update a large number of facts in a model [18]. Thus, the approach of fact-based repair may efficiently modify the model to satisfy constraints with relatively few instances, e.g., facts in the ontology, but it might be computationally challenging to do for constraints with many instances. Also, if a constraint has many instances, this approach might deliver many possible model repairs even after applying the aforementioned heuristics to reduce the space of possible repairs. Therefore, it will be challenging to query or train these models for a given task.</p>
        <p>LLMs generalize input data during pretraining. They have also been successfully used to generate data that closely resembles real-world data and to train accurate models using relatively few training examples for various tasks. Hence, we hypothesize that they might represent some constraints in the domain in whole or in part. If this hypothesis is true, an LLM does not satisfy some constraints because the LLM might represent them incompletely or erroneously.</p>
        <p>Hence, to ensure that the model satisfies a constraint, instead of repairing all facts that violate the constraint, one might directly change the portion of the model that represents the constraint. This portion might be significantly smaller than the parts that represent the violating facts. Thus, it might be substantially faster and easier to find the weights in the model responsible for an incomplete or erroneous representation of the constraint than doing the same for all facts that violate that constraint.</p>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4. Related Works</title>
      <p>Lexical Constraints for Language Models. There has been recent effort on limiting the output of LLMs so they follow given syntactical patterns, e.g., do not contain certain keywords [5, 19, 20]. In these systems, users write (imperative) programs that detect some invalid patterns in the output of LLMs. These systems, then, use constrained optimization or probabilistic inference over the sequences generated by the LLM to reduce the probability of the outputs with invalid patterns. These efforts are steps in the right direction but fall short of providing a usable and scalable method to deliver consistent information over LLMs. First, they do not generally support semantic constraints. Second, users may have to write multiple and possibly long programs to clean up the output of the model. As some domains may have numerous constraints, it is challenging to develop and maintain these programs. Users must check manually whether these programs are consistent with each other and that there is no redundancy across different programs. Third, they are usually applied only during the decoding stage; therefore, the LLM may still learn and represent spurious relationships. As it is challenging to interpret learned representations in LLMs, it is difficult to control all the implications of their learned imprecise information. For instance, a learned spurious relationship about one entity might impact how an LLM answers a question about a different but related entity. As opposed to this line of work, we propose an end-to-end approach that uses declarative semantic constraints to reduce inconsistent information in LLMs.</p>
      <p>Self-Consistency of Language Models. It is known that language models produce contradictory answers to questions that seek the same information but are phrased differently. Researchers have proposed methods to address this issue by prompting the language model to critique and refine its own output during inference [21]. This method prompts the language model with differently phrased questions and builds a (weighted) model over the answers to infer the most likely result. We, however, mainly focus on ensuring that the language model follows semantic constraints.</p>
      <p>Extracting Knowledge from Language Models. Researchers have proposed methods to extract generic statements or factual knowledge from language models using prompt engineering and human supervision [22]. The prompts are constructed in a way that encourages succinct factual statements. They use human-labeled data to detect inaccurate outputs and fine-tune the language model. However, it might be challenging to collect a sufficient amount of training data to extract accurate statements.</p>
      <p>Querying Language Models. There has been some recent effort to design programming languages for prompting large language models, i.e., language model programming [23, 24, 25]. These are generally domain-specific programming languages to extract information from and control the output of a large language model to satisfy the users’ input hard constraints, akin to where conditions in SQL queries. Some of these languages resemble database query languages, e.g., SQL [24]. These languages aim at making it easier to query and prompt and to optimize the number of calls to large language models. However, these languages do not generate consistent results conditioned on domain constraints. Thus, they may return answers that violate semantic constraints in the domain.</p>
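The decode-time filtering these systems perform can be sketched minimally. The vocabulary and scores below are toy values, not any specific system’s API:

```python
import numpy as np

# Zero out the probability of banned tokens in a next-token distribution
# and renormalize, so sampling can never emit them. Real systems apply
# this to the LLM's logits at every decoding step.

VOCAB = ["the", "answer", "is", "banned_word", "42"]

def constrained_distribution(logits, banned):
    """Softmax the logits, then mask banned tokens and renormalize."""
    probs = np.exp(logits - logits.max())
    probs /= probs.sum()
    for i, tok in enumerate(VOCAB):
        if tok in banned:
            probs[i] = 0.0
    return probs / probs.sum()

logits = np.array([1.0, 0.5, 0.2, 3.0, 0.1])
probs = constrained_distribution(logits, banned={"banned_word"})
print(probs[VOCAB.index("banned_word")])  # 0.0
```

As the surrounding discussion notes, this only shapes the output distribution at decoding time; the spurious information remains in the model’s weights.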
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref18">
        <mixed-citation>[18] ICLR, 2023.</mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>[19] H. Zhang, M. Dang, N. Peng, G. Van den Broeck, ation, in: Proceedings of the 40th International Conference on Machine Learning (ICML), 2023. URL: https://arxiv.org/pdf/2304.07438.pdf.</mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>[20] A. K. Lew, T. Zhi-Xuan, G. Grand, V. K. Mans, language models using probabilistic programs, 2023. arXiv:2306.03081.</mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>[21] A. Madaan, N. Tandon, P. Gupta, S. Hallinan, Iterative refinement with self-feedback, 2023. arXiv:2303.17651.</mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>[22] C. Bhagavatula, J. D. Hwang, D. Downey, R. L., P. West, Y. Choi, I2d2: Inductive knowledge distillation with neurologic and self-imitation, 2023. arXiv:2212.09246.</mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>[23] L. Beurer-Kellner, M. Fischer, M. Vechev, Prompt, Programming Languages 7 (2023) 1946-1969. URL: https://doi.org/10.1145%2F3591300. doi:10.1145/3591300.</mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>[24] Microsoft, S. Lundberg, Guidance: A guidance, https://github.com/microsoft/guidance, 2023.</mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>[25] N. Computing, R. Louf, Outlines: Genera, normal-computing/outlines, 2023.</mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>