=Paper=
{{Paper
|id=Vol-3203/short8
|storemode=property
|title=Flix: A Meta Programming Language for Datalog
|pdfUrl=https://ceur-ws.org/Vol-3203/short8.pdf
|volume=Vol-3203
|authors=Magnus Madsen,Jonathan Starup,Ondřej Lhoták
|dblpUrl=https://dblp.org/rec/conf/datalog/MadsenSL22
}}
==Flix: A Meta Programming Language for Datalog==
Flix: A Meta Programming Language for Datalog Magnus Madsen1 , Jonathan Lindegaard Starup1 and Ondřej Lhoták2 1 Department of Computer Science, Aarhus University, Denmark 2 David R. Cheriton School of Computer Science, University of Waterloo, Canada Abstract We illustrate how Flix can be used as a powerful and expressive meta programming language for Datalog. 1. Introduction Flix is a functional-first, imperative, and logic programming language [1, 2, 3]. Flix has algebraic data types, pattern matching, higher-order functions, channel and process-based concurrency, parametric polymorphism, type classes, and higher-kinded types. The Flix type and effect system is based on Hindley-Milner and supports complete type inference. A unique feature of Flix is its support for Datalog programs as first-class values. Datalog values can be stored in local variables, passed as arguments to functions, returned from functions, stored into data structures, composed with other Datalog values, and have their minimal model queried. Thus Flix can be viewed as a powerful meta programming language for Datalog. Datalog expressions and their values are structurally typed. The type system ensures that predicate symbols, and their terms, have consistent types within the same Datalog program. Polymorphism enables programmers to write modular and reusable families of Datalog programs. The type system also ensures that every Datalog value constructed at runtime is stratified. Flix extends Datalog from constraints on relations to constraints on lattices. Lattice semantics makes it possible to declaratively express many applications, including program analyses and two-players games, which cannot readily be expressed in pure Datalog. Flix also supports stratified negation and a modularity mechanism which we call local predicates. During our work, we have discovered a new programming pattern, which we dub the inject- program-query pattern, where a function is implemented by turning its arguments into Datalog facts, solving a Datalog program, and returning a subset of the minimal model as the result. Flix embraces the essence of Datalog: Datalog program values are declarative, they look like ordinary Datalog clauses, they have a minimal model, and they are solvable by standard techniques such as semi-naïve evaluation. We believe that embedding Datalog inside a general-purpose programming language, such as Flix, enables programmers to use Datalog for the specific programming tasks where it really shines: to declaratively express and solve fixed-point computations on relations and lattices with the advantages of a static type system, a large standard library, and excellent IDE support. Datalog 2.0 2022: 4th International Workshop on the Resurgence of Datalog in Academia and Industry, September 05, 2022, Genova - Nervi, Italy $ magnusm@cs.au.dk (M. Madsen); jls@cs.au.dk (J. L. Starup); olhotak@uwaterloo.ca (O. Lhoták) © 2022 Copyright for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0). CEUR Workshop Proceedings http://ceur-ws.org ISSN 1613-0073 CEUR Workshop Proceedings (CEUR-WS.org) 202 Magnus Madsen et al. CEUR Workshop Proceedings 202–206 2. Programming with Datalog in Flix We illustrate, with a series of small examples, how Flix can be used as a typed meta programming language for Datalog. All examples are runnable Flix programs. Inject–Program–Query. Flix is primarily a functional programming language. Flix programs are structured around immutable data structures and functions that operate on them. But in Flix we can write a function whose internal implementation uses Datalog to compute its output from its inputs. For example, we can write a function that when given a graph computes its transitive closure: def clo (g: List [( Int32 , Int32 )]): List [( Int32 , Int32 )] = let db = inject g into Edge ; let pr = #{ Path (x , y) :- Edge (x , y ). Path (x , z) :- Path (x , y), Edge (y , z ). }; query db , pr select (x , y) from Path (x , y) The clo function takes a graph, represented as a list of edges, computes its transitive closure, and returns the result as a list of edges. The function is implemented as follows: First, we convert the list of edges into a set of Datalog Edge facts and we store the result in the local variable db. In Datalog terminology, we are constructing the EDB from the input list. This is needed because we have to associate the predicate symbol Edge with the list of tuples in g and to build an index on them. Second, we define a Datalog value which computes the transitive closure of the Edge relation and we store the program in the local variable pr. In Flix, the special syntax #{...} delimits a Datalog value. Third, and finally, we use the query construct to compose db and pr into one Datalog program value, to compute its minimal model, and to extract all pairs (x, y) from the Path relation and return them as a list. Note that the minimal model is not computed until we explicitly use query. We name this programming-style the inject-program-query pattern. We could have written the clo function in a functional– or imperative–style, but neither would have been as short, elegant, clear, obviously correct, and efficient as the Datalog style. That’s the power of declarative logic programming! Polymorphism. In the previous example, the signature of the clo function was: def clo (g: List [( Int32 , Int32 )]): List [( Int32 , Int32 )] but nothing in the implementation of the function requires the graph to have integer vertices. In Flix, Datalog program values may be polymorphic in their term types, and hence we could have used the more general signature: def clo (g: List [(t , t )]): List [(t , t )] with Eq [t], Order [t] This signature works for any graph provided that the polymorphic type t conforms to the Eq and Order type classes which enable the Datalog solver to compare terms for equality and to build indexes on them. With the polymorphic signature, we can reuse the clo function for graphs with different types of vertices. For the remainder of the paper, for brevity, we will omit the Eq and Order constraints, but they are of course required when polymorphic types are used. The Flix standard library include instances of these type classes for all base types and for data structures such as lists, maps, and sets. 203 Magnus Madsen et al. CEUR Workshop Proceedings 202–206 Integration with Lexical Scope. In Flix, Datalog values are not islands; they can reference the lexical scope of the functional language. In other words, a Datalog value can capture variables from the lexical scope. For example, we can write a function that when given a graph, a source vertex, and a destination vertex, computes if there is path from the source to the destination: def reach (g: List [(t , t )] , s: t , d: t ): Bool = let db = inject g into Edge ; let pr = #{ Path (x , y) :- Edge (x , y ). Path (x , z) :- Path (x , y), Edge (y , z ). Yes () :- Path (s , d ). }; query db , pr select () from Yes () |> nonEmpty The reach function takes a graph g, a source vertex s, and a destination vertex d as input, and returns returns true if there is a path from s to d in the graph g. The last rule in the Datalog program states that the nullary predicate Yes holds if there is a Path(s, d) fact where s and d are not Datalog variables, but rather Flix program variables bound by the lexical scope, i.e. the function arguments. Thus, at runtime, the Datalog program has concrete values for s and d. Integration with Expressions. Datalog values may also refer to expressions: def clo (g: List [(t , l , t )] , p: l -> Bool ): List [(t , t )] = let db = inject g into Edge ; let pr = #{ Path (x , y) :- Edge (x , w , y), if p(w ). Path (x , z) :- Path (x , y), Edge (y , w , z), if p(w ). }; query db , pr select (x , y) from Path (x , y) This clo function takes a labelled graph as input, represented as a list of triples, and returns its transitive closure as a list of edges. The function also takes a predicate function p : 𝑙 → Bool which determines when an edge is active. We use the predicate in the body of the two rules, writing if p(w), illustrating that Datalog rules may refer to Boolean expressions in the functional language of Flix. In particular, p is not a logical predicate, but a reference to a Flix expression. Datalog Programs as First-Class Values. We can refactor the first example into multiple functions for better modularity and reuse: def paths (): #{ Edge (t , t), Path (t , t) | r } = #{ Path (x , y) :- Edge (x , y ). Path (x , z) :- Path (x , y), Edge (y , z ). } Here the paths function returns a Datalog value. The (row) polymorphic type captures that the value contains Edge and Path predicate symbols that are of arity two and whose term types are of type t. The row variable r allows us to combine this value with other Datalog values that may contain additional predicate symbols. We can rewrite the clo function to use the new paths function: def clo (g: List [(t , t )]): List [(t , t )] = let db = inject g into Edge ; query db , paths () select (x , y) from Path (x , y) 204 Magnus Madsen et al. CEUR Workshop Proceedings 202–206 Local Predicates. We often want to write Datalog program values where some predicate symbols are local to the computation. Flix supports such local predicates, which are given unique, fresh names at runtime, allowing us to hide them from the type of a Datalog value. We can think of such local predicates as similar to local variables. For example, we can write: def unconnected (): #{ Edge (t , t), Unconnected (t , t) | r } = #( Edge , Unconnected ) -> #{ Node (x) :- Edge (x , _ ). Node (y) :- Edge (_ , y ). Path (x , y) :- Edge (x , y ). Path (x , z) :- Path (x , y), Edge (y , z ). Unconnected (x , y) :- Node (x), Node (y), not Path (x , y ). } The unconnected function returns a Datalog value which computes the set of pairs of vertices that are not connected by any path of edges. The Edge facts are the input and the Unconnected facts are the output of the Datalog value. The predicates Edge and Path are local. The key construct is #(Edge, Unconnected) -> e which is reminiscent of a lambda abstraction in that it evaluates 𝑒 to a Datalog value 𝑣 and then gives fresh names to every predicate symbol other than Edge and Unconnected in 𝑣. This renaming, which is similar to alpha renaming, hides the Node and Path predicates symbols ensuring that they are completely local to the Datalog value, i.e. they are given unique names that do not overlap with any other names. From Relations to Lattices. Datalog clauses are constraints on relations, but Flix also supports constraints on lattices. Given a lattice 𝐿 = (𝐸, ⊥, ⊑, ⊔, ⊓) implemented as instances of the type classes LowerBound, PartialOrder, JoinLattice, and MeetLattice we can associate one or more predicate symbols with the lattice. For example, we can define a new type enum N(Int32) where the bottom element is N(Int32.maxValue()) and the partial order is ≥. In other words, the lattice is turned “upside down”. We can use this lattice to write a Datalog program that computes the single-source shortest-distance (SSSD) from an origin vertex in a graph: def sssd (g: List [(t , Int32 , t )] , o: t ): Map [t , N] = let db = inject g into Edge ; let pr = #{ Dist (o; N (0)). Dist (y; add (d1 , d2 )) :- Dist (x; d1 ), Edge (x , d2 , y ). }; query db , pr select (x , d) from Dist (x; d) |> toMap The sssd function returns a map from every vertex in the graph g to its shortest distance from the origin vertex o. The use of the semi-colon in the head atoms indicates that the Dist predicate symbol is given a lattice interpretation where the lattice is determined by its type. The add function (omitted for brevity) simply adds an integer and a N(x) lattice element. Flix functions used in head atoms must be strict and monotone according to the lattice ordering. Intuitively, the program can be understood as follows: Initially, the distance to every vertex in the graph is ⊥ (i.e. Int32.maxValue()). The fact Dist(o; N(0)) asserts that the distance to the origin is zero (i.e. ⊤). The rule asserts that if a vertex x is reachable with distance d1 and there is an edge from x to y with distance d2 then the distance to y is at least the sum of the two distances d1 and d2. But since the lattice is upside down, “at least” is “at most”. Thus the program computes the shortest distance to every vertex from the origin. 205 Magnus Madsen et al. CEUR Workshop Proceedings 202–206 Type System. Datalog expressions are structurally typed, i.e. predicate symbols, and the types of their terms, never have to be declared; they are simply used and their types are inferred. Every Datalog expression is typed with a row type that keeps track of the term types of every predicate symbol. Thus, in every Datalog program value, every predicate symbol is used with consistent arity and term types. Minimal Models. While Flix is Turing-complete, and hence evaluation of an expression may diverge, if a Datalog expression reduces to a Datalog value then that value has a minimal model. Moreover, a Datalog program enriched with lattice semantics has a minimal model if the lattice components indeed form a lattice and if all operations on the lattice are strict and monotone. Stratified Negation. Flix supports stratified negation. The Flix type and effect system ensures — at compile-time — that any Datalog value constructed at runtime is stratified. If not, the Flix compiler rejects the program with a compilation error and reports the negative cycle in the dependency graph. Datalog Engine. Flix programs compile to Java bytecode and run on the JVM. The Flix Datalog engine is a Just-In-Time (JIT) compiler written in Flix itself. At runtime, a Datalog program value is compiled to the Relational Algebra Machine (RAM), an intermediate representation (IR) comparable to C with relational operators, which is then optimized and executed. 3. Ecosystem & Tooling Flix has a website that describes the language (flix.dev), online documentation (doc.flix.dev), API documentation à la Javadoc (api.flix.dev), an online playground (play.flix.dev), and a fully- featured Visual Studio Code extension. Flix comes with an extensive standard library that includes common functional data structures such as Option, Result, List, Set, and Map. The library spans more than 30,000 lines of code and offers more than 2,600 functions. Flix also has a Java FFI making it possible to reuse large parts of the Java Class Library. Flix is ready for use, open source, and freely available at: https://flix.dev/ 4. Conclusion Flix is a functional, imperative, and logic programming language with support for Datalog programs as first-class values. We have illustrated how Flix can be used as a powerful, expressive, and typed meta programming language for Datalog. We hope that Flix will help bring Datalog programming to a broader audience. References [1] M. Madsen, O. Lhoták, Fixpoints for the Masses: Programming with first-class Datalog Constraints, Proc. of the PACMPL (2020). [2] M. Madsen, J. van de Pol, Polymorphic Types and Effects with Boolean Unification, Proc. of the PACMPL (2020). [3] M. Madsen, M. Yee, O. Lhoták, From Datalog to Flix: A Declarative Language for Fixed Points on Lattices, in: Proc. of Programming Language Design and Implementation (PLDI), 2016. 206