<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>From Logic Programming to Programming in Logica: A First-Course in Declarative Data Science &amp; Engineering</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Evgeny Skvortsov</string-name>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Yilin Xia</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Shawn Bowers</string-name>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Bertram Ludäscher</string-name>
          <xref ref-type="aff" rid="aff2">2</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Gonzaga University, Department of Computer Science</institution>
          ,
          <addr-line>Spokane, WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Google LLC</institution>
          ,
          <addr-line>WA</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>University of Illinois Urbana-Champaign, School of Information Sciences</institution>
          ,
          <addr-line>IL</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>While imperative programming is prevalent in software engineering and education, the declarative nature of logic programming can play a vital role in helping students further develop problem-solving and conceptual-modeling skills. Logica, an open-source logic programming language, extends Datalog by incorporating support for numerical computations, including aggregation. It serves as a comprehensive execution environment, compiling programs into iterative SQL queries that can be executed locally via DuckDB or SQLite, or in the cloud through PostgreSQL and Google BigQuery. Logica can also be accessed from within Python and Jupyter Notebooks, allowing it to be used seamlessly within typical data-science tool chains. These intuitive features make Logica an accessible and practical tool for students to learn logic programming. In this paper, we propose a new course, complete with lecture materials centered around Logica, aimed at teaching students declarative data science and engineering while covering topics such as knowledge representation and reasoning, database queries, and constraint-based programming.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1. Introduction</title>
      <p>
        Relational databases and data warehouses are at the center of modern software engineering and data
science. These systems are omnipresent, from SQLite databases running on devices, to single remote
server systems like PostgreSQL, to serverless data warehouses like Google BigQuery for distributed
and parallel query evaluation over thousands of servers. However, it is often not immediately obvious
when learning how to access these systems via SQL, SQL-like languages, or programmatic libraries
(such as LINQ [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] and Pandas [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]), that there exists a deep connection and dependence on logic and
logic programming. In Codd’s seminal paper on the relational model [3], he wrote:
      </p>
      <disp-quote>
        <p>The adoption of a relational model of data, as described above, permits the development of a
universal data sublanguage based on an applied predicate calculus. [...] Such a language
would provide a yardstick of linguistic power for all other proposed data languages, and
would itself be a strong candidate [...]</p>
      </disp-quote>
      <p>
Today, SQL is the de facto language for accessing the vast majority of relational data, either
explicitly through hand-written SQL or indirectly through object-relational models and programmatic
library calls. However, the need to develop and deploy more complex data analysis tasks has grown
considerably over the last several years with the popularity and wide-scale adoption of machine learning
and data-science approaches. Many of these analytics problems are difficult to express in SQL, e.g.,
graph analytics, large-scale statistical analyses, and the development and tuning of
custom machine-learning models. SQL is not ideal in these cases because they often require
complex recursive (iterative) solutions involving numerical computations, including aggregation. Logic
programming, which has many similarities to SQL’s underlying predicate calculus model, can
provide a more natural syntax and framework for implementing these complex data analysis tasks.
      </p>
      <p>Logica [4] is an open-source logic programming language that compiles to (iterated) SQL and allows
its users to exploit the power of modern relational databases from the comfort of a full-featured logic
programming syntax. Logica can be viewed as a (bottom-up) Datalog-like language, but with support
for numerical computation, including aggregation through recursion. In our experience as educators,
students with at least some training in logic programming (via Prolog, Datalog, or Answer-Set
Programming) are able to learn Logica quickly and gain a significant productivity boost from using
the logic-based programming model in Logica when compared to SQL. We have also observed the
opposite: experienced engineers without training in logic programming have a much more difficult
time switching from SQL syntax to the syntax of Logica (and similar languages). To help
address this discrepancy, we propose a set of eleven modules [5], including examples and exercises,
that provide basic training in logic programming through Logica. Taken together, the modules form a
fully developed course that can be taught at the graduate or advanced undergraduate level. Some or
all of the modules can also be used as components within other courses, e.g., within a block
on logic programming in a typical undergraduate programming language concepts course [6],
within a course on logic programming (e.g., where a portion of the course can focus on data science
applications), within a course on data science and machine learning, or within a traditional algorithms
course (where Logica can serve as the implementation language).</p>
      <p>While a significant amount of work has been done in the area of logic programming education, e.g.,
see [7, 8], we believe that the modules proposed here can provide (at least incremental) benefit to this
work. In particular, because Logica is integrated with a number of widely used database systems, can be
used within Python and Jupyter notebooks, and focuses on supporting complex data analytics tasks, it
provides a practical logic-based language for the day-to-day tasks of data engineers and data scientists.
This emphasis can further attract students who might not otherwise have considered logic
programming (and declarative languages more broadly) or seen its benefits.</p>
      <p>The rest of this paper is organized as follows. Section 2 gives an overview of each of the eleven
course modules. Section 3 describes the potential benefits of learning Logica and logic-programming
for data science and similar fields. We conclude with future work in Section 4.</p>
    </sec>
    <sec id="sec-2">
      <title>2. Course Overview and Organization</title>
      <p>The current iteration of the course is made up of eleven distinct modules, each of which can be divided
into one or more lectures (e.g., depending on the pace of the course and whether it is offered at a graduate or
undergraduate level). At the end of each module (except for the first and last), homework exercises
are given. Labs are also assumed to follow each module, e.g., allowing students to solve
problems within groups and to ask questions concerning the homework.</p>
      <p>Figure 1 shows the general themes and sequencing of the modules. As shown, the first four modules
form a basic introduction to Logica, the next two modules focus on Logica semantics and its relationship
to SQL, the next two modules provide extended examples as well as some of the software development
features (e.g., unit testing) supported by Logica, and the final three modules focus on advanced aspects
of programming in Logica. In addition, each module emphasizes either specific features of Logica
(colored in yellow), programming techniques in Logica (colored in green), or how Logica works “under
the hood” (colored in blue).</p>
      <p>The course described here is targeted at graduate students and advanced undergraduate students
studying computer science or a related major such as Information or Data Science. Students are expected
to have proficiency in at least one imperative programming language, have completed a college-level
discrete math course (that covers formal logic, set theory, etc.), and possess an understanding of basic
data science concepts. Ideally, students should have also taken an introductory course in database
management. However, it is possible for students without knowledge of SQL to complete the modules.
In such cases, a basic introduction to SQL should be provided before starting module 6, or module 6 can
be skipped.</p>
      <p>Module 1: Introduction. This module provides a general overview and introduction to logic
programming and Logica, highlighting that Logica runs on modern SQL engines, which enable it to easily scale
to large volumes of data. Simple examples of logic programs and running Logica are also given. In
particular, it is shown that Logica programs can be run as scripts or from within Jupyter notebooks.</p>
      <p>Module 2: Basics – Rules, Data, and Aggregation. This module begins by introducing the concepts
of facts and rules. This introduction includes learning about conjunction and disjunction as well as
atomic and composite datatypes (lists and records). This module also covers predicate-level aggregation.
The material covered allows students to write a large number of simple, but practical programs (e.g.,
expressed as queries) in Logica.</p>
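      <p>To make the notions of facts, rules, disjunction, and predicate-level aggregation concrete, the following is a minimal Python sketch that is not taken from the course materials. The relation names, the sample data, and the Datalog-style rules in the comments are illustrative assumptions and are not meant to reproduce Logica syntax. Plain sets are used for simplicity, so duplicates are ignored.</p>
      <preformat><![CDATA[
```python
from collections import defaultdict

# Facts: a small (hypothetical) Parent relation stored as a set of tuples.
parent = {("ann", "bob"), ("ann", "cid"), ("bob", "dee")}

# Rule with a conjunction, in Datalog-style notation:
#   Grandparent(g, c) :- Parent(g, p), Parent(p, c)
grandparent = {
    (g, c)
    for (g, p) in parent
    for (p2, c) in parent
    if p == p2
}

# Disjunction: two rules with the same head contribute to one predicate.
#   Related(x, y) :- Parent(x, y)
#   Related(x, y) :- Grandparent(x, y)
related = parent | grandparent

# Predicate-level aggregation: count the children of each parent.
num_children = defaultdict(int)
for p, _child in parent:
    num_children[p] += 1

print(sorted(grandparent))
print(sorted(related))
print(dict(sorted(num_children.items())))
```
]]></preformat>
      <p>Each rule reads as a statement about which tuples belong to a relation rather than as a sequence of steps, which is the shift in perspective that this module is meant to introduce.</p>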
      <p>Module 3: The Secret Sauce – Aggregating Expressions and Functions. This module covers the
more advanced Logica features of aggregating expressions and functions. Aggregating expressions
provide a mechanism for performing a large number of practical operations over composite data
types. Functions are “syntactic sugar” for making logic programming more ergonomic and easier to
understand for students with knowledge of conventional programming languages, where functions
play a central role. This module also introduces the notions of injectable and concrete predicates. Logica
uses a hybrid of top-down and bottom-up evaluation strategies. Predicates that correspond
to concrete tables are evaluated bottom-up and are called concrete predicates. Predicates that behave like
function evaluations are essentially computed top-down. Technically, this is done by injecting them, i.e.,
replacing each call with the predicate's definition, hence the name injectable predicates.</p>
      <p>Module 4: Imperatives and Database Connections. In natural language, linguists differentiate
between indicative and imperative moods [9]. Sentences in the indicative mood state how things are,
whereas sentences in the imperative mood instruct the listener to do something. We use this differentiation
in Logica: ordinary predicates that contain the data are in the indicative mood, while predicates in
the imperative mood tell the compiler directly to do something. Most importantly, this module details the
imperative predicate @Ground(P), which associates the predicate P with a physical table in the database.
This predicate provides a means for writing Logica program results back into the database, where they can
be accessed from other systems. The module also describes imperative predicates that allow artificially
limiting the number of rows in a table as well as syntax for sorting answers (rows) via Logica. We note
that the material in this module is simpler than in modules 2 and 3. While the module covers how
predicates are connected to the real world (through the database), it also gives students some “breathing room”
to internalize the logic programming concepts that they were thrown into in the previous two modules.</p>
      <p>Module 5: Formal Semantics. Because many logic programs consist of a simple (high-level) syntax
forming a set of relatively simple (if-then style) rules, we expect that students will be able to solve a
large number of basic problems in Logica without fully understanding its underlying formal semantics.
However, exposing students to the formal semantics of Logica is helpful because it allows them to confidently
reason about complex problems. In this module, we describe multi-sets and how predicate expressions
construct new multi-sets from existing ones. Note, however, that at this point in the course we are not
yet considering recursive programs (which introduce additional complexity to the underlying semantics
and are described in later modules).</p>
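      <p>The multi-set view described in Module 5 can be sketched in a few lines of Python. The sketch below assumes SQL-style bag semantics, in which a conjunction (join) multiplies row multiplicities and a disjunction (as in UNION ALL) adds them. The formal definitions used in the course materials may differ in detail, and the relation names and data are hypothetical.</p>
      <preformat><![CDATA[
```python
from collections import Counter

# A multi-set of rows maps each row to its multiplicity (hypothetical data).
r = Counter({("a", 1): 2, ("b", 2): 1})   # R(x, y)
s = Counter({(1, "u"): 3, (2, "v"): 1})   # S(y, z)


def conjunction(left, right):
    """Q(x, z) :- R(x, y), S(y, z): multiplicities of matching rows multiply."""
    out = Counter()
    for (x, y), m in left.items():
        for (y2, z), n in right.items():
            if y == y2:
                out[(x, z)] += m * n
    return out


def disjunction(left, right):
    """Two rules with the same head: multiplicities add, as in UNION ALL."""
    return left + right


joined = conjunction(r, s)
merged = disjunction(Counter({("a",): 1}), Counter({("a",): 2, ("b",): 1}))

print(dict(joined))
print(dict(merged))
```
]]></preformat>
      <p>Here the row ("a", "u") is derived six times because two copies in R each match three copies in S. Duplicates of this kind are what distinguish multi-set semantics from the set semantics of classical Datalog.</p>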
      <p>Module 6: Logica Compilation to SQL. As mentioned above, the course assumes that students
are familiar with database concepts and SQL. In this module, students are taught how non-recursive
rules are compiled to SQL by Logica. Compilation of a single non-aggregating conjunctive rule is
straightforward, resulting in a single (trivial) SQL statement. Logica programs involving disjunctive
rules and aggregations result in (slightly) more complicated SQL. The translation of Logica to SQL for
these classes of rules is simple enough to explain how it is performed by Logica within a single lecture.
For courses where the majority of students do not have experience in SQL, additional material can
be added (e.g., as a separate lecture prior to starting the module) and/or additional lab time can be
spent helping students to learn the basics of SQL. Alternatively, module 6 can be safely skipped without
impacting the rest of the course. Note that one purpose of this module is to connect the dots and show
students the strong connection between logic programming and relational databases, which is often
only done in graduate-level database theory courses.</p>
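      <p>As an illustration of the kind of correspondence discussed in Module 6, the sketch below pairs three rule shapes with hand-written SQL and runs them on SQLite through Python's standard sqlite3 module. The SQL is our own illustration and is not the output of the Logica compiler. The table parent_of, its columns, and the Datalog-style rules in the comments are assumptions made for the example.</p>
      <preformat><![CDATA[
```python
import sqlite3

# In-memory database with a hypothetical parent_of(parent, child) table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE parent_of (parent TEXT, child TEXT)")
conn.executemany(
    "INSERT INTO parent_of VALUES (?, ?)",
    [("ann", "bob"), ("ann", "cid"), ("bob", "dee")],
)

# Conjunctive rule  Grandparent(g, c) :- ParentOf(g, p), ParentOf(p, c)
# corresponds to a single self-join.
GRANDPARENT_SQL = """
    SELECT p1.parent AS grandparent, p2.child AS grandchild
    FROM parent_of AS p1
    JOIN parent_of AS p2 ON p1.child = p2.parent
"""

# Disjunctive rules  Person(x) :- ParentOf(x, _)  and  Person(x) :- ParentOf(_, x)
# correspond to a UNION ALL, which keeps duplicates.
PERSON_SQL = """
    SELECT parent AS name FROM parent_of
    UNION ALL
    SELECT child AS name FROM parent_of
"""

# An aggregating rule (count the children of each parent) corresponds to GROUP BY.
NUM_CHILDREN_SQL = """
    SELECT parent, COUNT(*) AS num_children
    FROM parent_of
    GROUP BY parent
    ORDER BY parent
"""

grandparents = conn.execute(GRANDPARENT_SQL).fetchall()
people = sorted(row[0] for row in conn.execute(PERSON_SQL))
num_children = conn.execute(NUM_CHILDREN_SQL).fetchall()

print(grandparents)
print(people)
print(num_children)
```
]]></preformat>
      <p>The three queries mirror the progression in the module: a conjunctive rule yields a join, disjunction yields a union of queries, and aggregation yields grouping.</p>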
      <p>Module 7: Functors and Unit Testing. While functors and unit testing are distinct topics, they are
both relatively small and the unit testing features in Logica rely on functors. Thus, it is convenient
to combine them into a single module. Functors in Logica are second-order functions that allow
for the reuse of large chunks of logic among tasks. A functor is similar to the notion of first-class
functions supported today in most modern programming languages in which functions can be passed as
parameter values to other functions (and then be called within the receiving function). While first-class
functions support a greater degree of abstraction and code reuse, they can make debugging and design
more complicated. Logica takes a middle ground: in a functor call, predicates can be used as values;
however, limitations imposed on functor calls prohibit their use with recursion. Thus, while
functors provide less expressive power than typical first-class functions, they are still able to cover many
practical situations for logic reuse. Like design and reuse, unit testing is also an important software
engineering practice that is critical for establishing basic correctness and maintaining (e.g., through
regression testing) large reliable code bases. Logica supports unit testing natively using additional
syntax. This module first describes the notion of functors, their support in Logica, and then unit testing
with examples.</p>
      <p>Module 8: Recursive Programming. This module begins by describing the uses and benefits
of recursion in logic programming, together with examples. It then explains how to write recursive programs
in Logica and takes a hands-on approach by walking students through a number of more involved
problems and example solutions that leverage recursive programs. Discussion of how Logica performs
evaluation is provided in the next module.</p>
      <p>Module 9: Recursion in Depth. This module describes how Logica evaluates recursive predicates.
Recursive predicates in Logica are required to be concrete, and thus they are always evaluated
bottom-up. The bottom-up evaluation is carried out in Logica through a simple iteration of rule application.
The explanation of recursive evaluation in Logica is expected to take approximately two thirds of
the lecture(s) devoted to the module. This includes going over additional examples of recursive logic
programs. While less commonly used for basic data analytics tasks, this deep dive into logical recursion
is useful because it builds problem-solving skills and helps students become more efficient in solving
easier query-like problems. The techniques in Modules 8 and 9 are also beneficial for encoding more
advanced analytical algorithms and solutions that are becoming increasingly common in
modern-day data science and engineering.</p>
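      <p>The idea of bottom-up evaluation by repeated rule application can be sketched in plain Python. The example below is a naive evaluation of a transitive-closure program that stops when an iteration derives nothing new. How Logica itself bounds or terminates the iteration is not described in this paper, so the stopping condition, the function name, and the Datalog-style rules in the docstring are assumptions made for illustration.</p>
      <preformat><![CDATA[
```python
def transitive_closure(edges):
    """Naive bottom-up evaluation of the recursive program

         Reach(x, y) :- Edge(x, y)
         Reach(x, z) :- Reach(x, y), Edge(y, z)

    Every iteration re-applies both rules to the facts derived so far.
    """
    edges = set(edges)
    reach = set()
    while True:
        derived = edges | {
            (x, z)
            for (x, y) in reach
            for (y2, z) in edges
            if y == y2
        }
        if derived == reach:   # fixpoint: no new facts were derived
            return reach
        reach = derived


chain = transitive_closure({(1, 2), (2, 3), (3, 4)})
cycle = transitive_closure({(1, 2), (2, 1)})

print(sorted(chain))
print(sorted(cycle))
```
]]></preformat>
      <p>On the three-edge chain the loop runs four times: three iterations that add paths of increasing length and a final one that confirms the fixpoint. The cyclic input also terminates, because the set of derivable pairs is finite.</p>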
      <p>Module 10: Running Logica on BigQuery. In this module, students are shown how to get started
using Google Cloud Platform (GCP), the basics of BigQuery, and how to run Logica programs using
BigQuery via the GCP free tier. With little work, students are able to run their programs over a large
number of machines, taking advantage of the serverless features of BigQuery. Like Module 4, this
module is not as involved technically as the previous modules, providing students with additional
“breathing room”.</p>
      <p>Module 11: Computational Complexity. This module describes how to estimate the computational
complexity of programs written in Logica. A possible source of frustration when using the declarative
programming paradigm is the difficulty of estimating the time complexity of more involved
programs and of making the desired program run in a reasonable amount of time. This module helps
students determine the space and time complexity of their programs and avoid common
programming patterns that can cause inefficiencies.</p>
    </sec>
    <sec id="sec-3">
      <title>3. Potential Benefits of Learning Logic Programming via Logica</title>
      <p>In this section, we further discuss how teaching declarative languages, logic programming, and Logica in
particular, can benefit students interested in pursuing careers in computing, data science, and software
engineering.</p>
      <p>Preparing Students for Future Applications. Over the last several years there has been a resurgence
of interest in declarative data-oriented languages including Datalog, Prolog, SQL-like data access
languages for NoSQL systems, and graph-based languages such as SPARQL [10] and Neo4J’s Cypher
[11]. Researchers have also explored new and exciting areas of logic programming for a variety of
applications, such as data engineering and migration [12], causal reasoning [13], graph queries [14],
and recursive computations [15, 16]. As the role of data analytics and data science increases within the
computer-science industry, and as the problems tackled continue to evolve and increase in complexity,
students who have learned data-oriented declarative languages will more easily be able to familiarize themselves
with and use these (and future) languages throughout their careers.</p>
      <p>Inclusive Programming Learning Environment. Students in the broader field of information
science, who originate from multiple disciplines such as psychology, social science, and traditional
library &amp; information science, typically have varying levels of programming expertise, which makes
it difficult for instructors to choose an appropriate programming language for their courses. Many
imperative programming languages, which seek to provide sufficient power for arbitrary programming
tasks, make data-specific programming unnecessarily difficult (or at least act as an additional “hurdle”)
for students without formal training. In contrast, declarative languages are more inclusive because
they have typically been simplified to avoid unnecessary syntactic elements, making them more similar to natural
languages, and designed to prevent users from becoming overwhelmed by syntax errors [17]. Much like
SQL for basic data management and analytical tasks, learning Logica can empower students to more
quickly solve complex data science tasks when compared to having to learn the details of imperative
languages first.</p>
      <p>Natural Method for Learning Knowledge Representation and Reasoning. Knowledge
representation and reasoning enables artificial intelligence systems to display a wide variety of knowledge
structures and relationships, making it crucial to information science and data science. As illustrated
by Levesque and Brachman [18], knowledge bases can typically be represented in first-order logic,
allowing declarative languages such as Datalog to represent complex relationships, rules, and constraints
in a more natural manner. Although their application contexts were chemistry and physics, the cases
presented by Zhang et al. [19] demonstrate that declarative programming languages make it easy and
natural for students to transform domain-specific knowledge into logic programming rules and to answer
questions based on the rules. This suggests that declarative languages have the potential to effectively
teach knowledge representation and reasoning skills to information and data science students.</p>
      <p>Improve Problem Solving Skills. To implement solutions in declarative languages, students need
to identify the problem’s fundamental objects, relationships, and constraints. In the food-chain
example provided by Zhang et al. [19], students were given four species (eagle, snake, rabbit,
and carrot) along with two types of relationship, feedsOn(, ) and extinct(). The objective was to
identify all extinct species. To solve the problem, students must deduce the potential relationship between
feedsOn(, ) and extinct() from the descriptions provided. As the study later shows, declarative
languages indeed help students better understand science topics, which suggests that they
could assist students in the data sciences as well.</p>
      <p>Better Understanding of Logic and Formal Methods. Declarative languages are based on formal
logic principles. By studying these languages, students are exposed to concepts such as conjunction,
disjunction, negation, etc., which are necessary for comprehending formal methods. In addition,
declarative programming languages, such as SQL, Datalog, and Answer Set Programming, require
students to concentrate on defining the desired outcome rather than the process of accomplishing it
[20]. This strategy promotes critical thinking and enables students to gain a deeper understanding of
problems and their solutions (without getting bogged down in syntax or how to implement the solution
using imperative constructs), which is consistent with the fundamental principles of formal methods.
On top of that, declarative languages can help students develop a systematic and rigorous approach to
problem solving, which is essential for working with formal methods in computer science and prepares
students in the data sciences for success within their careers.</p>
      <p>Ease of Maintenance and Sharing. Declarative languages are typically more concise and simpler to
maintain than imperative languages, as they describe the desired outcome as opposed to the specific
steps required to achieve it. This attribute can result in code that is more maintainable, which is
advantageous in a professional setting [17]. Therefore, students studying the data sciences can use
these languages to more effectively communicate their problem-solving methods and collaborate with
individuals of varying levels of expertise.</p>
    </sec>
    <sec id="sec-4">
      <title>4. Conclusion and Future Work</title>
      <p>We propose a set of open-source educational modules for teaching logic programming using Logica, a
full-featured declarative (logic programming) language designed to leverage modern SQL engines and
to support data analytics and data science applications. The modules can be used within a self-contained
course on logic programming for (declarative) data science, and have the potential to be reused within a
number of other types of courses. We provide an overview of the modules and briefly discuss the
potential benefits of using Logica to teach logic programming within the context of information and
data sciences. As future work, we plan to leverage the modules in our existing courses,
collect data on their effectiveness, and continue to update and improve them based on our experience
and results.</p>
      <p>Another possible avenue for future work is to use Logica in advanced courses on logic
programming and non-monotonic reasoning. In particular, Logica can be viewed as a general non-monotonic
reasoning engine. We believe that the theoretical connection between the extension of predicate calculus with
aggregation and non-monotonic reasoning can be further developed by using Logica as a
formalism.
      </p>
      <p>[3] E. F. Codd, A relational model of data for large shared data banks, Communications of the ACM 13 (1970) 377–387.</p>
      <p>[4] E. Skvortsov, Y. Xia, B. Ludäscher, Logica: Declarative data science for mere mortals, in: Proceedings of the International Conference on Extending Database Technology (EDBT), 2024, pp. 842–845.</p>
      <p>[5] E. Skvortsov, Y. Xia, S. Bowers, B. Ludäscher, Logica Education Course, 2024. URL: https://tinyurl.com/LogicaCourse2024.</p>
      <p>[6] A. N. Kumar, R. K. Raj, S. G. Aly, M. D. Anderson, B. A. Becker, R. L. Blumenthal, E. Eaton, S. L. Epstein, M. Goldweber, P. Jalote, D. Lea, M. Oudshoorn, M. Pias, S. Reiser, C. Servin, R. Simha, T. Winters, Q. Xiang, Computer Science Curricula 2023, ACM, 2024.</p>
      <p>[7] V. Dahl, D. Cukierman, G. Bel-Enguix, M. D. Jiménez-López, Logic programming: Teaching strategies for students with no programming background, Proceedings of the 15th Western Canadian Conference on Computing Education (WCCCE) (2010).</p>
      <p>[8] S. Yang, M. Joy, Approaches for learning Prolog programming, Innovation in Teaching and Learning in Information and Computer Sciences 6 (2007) 88–107.</p>
      <p>[9] C. Han, The Structure and Interpretation of Imperatives: Mood and Force in Universal Grammar, Outstanding dissertations in linguistics, Garland Science, 2000.</p>
      <p>[10] J. Pérez, M. Arenas, C. Gutierrez, Semantics and complexity of SPARQL, ACM Transactions on Database Systems (TODS) 34 (2009) 1–45.</p>
      <p>[11] N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow, M. Rydberg, P. Selmer, A. Taylor, Cypher: An evolving query language for property graphs, in: Proceedings of the ACM SIGMOD/PODS International Conference on the Management of Data, 2018, pp. 1433–1445.</p>
      <p>[12] Y. Wang, R. Shah, A. Criswell, R. Pan, I. Dillig, Data migration using Datalog program synthesis, Proceedings of the VLDB Endowment 13 (2020) 1006–1019.</p>
      <p>[13] B. Salimi, H. Parikh, M. Kayali, L. Getoor, S. Roy, D. Suciu, Causal relational learning, in: Proceedings of the ACM SIGMOD/PODS International Conference on Management of Data, 2020, pp. 241–256.</p>
      <p>[14] L. Jachiet, P. Genevès, N. Gesbert, N. Layaïda, On the optimization of recursive relational queries: Application to graph queries, in: Proceedings of the ACM SIGMOD/PODS International Conference on Management of Data, 2020, pp. 681–697.</p>
      <p>[15] M. A. Khamis, H. Q. Ngo, R. Pichler, D. Suciu, Y. Remy Wang, Datalog in wonderland, ACM SIGMOD Record 51 (2022) 6–17.</p>
      <p>[16] Y. R. Wang, M. Abo Khamis, H. Q. Ngo, R. Pichler, D. Suciu, Optimizing recursive queries with program synthesis, in: Proceedings of the ACM SIGMOD/PODS International Conference on Management of Data, 2022, pp. 79–93.</p>
      <p>[17] C. Kelleher, R. Pausch, Lowering the barriers to programming: A taxonomy of programming environments and languages for novice programmers, ACM Computing Surveys (CSUR) 37 (2005) 83–137.</p>
      <p>[18] H. J. Levesque, R. J. Brachman, Expressiveness and tractability in knowledge representation and reasoning, Computational Intelligence 3 (1987) 78–93.</p>
      <p>[19] Y. Zhang, J. Wang, F. Bolduc, W. G. Murray, W. Staffen, A preliminary report of integrating science and computing teaching using logic programming, in: Proceedings of the AAAI Conference on Artificial Intelligence, 2019, pp. 9737–9744.</p>
      <p>[20] M. Hanus, Multi-paradigm declarative languages, in: Proceedings of the International Conference on Logic Programming (ICLP), 2007, pp. 45–75.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>E.</given-names>
            <surname>Meijer</surname>
          </string-name>
          ,
          <string-name>
            <given-names>B.</given-names>
            <surname>Beckman</surname>
          </string-name>
          ,
          <string-name>
            <given-names>G.</given-names>
            <surname>Bierman</surname>
          </string-name>
          ,
          <article-title>LINQ: Reconciling object, relations and XML in the .NET framework</article-title>
          ,
          <source>in: Proceedings of the ACM SIGMOD/PODS International Conference on the Management of Data</source>
          ,
          <year>2006</year>
          , p.
          <fpage>706</fpage>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>W.</given-names>
            <surname>McKinney</surname>
          </string-name>
          ,
          <article-title>Data Structures for Statistical Computing in Python</article-title>
          ,
          <source>in: Proceedings of the Python in Science Conference</source>
          ,
          <year>2010</year>
          , pp.
          <fpage>56</fpage>
          -
          <lpage>61</lpage>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>