<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Adopting Markov Logic Networks for Big Spatial Data and Applications</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>Ibrahim Sabek (supervised by Mohamed F. Mokbel), Department of Computer Science and Engineering, University of Minnesota</institution>
          ,
          <addr-line>MN</addr-line>
          ,
          <country country="US">USA</country>
        </aff>
      </contrib-group>
      <pub-date>
        <year>2019</year>
      </pub-date>
      <abstract>
        <p>Spatial data, e.g., GPS data and medical data, has become ubiquitous, with ever-increasing sizes. This raises the need for efficient spatial machine learning and analysis solutions to extract useful insights from such data. Meanwhile, Markov Logic Networks (MLN) have emerged as a powerful framework for building usable and scalable machine learning tools. Unfortunately, MLN is ill-equipped for spatial applications because it ignores the distinguished characteristics of spatial data. This paper describes SMLN, the first full-fledged MLN framework with native support for spatial data. SMLN comes with a high-level datalog-like language with spatial constructs, and spatially-equipped grounding, inference and learning modules. We show the effectiveness of SMLN by illustrating three systems, namely, Sya, TurboReg, and Flash, that are already built using SMLN. This work is partially supported by the National Science Foundation Grants IIS-1525953 and CNS-1512877, USA.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>INTRODUCTION</title>
      <p>
        Data scientists and developers have been spending
significant effort applying machine learning and artificial
intelligence methods, e.g., deep learning, to analyze and turn their
massive data into useful insights. However, the expert
skills and effort needed to deploy these methods have become a
major blocking factor to the wide deployment of
machine learning applications. As a result, Markov Logic
Network (MLN) [
        <xref ref-type="bibr" rid="ref19">19</xref>
        ] was recently introduced to reduce this gap.
In particular, MLN combines first-order logic [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] with
probabilistic graphical models to efficiently represent statistical
learning and inference problems with a few logical rules (e.g.,
rules with imply and bit-wise AND predicates) instead of
thousands of lines of code. With MLN, data scientists and
developers do not need to worry about the underlying
machine learning machinery. Instead, they can focus their efforts on
developing the rules that represent their applications.
Recently, MLN has been widely adopted as a research vehicle
for deploying machine learning in various applications,
including knowledge base construction [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], data cleaning [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ],
genetic analysis [
        <xref ref-type="bibr" rid="ref23">23</xref>
        ], among others.
      </p>
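      <p>
        As a rough illustration of this representation (the domain, rules, and weights below are our own toy example, not from the paper), an MLN assigns each possible world an unnormalized score equal to the exponentiated sum of the weights of its satisfied rules:

```python
import itertools
import math

# Toy MLN over two boolean facts, with two weighted rules (weights are
# illustrative): "Smokes implies Cancer" (1.5) and "Smokes" (0.5).
RULES = [
    (1.5, lambda w: (not w["smokes"]) or w["cancer"]),  # Smokes implies Cancer
    (0.5, lambda w: w["smokes"]),                       # Smokes
]

def unnormalized_score(world):
    # MLN semantics: exp(sum of weights of the rules this world satisfies)
    return math.exp(sum(wt for wt, rule in RULES if rule(world)))

worlds = [dict(zip(("smokes", "cancer"), vals))
          for vals in itertools.product((False, True), repeat=2)]
Z = sum(unnormalized_score(w) for w in worlds)  # partition function
prob = {(w["smokes"], w["cancer"]): unnormalized_score(w) / Z for w in worlds}
```

        Under these illustrative weights, the highest-probability world is the one that satisfies both rules.
      </p>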
      <p>
        Meanwhile, in recent years, there has been a
proliferation in the amounts of spatial data produced from several
devices such as satellites, space telescopes, and medical
devices. Various applications and agencies need to analyze
these unprecedented amounts of spatial data. For example,
epidemiologists use spatial analysis techniques to track
infectious disease [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ]. News reporters use geotagged tweets for
event detection and analysis [
        <xref ref-type="bibr" rid="ref24">24</xref>
        ]. Unfortunately, researchers
have yet to take advantage of the recent advances in Markov Logic
Networks (MLN) to boost the usability, scalability, and
accuracy of the spatial machine learning tasks (e.g., spatial
regression [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) used in these applications. Furthermore,
MLN-based applications (e.g., knowledge base construction [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ])
would miss important results and achieve lower accuracy when
dealing with spatial data. The main reason is that MLN
is oblivious to spatial data. The only way to support
spatial data in MLN is to simply ignore its distinguished
properties (e.g., spatial relationships among objects) and
treat it as non-spatial data. While this would work to
some extent, it results in sub-par performance.
      </p>
      <p>
        The goal of our work is to provide the first full-fledged
MLN framework with native support for spatial data,
called Spatial Markov Logic Networks (SMLN). In
particular, SMLN pushes the spatial awareness inside the internal
data structures and core learning and inference
functionalities of MLN, and hence inside all MLN-based machine
learning techniques and applications. SMLN consists of
four main modules, namely, language, grounding, inference
and learning. The language module extends the DDlog
language [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ] with spatial data types and predicates to express
spatial semantics when writing rules. The grounding
module constructs a spatial variation of the factor graph [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ],
namely Spatial Factor Graph, to efficiently represent SMLN
graphical models. The inference module provides a novel
algorithm of Gibbs Sampling [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] that combines the
Conclique [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] concept from spatial statistics with query-driven
independent Metropolis-Hastings approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ]. The
learning module employs a new distance-based optimization
technique based on the gradient descent method [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] to
efficiently learn SMLN model parameters.
      </p>
      <p>Users of SMLN would be able to seamlessly build a
myriad of scalable spatial applications, without worrying about
the underlying spatial machine learning and computation
details. We show three case studies that use SMLN as a
backbone for their computation. These case studies include</p>
      <p>
        Sya [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], a system for spatial probabilistic knowledge base
construction, TurboReg [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a framework for scaling up
spatial autologistic regression models, and Flash [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], a
framework for scalable spatial data analysis.
      </p>
    </sec>
    <sec id="sec-2">
      <title>SMLN OVERVIEW</title>
      <p>
        Figure 1 provides an overview of the SMLN architecture.
There are three types of users who interact with SMLN,
namely, developers, casual users, and administrators.
Developers are expected to have expertise with MLN and to use the
high-level language provided by SMLN to create new
applications. We briefly review three example applications
including Sya [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], TurboReg [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], and Flash [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ] in Sections 3, 4
and 5, respectively. Casual users can use either standard
querying or visualization APIs to perform inference and
learning queries over the built applications (e.g., what is
the probability that a specific event happens?).
Administrators can monitor the system and tune the inference
and learning configurations. SMLN adopts an extensible
approach, where it injects the spatial awareness inside the four
main modules of MLN, namely, language, grounding,
inference, and learning. In the rest of this section, we highlight
our contributions in each of these four modules.
      </p>
    </sec>
    <sec id="sec-3">
      <title>The Language Module</title>
      <p>
        SMLN employs a high-level language to help users write
on-top applications as a set of rules, saving huge coding
effort. Instead of providing a completely new language,
SMLN extends DDlog [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], a datalog-like language for
defining MLN rules, with spatial data types and predicates that
conform to the Open Geospatial Consortium (OGC)
standard [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ]. Such extensions allow users to express their
spatial semantics without the need for re-implementing
user-defined functions in each application. For example, SMLN
adds four spatial data types, namely, point, rectangle,
polygon, and linestring, to the schema declaration of
relations in DDlog. Figure 2 shows an example of two
schema declarations S1 and S2 with point spatial
attributes. In addition, SMLN extends the derivation,
supervision and inference rules in DDlog with spatial
predicates (e.g., overlaps, within, and distance) and functions
(e.g., union and buffer) to efficiently evaluate the
relationships between spatial objects. Such predicates and functions
can be composed. For example, the inference rule R1 in
Figure 2 composes the distance and within predicates.
      </p>
      <p>Schema Declaration
S1: County (id bigint, location point, hasLowSanitation bool).
S2: HasEbola? (id bigint, location point).
Inference Rule
R1: HasEbola(C1, L1) =&gt; HasEbola(C2, L2) :-
County(C1, L1, -), County(C2, L2, S2),
[distance(L1, L2) &lt; 2.5, within(liberia_geom, L1), S2 = true].</p>
    </sec>
    <sec id="sec-3b">
      <title>The Grounding Module</title>
      <p>
        Grounding is an essential operation in the MLN
execution pipeline, where it constructs a data structure called
factor graph [
        <xref ref-type="bibr" rid="ref28">28</xref>
        ] on which the
inference and learning operations are later performed. Such a factor graph is
efficiently constructed by evaluating the rules compiled by
the language module as a sequence of SQL queries (e.g.,
[
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). To accommodate the newly introduced spatial
constructs in rules, e.g., distance, SMLN adapts a scalable
in-database grounding technique from [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ] to translate and
evaluate rules with these constructs as a set of spatial SQL
queries (e.g., range query and spatial join). The generated
queries are then executed through standard spatial database
engines, e.g., PostGIS, to produce a spatial variation of the
factor graph, namely, Spatial Factor Graph, that consists
of: (1) logical and spatial random variables, and (2) logical and
spatial correlations among these variables.
      </p>
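      <p>
        As a hedged illustration of this in-database grounding (the table, column, and parameter names here are our own assumptions, not from the paper), a rule like R1 could be translated into a single spatial self-join, with each result row yielding one factor between two candidate HasEbola variables:

```python
# Hypothetical grounding query for a rule like R1, using standard
# PostGIS predicates (ST_DWithin, ST_Within); the schema and the
# :liberia_geom bind parameter are assumed for illustration.
GROUND_R1 = """
SELECT c1.id AS var1, c2.id AS var2
FROM County c1, County c2
WHERE ST_DWithin(c1.location, c2.location, 2.5)  -- distance(L1, L2) under 2.5
  AND ST_Within(c1.location, :liberia_geom)      -- within(liberia_geom, L1)
  AND c2.hasLowSanitation = true                 -- S2 = true
"""
```
      </p>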
      <p>
        SMLN provides two effective optimizations in the
grounding process: (1) It supports creating on-the-fly spatial indices
(e.g., R-tree [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]) on relations with spatial attributes, making
the evaluation of complex predicates (e.g., overlap) more
efficient. (2) It provides a simple heuristic query optimizer
that re-orders the execution of nested spatial queries that
come from rules with multiple spatial predicates. Moreover,
SMLN provides an abstract database driver that supports
defining spatial storage, functions, and query capabilities.
Such an abstraction can be extended by users to run their spatial
database engine of choice inside SMLN.
      </p>
    </sec>
    <sec id="sec-4">
      <title>The Inference Module</title>
      <p>
        The main objective of the inference module in MLN is to
infer the values of variables in the constructed factor graph
and compute their associated probabilities in an efficient
and scalable manner [
        <xref ref-type="bibr" rid="ref14">14</xref>
        ]. Gibbs Sampling [
        <xref ref-type="bibr" rid="ref30">30</xref>
        ] is
considered the most widely used inference algorithm in MLN
systems, mainly because of its simplicity and efficiency. However,
there are two main limitations in using the existing Gibbs
sampling techniques when inferring the values of the spatial
factor graph variables. First, these techniques infer values
that maximize the satisfaction of the logical semantics (e.g.,
imply) encoded in the factor graph. Therefore, in the case of a
spatial factor graph, these inferred values will be suboptimal
because they never consider the spatial correlation between
variables. Second, these techniques require a large
number of sampling iterations (i.e., slow convergence) to obtain
an acceptable output in case there are spatial correlations
among variables, because they perform sequential sampling
over the factor graph nodes [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ].
      </p>
      <p>
        To overcome the above two limitations, SMLN employs
a novel Gibbs Sampling algorithm, namely Spatial Gibbs
Sampling, that can efficiently perform inference on the
spatial factor graph coming from the grounding module. To
take the spatial correlations into account, the proposed
sampling algorithm adapts a new variation of the query-driven
independent Metropolis-Hastings approach [
        <xref ref-type="bibr" rid="ref12">12</xref>
        ] that uses the
inverse-distance method [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ] to spatially weigh the
correlations among variables in the spatial factor graph, and hence
yields more accurate inferred values.
      </p>
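      <p>
        A minimal sketch of this weighting idea (our own simplification; the variable layout and the logistic conditional form are assumptions, not SMLN's actual implementation): each neighbour's vote on a variable's value is scaled by the inverse of its distance before the conditional probability is formed:

```python
import math
import random

def spatial_gibbs_step(values, neighbors, target, base_weight):
    # values: dict mapping variable id to a boolean state
    # neighbors: list of (neighbor_id, distance) pairs for `target`
    score = 0.0
    for nid, dist in neighbors:
        w = base_weight / dist                   # inverse-distance weighting
        score += w if values[nid] else -w        # agreement pulls up, disagreement down
    p_true = 1.0 / (1.0 + math.exp(-score))      # logistic conditional probability
    values[target] = p_true > random.random()    # resample the target variable
    return p_true
```

        Closer neighbours thus dominate the conditional, which is what makes the inferred values respect spatial correlation.
      </p>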
      <p>
        To alleviate the slow convergence issue, a straightforward
solution is to randomly partition the variables into a set
of buckets and then sample these buckets in parallel. Even
though this solution will finish the sampling iterations faster
than the sequential one, it may not converge to an
acceptable solution, as spatially-dependent variables might be sampled in
parallel (i.e., independently of each other). This forces the
sampler to run additional sampling iterations to converge,
and hence incurs a significant latency overhead. As a
result, SMLN employs an approach that combines an in-memory
spatial partitioning technique, namely the pyramid index [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ],
with a well-known spatial statistics concept, namely
concliques [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ], to heuristically partition the spatial factor graph
into a set of spatially-independent partitions, and sample
them in parallel. Defining concliques ensures
neighbouring independence between nodes in the same conclique set,
and hence these nodes can be efficiently sampled in parallel.
      </p>
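      <p>
        For a regular grid under a 4-neighbour dependence structure, concliques reduce to a checkerboard two-colouring, which the following sketch computes (our own illustration; SMLN's pyramid-based partitioning is more general):

```python
def grid_concliques(rows, cols):
    """Split grid cells into two concliques by checkerboard parity:
    no two cells in the same conclique are 4-neighbours, so each
    conclique can be sampled entirely in parallel."""
    even = [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == 0]
    odd = [(r, c) for r in range(rows) for c in range(cols) if (r + c) % 2 == 1]
    return even, odd
```
      </p>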
    </sec>
    <sec id="sec-5">
      <title>The Learning Module</title>
      <p>
        In general, the learning module in MLN mainly focuses on
optimizing the weights of correlations defined in the factor
graph. However, in the case of a spatial factor graph, the
relative distance between the spatial variables participating in any
correlation should be considered as well to learn optimal
weights. In particular, correlations between spatially close
variables should have a higher effect on the learned weights than
correlations between distant variables. We refer to this
concept as correlation locality. Recently, the gradient descent
optimization [
        <xref ref-type="bibr" rid="ref31">31</xref>
        ] has been widely used in optimizing the
weights in MLN models that use Gibbs sampling inference
(e.g., [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ]). However, standard gradient descent
optimization techniques fall short in supporting the correlation
locality concept. As a result, SMLN introduces a new variation
of gradient descent optimization that applies a distance-based
weighting technique. Given a correlation c, defined over m
spatial variables v1, v2, ..., vm in the spatial factor graph, its
weight wc is optimized as follows:
      </p>
      <p>wc = wc + [ m(m-1) / ( 2 · Σ_{i=1..m-1} Σ_{j=i+1..m} d(vi, vj) ) ] · α · g    (1)
where g is the gradient sign (either 1 or -1), α is the
step size, and d(vi, vj) is the Euclidean distance between
the variables vi and vj.</p>
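      <p>
        A direct reading of Equation (1) as code (a sketch; the function and variable names are ours):

```python
import math

def update_correlation_weight(w_c, positions, grad_sign, step_size):
    """One distance-weighted gradient step per Equation (1): the update
    alpha * g is scaled by m(m-1) divided by twice the sum of pairwise
    Euclidean distances, so correlations over nearby variables move
    their weight more than correlations over distant ones."""
    m = len(positions)
    pair_dist_sum = sum(math.dist(positions[i], positions[j])
                        for i in range(m - 1)
                        for j in range(i + 1, m))
    scale = (m * (m - 1)) / (2.0 * pair_dist_sum)
    return w_c + scale * step_size * grad_sign
```

        For example, a binary correlation (m = 2) over variables 2 units apart gets its step scaled by 2 / (2 · 2) = 0.5.
      </p>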
    </sec>
    <sec id="sec-6">
      <title>SYA SYSTEM</title>
      <p>Figure 3: (a) Sya accuracy. (b) TurboReg latency.</p>
      <p>
        Knowledge base construction (KBC) has been an active
area of research over the last two decades, with several
system prototypes coming from academia and industry, along
with many important applications, e.g., web search,
digital libraries, and health care. Most recently, KBC systems
employed the idea of MLN to associate each extracted
relation with a probability reflecting how confident the system is that
this relation is factual. DeepDive [
        <xref ref-type="bibr" rid="ref25">25</xref>
        ], an MLN-based
system, has emerged as one of the most popular probabilistic
KBC systems, applied in different domains (e.g., law
enforcement [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], geology [
        <xref ref-type="bibr" rid="ref29">29</xref>
        ], and paleontology [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]).
      </p>
      <p>Unfortunately, DeepDive does not fully utilize the
underlying spatial information, which results in less accuracy in
the factual scores. This is because of two reasons: (1)
DeepDive treats any predicate in the knowledge base inference
rules as a boolean function, which yields either true or false
(i.e., satisfied or not). So, although one can define spatial
predicates in DeepDive, internally DeepDive and its
inference engine do not do anything special for any spatial
predicate. (2) DeepDive estimates the factual scores of extracted
relations based only on how much support exists for these
relations in the training data. However, in case spatial information
exists, the relative distance between entities participating in
the extracted relations should be considered as well.</p>
      <p>
        To overcome the above limitations in DeepDive, we
proposed Sya [
        <xref ref-type="bibr" rid="ref21">21</xref>
        ], a spatial MLN-based KBC system, built
using our SMLN framework. We re-implemented the
existing grounding, inference, and learning modules in DeepDive
using the corresponding modules in SMLN. We initially
evaluated Sya through building two real spatial knowledge bases
about: (1) the water quality in Texas [
        <xref ref-type="bibr" rid="ref27">27</xref>
        ], namely, GWDB,
and (2) the air pollution in New York city [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], namely,
NYCCAS. Figure 3(a) shows the quality (i.e., accuracy) for
both Sya and DeepDive, measured by the F1-score, when
building these two knowledge bases. Sya has an
improvement of 120% and 27% over DeepDive in GWDB and
NYCCAS, respectively.
      </p>
    </sec>
    <sec id="sec-7">
      <title>TURBOREG SYSTEM</title>
      <p>
        Autologistic regression [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] is an important statistical tool
for predicting spatial phenomena. Unlike standard
logistic regression that assumes predictions of spatial
phenomena over neighbouring locations are completely
independent of each other, autologistic regression takes into
account the spatial dependence between neighbouring
locations while building and running the prediction model (i.e.,
neighbouring locations tend to systematically affect each
other). Myriad applications require the autologistic
regression model to be built over large spatial data. However,
existing methods for autologistic regression (e.g., see [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]) are
prohibitively computationally expensive for large grid data,
e.g., fine-grained satellite images, and large spatial
epidemiology datasets. It could take about a week to infer the model
parameters using training data of only a few gigabytes [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
      </p>
      <p>
        To solve this issue, we introduced TurboReg [
        <xref ref-type="bibr" rid="ref20">20</xref>
        ], a
scalable framework for building autologistic models with a large
number of prediction and predictor variables. In TurboReg,
we employed the inference and learning modules of SMLN
to estimate the autologistic model parameters in an
accurate and efficient manner. Basically, TurboReg provides
an equivalent first-order logic [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] representation of the
dependency relations among neighbours in autologistic models. In
particular, TurboReg transforms each neighbouring
dependency relation into a predicate with bitwise-AND operation
on all variables involved in this relation. For example, a
binary dependency relation between neighbouring variables
C1 and C2 is transformed to C1 ^ C2. This simple
logical transformation allows non-experts to express the
dependency relations within autologistic models in a simple way,
without needing to specify complex details.
      </p>
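      <p>
        This transformation can be sketched as follows (our own illustration; TurboReg's full pipeline also handles predictor variables and rule weights): a neighbour dependency contributes only when both binary variables are 1, mirroring the autologistic interaction term:

```python
def dependency_predicate(c1, c2):
    # C1 ^ C2: satisfied only when both neighbouring cells are 1
    return bool(c1 and c2)

def interaction_term(cell_values, neighbor_pairs, eta):
    # Autologistic-style interaction: eta times the number of
    # satisfied neighbour predicates over the given pairs.
    return eta * sum(dependency_predicate(cell_values[a], cell_values[b])
                     for a, b in neighbor_pairs)
```
      </p>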
      <p>
        We experimentally evaluated TurboReg using a real
dataset of the daily distribution of bird species [
        <xref ref-type="bibr" rid="ref26">26</xref>
        ], and
compared its scalability performance to a state-of-the-art
method, namely ngspatial [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]. Figure 3(b) shows the
running time for both systems while building an autologistic
model over grid sizes ranging from 250 to 84k cells.
TurboReg achieves at least a three-orders-of-magnitude reduction in
the running time over ngspatial, while preserving the same
accuracy in estimating the model parameters.
      </p>
    </sec>
    <sec id="sec-8">
      <title>5. FLASH FRAMEWORK</title>
      <p>
        Just as MLN made it possible for data scientists and
developers to overcome the difficulty of deploying machine
learning techniques, we aim to use our proposed SMLN
framework as a backbone infrastructure to support long-standing
spatial analysis techniques that lack scalability as well
as suffer from difficulty of deployment (e.g., [
        <xref ref-type="bibr" rid="ref3 ref4">3, 4</xref>
        ]).
      </p>
      <p>
        To that end, we proposed Flash [
        <xref ref-type="bibr" rid="ref22">22</xref>
        ], a framework for
generic and scalable spatial data analysis. Flash focuses on
building a major class of spatial analysis techniques, called
spatial probabilistic graphical modelling (SPGM) (e.g., [
        <xref ref-type="bibr" rid="ref3 ref4 ref6">3, 4,
6</xref>
        ]), which uses probability distributions and graphical
representations to describe spatial phenomena and make
predictions about them. In Flash, we built a generic
transformation module that is responsible for generating a set of weighted
logical rules representing any SPGM input. The generated
rules are then executed normally through the SMLN
framework, where the weights of these rules represent the
parameters of the original SPGM and will be inferred using the
SMLN inference and learning modules. Currently, Flash
supports transformation for three spatial graphical
models: spatial Markov random fields [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ], spatial hidden Markov
models [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and spatial Bayesian networks [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ].
      </p>
    </sec>
    <sec id="sec-9">
      <title>6. CONCLUSION</title>
      <p>In this paper, we introduce SMLN, a framework that
injects spatial awareness inside the core functionality of
Markov Logic Networks (MLN). SMLN equips the MLN
framework with a spatial high-level language, and
spatially-optimized grounding, inference and learning modules. Data
scientists and developers can exploit SMLN to build
numerous scalable and efficient spatial machine learning and
analysis applications. We showed three on-top applications: Sya,
a system for spatial probabilistic knowledge base
construction, TurboReg, a framework for scaling up spatial
autologistic regression models, and Flash, a framework for scalable
spatial data analysis.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <given-names>W. G.</given-names>
            <surname>Aref</surname>
          </string-name>
          and
          <string-name>
            <given-names>H.</given-names>
            <surname>Samet</surname>
          </string-name>
          .
          <article-title>Efficient Processing of Window Queries in the Pyramid Data Structure</article-title>
          .
          <source>In PODS</source>
          ,
          <year>1990</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <given-names>A.</given-names>
            <surname>Auchincloss</surname>
          </string-name>
          et al.
          <article-title>A Review of Spatial Methods in Epidemiology</article-title>
          .
          <source>Annual Review of Public Health</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3] bnspatial: Spatial Bayesian Networks,
          <year>2019</year>
          . cran.r-project.org/web/packages/bnspatial.
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4] gamlss.spatial: Gaussian Markov Random Fields,
          <year>2019</year>
          . cran.r-project.org/web/packages/gamlss.spatial.
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <given-names>M.</given-names>
            <surname>Genesereth</surname>
          </string-name>
          and
          <string-name>
            <given-names>N.</given-names>
            <surname>Nilsson</surname>
          </string-name>
          .
          <source>Logical Foundations of Artificial Intelligence</source>
          . Morgan Kaufmann Publishers,
          <year>1987</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <given-names>P. J.</given-names>
            <surname>Green</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Richardson</surname>
          </string-name>
          .
          <source>Hidden Markov Models and Disease Mapping. JASA</source>
          ,
          <year>2002</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <given-names>A.</given-names>
            <surname>Guttman</surname>
          </string-name>
          .
          <article-title>R-trees: A Dynamic Index Structure for Spatial Searching</article-title>
          .
          <source>SIGMOD Record</source>
          , pages
          <fpage>47</fpage>
          -
          <lpage>57</lpage>
          ,
          <year>1984</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hughes</surname>
          </string-name>
          .
          <article-title>ngspatial: A Package for Fitting the Centered Autologistic and Sparse Spatial Generalized Linear Mixed Models for Areal Data</article-title>
          .
          <source>The R Journal</source>
          ,
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <given-names>J.</given-names>
            <surname>Hughes</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Haran</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. C.</given-names>
            <surname>Caragea</surname>
          </string-name>
          .
          <source>Autologistic Models for Binary Data on a Lattice. Environmetrics</source>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          Human Trafficking: Forbes. http://www.forbes.com/sites/thomasbrewster/2015/04/17/darpa-nasa-and-partners-show-off-memex/.
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          [11]
          <string-name>
            <given-names>M.</given-names>
            <surname>Kaiser</surname>
          </string-name>
          et al.
          <article-title>Goodness of Fit Tests for a Class of Markov Random Field Models</article-title>
          .
          <source>The Annals of Statistics</source>
          ,
          <year>2012</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          [12]
          <string-name>
            <given-names>K.</given-names>
            <surname>Li</surname>
          </string-name>
          et al.
          <article-title>In-database Batch and Query-time Inference over Probabilistic Graphical Models using UDA-GIST</article-title>
          .
          <source>VLDB Journal</source>
          ,
          <volume>26</volume>
          (
          <issue>2</issue>
          ):
          <fpage>177</fpage>
          -
          <lpage>201</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          [13]
          <string-name>
            <given-names>G. Y.</given-names>
            <surname>Lu</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. W.</given-names>
            <surname>Wong</surname>
          </string-name>
          .
          <article-title>An Adaptive Inverse-distance Weighting Spatial Interpolation Technique</article-title>
          .
          <source>Journal of Computer and GeoSciences</source>
          ,
          <volume>34</volume>
          (
          <issue>9</issue>
          ):
          <fpage>1044</fpage>
          -
          <lpage>1055</lpage>
          , Sept.
          <year>2008</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          [14]
          <string-name>
            <given-names>F.</given-names>
            <surname>Niu</surname>
          </string-name>
          et al.
          <article-title>Tuffy: Scaling Up Statistical Inference in Markov Logic Networks Using an RDBMS</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>4</volume>
          (
          <issue>6</issue>
          ):
          <fpage>373</fpage>
          -
          <lpage>384</lpage>
          ,
          <year>2011</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          [15]
          <article-title>NYC OpenData</article-title>
          . https://data.cityofnewyork.us/Environment/NYCCAS-Air-Pollution-Rasters/q68s-8qxv
          .
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>[16] Open Geospatial Consortium. http://www.opengeospatial.org/.</mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          [17]
          <string-name>
            <given-names>S. E.</given-names>
            <surname>Peters</surname>
          </string-name>
          ,
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Livny</surname>
          </string-name>
          , and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          .
          <article-title>A Machine Reading System for Assembling Synthetic Paleontological Databases</article-title>
          .
          <source>PLoS One</source>
          ,
          <volume>9</volume>
          (
          <issue>12</issue>
          ),
          <year>2014</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          [18]
          <string-name>
            <given-names>T.</given-names>
            <surname>Rekatsinas</surname>
          </string-name>
          et al.
          <article-title>HoloClean: Holistic Data Repairs with Probabilistic Inference</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>10</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1190</fpage>
          -
          <lpage>1201</lpage>
          ,
          <year>2017</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref19">
        <mixed-citation>
          [19]
          <string-name>
            <given-names>M.</given-names>
            <surname>Richardson</surname>
          </string-name>
          and
          <string-name>
            <given-names>P. M.</given-names>
            <surname>Domingos</surname>
          </string-name>
          .
          <article-title>Markov Logic Networks</article-title>
          .
          <source>Machine Learning</source>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref20">
        <mixed-citation>
          [20]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sabek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Musleh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M.</given-names>
            <surname>Mokbel</surname>
          </string-name>
          .
          <article-title>TurboReg: A Framework for Scaling Up Spatial Logistic Regression Models</article-title>
          .
          <source>In SIGSPATIAL</source>
          , pages
          <fpage>129</fpage>
          -
          <lpage>138</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref21">
        <mixed-citation>
          [21]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sabek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Musleh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Mokbel</surname>
          </string-name>
          .
          <article-title>A Demonstration of Sya: A Spatial Probabilistic Knowledge Base Construction System</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>1689</fpage>
          -
          <lpage>1692</lpage>
          ,
          <year>2018</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref22">
        <mixed-citation>
          [22]
          <string-name>
            <given-names>I.</given-names>
            <surname>Sabek</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Musleh</surname>
          </string-name>
          , and
          <string-name>
            <given-names>M. F.</given-names>
            <surname>Mokbel</surname>
          </string-name>
          .
          <article-title>Flash in Action: Scalable Spatial Data Analysis Using Markov Logic Networks</article-title>
          .
          <source>In VLDB</source>
          ,
          <year>2019</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref23">
        <mixed-citation>
          [23]
          <string-name>
            <given-names>N. A.</given-names>
            <surname>Sakhanenko</surname>
          </string-name>
          and
          <string-name>
            <given-names>D. J.</given-names>
            <surname>Galas</surname>
          </string-name>
          .
          <article-title>Markov Logic Networks in the Analysis of Genetic Data</article-title>
          .
          <source>Journal of Computational Biology</source>
          ,
          <volume>17</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1491</fpage>
          -
          <lpage>1508</lpage>
          , Nov.
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref24">
        <mixed-citation>
          [24]
          <string-name>
            <given-names>J.</given-names>
            <surname>Sankaranarayanan</surname>
          </string-name>
          et al.
          <article-title>TwitterStand: News in Tweets</article-title>
          .
          <source>In SIGSPATIAL</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref25">
        <mixed-citation>
          [25]
          <string-name>
            <given-names>J.</given-names>
            <surname>Shin</surname>
          </string-name>
          et al.
          <article-title>Incremental Knowledge Base Construction Using DeepDive</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>8</volume>
          (
          <issue>11</issue>
          ):
          <fpage>1310</fpage>
          -
          <lpage>1321</lpage>
          ,
          <year>2015</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref26">
        <mixed-citation>
          [26]
          <string-name>
            <given-names>B.</given-names>
            <surname>Sullivan</surname>
          </string-name>
          et al.
          <article-title>eBird: A Citizen-based Bird Observation Network in the Biological Sciences</article-title>
          .
          <source>Biological Conservation</source>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref27">
        <mixed-citation>
          [27]
          <article-title>Texas Ground Water Database</article-title>
          . http://www.twdb.texas.gov/groundwater/data/.
        </mixed-citation>
      </ref>
      <ref id="ref28">
        <mixed-citation>
          [28]
          <string-name>
            <given-names>M.</given-names>
            <surname>Wick</surname>
          </string-name>
          et al.
          <article-title>Scalable Probabilistic Databases with Factor Graphs and MCMC</article-title>
          .
          <source>PVLDB</source>
          ,
          <volume>3</volume>
          (
          <issue>1</issue>
          -2):
          <fpage>794</fpage>
          -
          <lpage>804</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref29">
        <mixed-citation>
          [29]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          et al.
          <article-title>GeoDeepDive: Statistical Inference Using Familiar Data-processing Languages</article-title>
          .
          <source>In SIGMOD</source>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref30">
        <mixed-citation>
          [30]
          <string-name>
            <given-names>C.</given-names>
            <surname>Zhang</surname>
          </string-name>
          and
          <string-name>
            <given-names>C.</given-names>
            <surname>Ré</surname>
          </string-name>
          .
          <article-title>Towards High-throughput Gibbs Sampling at Scale: A Study Across Storage Managers</article-title>
          .
          <source>In SIGMOD</source>
          , pages
          <fpage>397</fpage>
          -
          <lpage>408</lpage>
          ,
          <year>2013</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref31">
        <mixed-citation>
          [31]
          <string-name>
            <given-names>M.</given-names>
            <surname>Zinkevich</surname>
          </string-name>
          et al.
          <article-title>Parallelized Stochastic Gradient Descent</article-title>
          .
          <source>In NIPS</source>
          , pages
          <fpage>2595</fpage>
          -
          <lpage>2603</lpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>