Mv-Index: An Efficient Index
                               for Graph-Query Containment

                  Theofilos Mailis1,2 , Yannis Kotidis3 , Vaggelis Nikolopoulos1,2 ,
                  Evgeny Kharlamov4,5 , Ian Horrocks6 , and Yannis Ioannidis1,2
                                       1
                                         Athena Research Centre, Greece
                           2
                              National and Kapodistrian University of Athens, Greece
                            3
                              Athens University of Economics and Business, Greece
                              4
                                 Bosch Center for Artificial Intelligence, Germany
                                         5
                                           University of Oslo, Norway
                                   6
                                      University of Oxford, United Kingdom


           Abstract. Query containment is a fundamental operation used to expedite query
           processing in view materialization and query caching techniques. Since query
           containment is NP-complete for arbitrary conjunctive queries on RDF graphs,
           we have introduced a simpler form of conjunctive queries that we name f-graph
           queries. During the demo, we will show why containment checking for f-graph
           queries can be solved in polynomial time. We will present the mv-index, a novel
           indexing structure that allows for fast containment checking between a single
           RDF-conjunctive query and an arbitrary number of stored queries. The mv-index
           structure takes advantage of the interesting properties of f-graph queries. With the
           mv-index usage, the containment test against hundreds of thousands of queries
           that are indexed within our structure is performed in microseconds or less.


1      Introduction
The growing popularity of graph-structured data in many real-world applications such
as Oil and Gas [7], Energy sector [9], E-commerce [4], industrial monitoring [10], fac-
tory automation [6], industrial analytics [8], and intelligent querying and search [17,1]
has led to a renaissance of research on graph data management. RDF [2] and SPARQL
[15] are promising examples of a graph data model and the corresponding query lan-
guage that have gained a lot of attraction. In order to handle the burst of RDF data that is
available on the Web, much research has been devoted on scalable techniques for RDF
processing such as indexing, caching, and view materialization in order to accelerate
the execution time of SPARQL queries.
    View materialization is directly related to the problem of query containment [11]
that has been proved to be NP-complete for arbitrary conjunctive queries [3] and unions
of conjunctive queries [16] over relational databases. The same results also apply for
conjunctive queries on RDF graphs and their SPARQL counterparts [14,5].
    In this demo, we will present f-graph queries, a restricted form of conjunctive
queries that allows to solve the containment checking problem in PTime. Additionally
we will present an efficient indexing structure, the mv-index, for checking the contain-
ment relation between a single f-graph query Qf and a set of indexed queries W in
PTime w.r.t. the combined size of the query and the index. We will further show how
Copyright c 2019 for this paper by its authors. Use permitted under Creative Commons License Attribution 4.0 International (CC BY 4.0).
2        Mailis et al.

we have extended mv-indices to evaluate containment for arbitrary conjunctive queries
on RDF graphs. This translates to microseconds or less for the containment test against
hundreds of thousands of queries that are indexed within the structure. The latter makes
mv-indices the perfect candidate to be combined with existing and novel materialization
and techniques. The corresponding demo is based on the work presented in [12].


2   The Mv-Index Structure
We will now describe the fundamentals of our indexing structure that is build upon the
f-graph class of queries.
F-Graph Queries & Query Containment. To solve the containment problem and
build the corresponding indexing structure, we initially focus on its variation Qf v W
where Qf belongs to a special class of conjunctive queries that we name f-graph queries
and W belongs to the class of conjunctive queries that have only IRIs as predicates.
What motivates the choice of f-graph queries in the left-hand side of a query contain-
ment is that: (i) containment for f-graph queries can be solved in PTime; (ii) f-graph
queries appear in real- world query workloads; (iii) f-graph queries can be employed as
representatives of arbitrary conjunctive queries.
     An f-graph query Qf is a conjunctive query for which: (i) For every pair of terms
o1 , o2 such that o1 6= o2 , the triple patterns (s, p, o1 ), (s, p, o2 ) cannot both appear in
Qf ; (ii) For every pair of terms s1 , s2 ∈ I ∪ X such that s1 6= s2 , the triple patterns
(s1 , p, o), (s2 , p, o) cannot both appear in Qf . We name these queries f-graphs because
of the functional and inverse functional characteristics of their predicates. Note that
containment of such queries is polynomial due to the strong requirements for the f-
graph structure. Once a variable v in W has been mapped to a term v 0 in Q, there is a
single deterministic choice for mapping the remaining variables appearing in W .
MV-Indices. In the case that we want to check for containment between a single f-
graph query Qf and a set of conjunctive queries W, it would be inefficient to make
each and every comparison. For that reason, we have introduced the “Materialized-
View Index” structure, denoted with mv-index, that allows to store a set of queries W
and use it to check for containment. Our structure is based on Radix trees, ordered tree
data structures that are used in string matching [13].
    An mv-index M is a tree structure (V, E, L) where: (i) V is a set of vertices; (ii) E v
V 2 is a finite set of edges; (iii) L is a labelling function that maps each edge to a non-
empty ordered list of distinct elements (IRIs, literals, and parenthesis symbols) and each
vertex to the serialized form of an f-graph query.
    The intuition for this form of representation is that queries are represented by their
serialized form in the mv-index structure, either as intermediate or leaf vertices, using
the labelling function L. For a vertex α in the mv-index structure, L(α) is its corre-
sponding query in serialized form. The serialized form of the query represented by a
vertex can be also obtained by following the path from the root of the mv-index to
the specific vertex and concatenating the corresponding edge labels. Therefore, in our
actual implementation we only store edge labels.
    During the insertion phase, mv-indices are treated as regular Radix trees that instead
of strings or numbers are used to represent queries in their serialized form. Therefore,
instead of characters within a string, or digits within a number, mv- indices use IRIs,
                         Mv-Index: An Efficient Index for Graph-Query Containment              3


Fig. 1. A simple mv-index. The initial letters f A, n, a, M A, C depict the IRIs fromAlbum, name,
artist, MusicalArtist, and Composer respectively.

literals, variables and separators such as parenthesis symbols in order to represent seri-
alized queries. More information on how insertion works in Radix trees can be found
in the literature [12]. To check for query containment using mv-indices, we have de-
vised an algorithm that takes advantage of the properties of an f-graph. The intuition
underlying the algorithm is that each time an edge of the mv-index is examined, a cor-
responding containment mapping from the corresponding queries in the mv-index to
the examined query is created.
F-Graph Witnesses for Conjunctive Queries. For our algorithm to represent more
expressive conjunctive queries, we have introduced f-graph witnesses. The intuition is
that each conjunctive query can be represented in the form of an f-graph when checking
for query containment.
     For a conjunctive query Q, its corresponding f-graph witness can be obtained by
merging terms that violate conditions (i), (ii) in the definition of f-graph queries. To
perform the aforementioned task, we initially define the equivalence relation ∼ on vari-
ables, IRIs, and literals in Q such that o1 ∼ o2 when there exists a term s for which
either the triple patterns (s, p, o1 ) and (s, p, o2 ) both appear in Q, or the triple patterns
(o1 , p, s) and (o2 , p, s) both appear in Q. For a term s in Q, [s] denotes its equivalence
class on the ∼ relation that contains all the terms that are merged with s. The f-graph
witness Qw of the query Q is obtained by replacing each triple pattern (s, p, o) in the
body of Q with a triple pattern ([s], p, [o]) where s, o are terms, [s], [o] their correspond-
ing equivalence classes, and p is a predicate.
     For a conjunctive query Q, its corresponding f-graph witness Qw and a conjunctive
query W , the following implication applies: Q v W ⇒ Qw v Q. The previous
proposition conveys that we need to check for containment Q v W only when the
containment relation for the witness of Q is satisfied, i.e., Qw v W . Checking Qw v
W can be performed in PTime, while checking Q v W is in the worst case a NP-
complete problem. Therefore, we pay a PTime budget to solve specific instances of a
NP-complete problem by “postponing” non-deterministic checks in favor of a proof,
computed in PTime, that Q v W does not apply.


3    Demo
In the demo we will show how we have implemented our novel structures and algo-
rithms and test their efficiency in a combined query workload consisting of DBPedia,
WatDiv, BSBM, LUBM, and LDBC queries. We will evaluate insertion and contain-
ment performance with respect to different query and mv-index properties. The average
4        Mailis et al.

time for query containment against an mv-index containing 397,507 distinct queries
from all 5 workloads was between 0.0093 msec and 0.041 msec. A presentation of our
work is available online.

Acknowledgements. This work was partially funded by the SIRIUS Centre, Norwegian
Research Council project number 237898. T. Mailis was financed by EU Horizon2020,
“DARE” project, Grant Agreement nr. 777413. Y. Kotidis was financed by the Research
Centre of Athens University of Economics and Business, in the framework of the project
entitled Original Scientific Publications.

References
 1. Arenas, M., Grau, B.C., Kharlamov, E., Marciuska, S., Zheleznyakov, D.: Faceted search
    over rdf-based knowledge graphs. J. Web Semant. 37-38, 55–74 (2016)
 2. Brickley, D., Guha, R.V.: Rdf vocabulary description language 1.0: Rdf schema (2004)
 3. Chandra, A.K., Merlin, P.M.: Optimal implementation of conjunctive queries in relational
    data bases. In: STOC. ACM (1977)
 4. Charron, B., Hirate, Y., Purcell, D., Rezk, M.: Extracting semantic information for e-
    commerce. In: ISWC. pp. 273–290 (2016)
 5. Gutierrez, C., Hurtado, C.A., Mendelzon, A.O., Pérez, J.: Foundations of semantic web
    databases. JCSS 77(3), 520–541 (2011)
 6. Ho, V.T., Stepanova, D., Gad-Elrab, M.H., Kharlamov, E., Weikum, G.: Rule Learning from
    Knowledge Graphs Guided by Embedding Models. In: ISWC. pp. 72–90 (2018)
 7. Kharlamov, E., Hovland, D., Skjæveland, M.G., Bilidas, D., Jiménez-Ruiz, E., Xiao, G.,
    Soylu, A., Lanti, D., Rezk, M., Zheleznyakov, D., Giese, M., Lie, H., Ioannidis, Y.E., Kotidis,
    Y., Koubarakis, M., Waaler, A.: Ontology Based Data Access in Statoil. J. Web Semant. 44,
    3–36 (2017)
 8. Kharlamov, E., Kotidis, Y., Mailis, T., Neuenstadt, C., Nikolaou, C., Özçep, Ö.L., Svingos,
    C., Zheleznyakov, D., Ioannidis, Y.E., Lamparter, S., Möller, R., Waaler, A.: An ontology-
    mediated analytics-aware approach to support monitoring and diagnostics of static and
    streaming data. J. Web Semant. 56, 30–55 (2019)
 9. Kharlamov, E., Mailis, T., Mehdi, G., Neuenstadt, C., Özçep, Ö.L., Roshchin, M., Solo-
    makhina, N., Soylu, A., Svingos, C., Brandt, S., Giese, M., Ioannidis, Y.E., Lamparter, S.,
    Möller, R., Kotidis, Y., Waaler, A.: Semantic access to streaming and static data at Siemens.
    J. Web Semant. 44, 54–74 (2017)
10. Kharlamov, E., Mehdi, G., Savkovic, O., Xiao, G., Kalayci, E.G., Roshchin, M.:
    Semantically-enhanced rule-based diagnostics for industrial internet of things: The SDRL
    language and case study for siemens trains and turbines. J. Web Semant. 56, 11–29 (2019)
11. Levy, A.Y., Mendelzon, A.O., Sagiv, Y.: Answering queries using views. In: PODS. pp. 95–
    104. ACM (1995)
12. Mailis, T., Kotidis, Y., Nikolopoulos, V., Kharlamov, E., Horrocks, I., Ioannidis, Y.E.: An
    efficient index for RDF query containment. In: SIGMOD. pp. 1499–1516 (2019)
13. Morrison, D.R.: Patricia—practical algorithm to retrieve information coded in alphanumeric.
    JACM 15(4), 514–534 (1968)
14. Polleres, A.: From sparql to rules (and back). In: WWW. pp. 787–796. ACM (2007)
15. Prud, E., Seaborne, A., et al.: Sparql query language for rdf. https://www.w3.org/
    TR/rdf-sparql-query/ (2006)
16. Sagiv, Y., Yannakakis, M.: Equivalences among relational expressions with the union and
    difference operators. JACM 27(4), 633–655 (1980)
17. Soylu, A., Kharlamov, E., Zheleznyakov, D., Jiménez-Ruiz, E., Giese, M., Skjæveland, M.G.,
    Hovland, D., Schlatte, R., Brandt, S., Lie, H., Horrocks, I.: Optiquevqs: A visual query sys-
    tem over ontologies for industry. Semantic Web 9(5), 627–660 (2018)