=Paper=
{{Paper
|id=None
|storemode=property
|title=SNQL: A Social Networks Query and Transformation Language
|pdfUrl=https://ceur-ws.org/Vol-749/paper18.pdf
|volume=Vol-749
|dblpUrl=https://dblp.org/rec/conf/amw/MartinGW11
}}
==SNQL: A Social Networks Query and Transformation Language==
SNQL: A Social Network Query and
Transformation Language
Mauro San Martı́n1 , Claudio Gutierrez2 , and Peter T. Wood3
1
Dept. of Mathematics, U. de La Serena - 2 Dept. of Computer Science, U. de Chile
3
Dept. of Computer Science and Information Systems, Birkbeck, Univ. of London
Abstract. Social Network (SN) data has become ubiquitous, demand-
ing advanced and flexible means to represent, transform and query such
data. In addition to the intrinsic challenges of querying graph data is the
requirement that networks be restructured, and thus that new values be
created. To address these, we introduce a dedicated data model and query
language, SNQL, founded on previous research on graph databases and
on the experience of SN researchers. Technically, it is based in GraphLog
and second-order tuple generating dependencies, allowing expressiveness
for graph querying and node creation as required by SN, while keeping
the complexity of query evaluation in NLOGSPACE.
1 Introduction
The widespread embracing of social networking services—such as Facebook,
MySpace and LinkedIn—as the indispensable tools to manage people’s digital
and physical lives has resulted in rapidly growing amounts of social activity data.
Furthermore, the social information represented by many sites usually includes
not only people but a great variety of objects (usually referred to generically as
actors in the social networking literature) and relations: photographs (Flickr),
other sites (del.icio.us), places (Yahoo! Travel), goods (Amazon), and so on.
This diversity produces complex social networks (SN) which require managing,
querying and transforming.
In the context of these new applications, users have access to an increas-
ing amount of SN data, and consequently the need arises for them to manage
and build their own networks based on relevant “pieces” of the huge networks
available. For example, it is common to find applicable online information about
published research linked with departmental and local scientific networks. This
calls for more advanced and flexible means to query and transform SN data.
Along the same lines, scientific experimentation with social network analysis
(SNA) tools (e.g. Pajek [9]) calls for data management tools to extract parts of
SN from different environments and to integrate, filter and transform them.
A further requirement is that SN data representation needs to be flexible
enough to incorporate on-the-fly attributes (e.g. for data curation). In addition,
data is used and seen from diverse points of view by different users. Hence
classical modeling in terms of a fixed set of entities, attributes and relationships
does not work well. For example, in SNA it is common for attributes to become
actors, for aggregated data of actors to become attributes, and for relations to
have arity greater than two.
a) Friendship Network among Persons c) Friendship Network between Cities
Central Capital
2 1 1
City City
city city
a1 a2 inhabitants number inhabitants
friend r1 friend
name name
person friendship person a5 a4
John friends r5 friends
Mary city
introducer friendships city
name name
Central _between_
City city a3 name
Ann cities
Central Capital
person City City
b) Friendship Network among Persons (Cities Promoted from Attributes to Actors)
Capital
Central r3 City
inhabitant a1 name Mary
City John
place lives-in friend name name
name person
a5 r1 friend a2 inhabitant r2 place a4
city
city place friendship person lives-in
introducer
r4 inhabitant a3 name Ann
lives-in person
Fig. 1. Friendship Network and Transformations. a) A social network representing the
friendship relation (square node) between Mary and John, who were introduced by
Ann (actors as round nodes, and attribute values as grey dots). b) The same social
network after promoting city attributes to actors. c) The social network result after
grouping persons by city and computing aggregate attributes: inhabitants of each city,
and number of friendships between cities.
Example 1. Consider a SN of friendship relations among people, including which
other person, if any, introduced them; this implies having relations of variable
arity (solved by representing relations as nodes). People are described by the
attributes ‘name’ and ‘city’ (see Fig. 1(a)). The study of the relevance of city of
residence to friendship might require promoting cities to actors and linking peo-
ple and cities with a new type of relation, e.g. ‘lives-in’ (see Fig. 1(b)). Another
type of transformation would be to group people by city of residence, thus defin-
ing a network of cities, where relations summarize friendships among residents
of cities. Additionally, one might like to describe in the network the population
(person count) of each city, and label the relations between them with the num-
ber of friendship-relations between people (see Fig. 1(c)). t
u
The requirements suggested by Example 1 together with the integration of
diverse types of information as described above call for a simple and flexible
model of data for SN. Furthermore, due to the growing volume of SN data,
any transformation and query language for SN should be scalable. This implies
that a transformation and query language should conform, from a theoretical
perspective, to low complexity bounds, and, because of practical concerns, be
simple and modular while being sufficiently expressive.
Several of these challenges have been already voiced. Fifteen years ago Free-
man defined the maximal structure experiment that extended the basic network
representation to include attributes as well as to accommodate changes over
time [12]. More recently, the need to improve both network data formats in the
context of the social web as well as data management services for large and
dynamic social networks has been identified [7, 15].
To date, the above challenges have been addressed with ad-hoc approaches,
and to the best of our knowledge there is no generic data management solu-
tion in spite of the wide agreement on the urgent necessity of addressing this
problem [14, 18]. Some proposals for graph databases (e.g. GraphDB [13]) have
features to deal with SN data, but with ad-hoc developer-oriented languages. In
the same spirit some SN services provide APIs (e.g. Facebook’s Graph API and
Open Graph Protocol1 ).
The three most comprehensive proposals on the challenges presented above
are BiQL, SocialScope, and SoQL. SocialScope [3] is a logical architecture for
discovering and managing social information, which includes an algebraic query
language. It does not provide the capability to construct new data nor deal
with the complexity of data dynamics (particularly transformation of actors
into attributes). SoQL [16] is an SQL-like query language for SN, focused on
identifying groups and paths over a classical network. It does not incorporate
data transformation features. Finally, BiQL [10] is an SQL-like language that
does allow for transformation and object creation. Object creation is performed
by using a single function, which is in general insufficient for SN use cases, and
translating queries to Prolog. None of these proposals analyze the computational
complexity of their languages.
Our transformation and query language, named SNQL, is inspired by two
earlier languages: GraphLog [8] and second-order tuple-generating dependen-
cies (SO tgds) [11]. SNQL comprises, both syntactically and conceptually, two
modules, one for SN matching and one for SN construction, which essentially
correspond to GraphLog and SO tgds, respectively.
GraphLog was a seminal query language for graph data, designed to be ex-
pressive while at the same time having low computational complexity. Apart
from standard features, it includes aggregation and transitive closure making it
suitable for many SN queries. However, GraphLog does not provide functionality
to create new objects/actors, a crucial requirement for SN.
Example 2. Consider again the SN from Example 1. The following SNQL query
produces the network depicted in Fig. 1(b) from that in Fig. 1(a) by promoting
the ‘city’ attribute to a new type of actor (city) and producing a new type of
relation (lives-in) to associate people with cities.
CONSTRUCT CP IF R2 = f(A1, A2) AND A2 = g(L1)
WHERE EP
FROM FriendshipNetwork
Patterns EP and CP, depicted in Fig. 2, denote an extraction pattern and a con-
struction pattern, respectively. FriendshipNetwork is the SN shown in Fig. 1(a).
Note that in the result, cities become hubs that connect all people living in
each of them, and that the new actors require creation of new ids from the data:
1
http://developers.facebook.com/docs/
Extraction Pattern: EP Contruction Pattern: CP
L2 L2
L1 name L1 name
city A1 P1 R1 name A2 place R2 inhabitant A1 P1 R1
person friendship city lives-in person friendship
Fig. 2. Attribute Promotion. Patterns EP and CP transform attribute ‘city’ to a new
actor whose id is functionally created from its literal value (bound to variable L1). Also
new relations are required to link persons to the newly created city actors.
the oids of cities are produced by applying a function g to the literal values
bound to variable L1. Similarly, new relation identifiers for the ‘lives-in’ relation
are created using f , one for each (person,city) pair matched by (A1,A2). t
u
Creation of values was addressed by database query languages such as IQL [2]
and ILOG [6], that allowed oid or value invention. By relying on rules that use
both recursion and oid invention, it can be shown that such languages can express
all computable database queries [6]. However, it turns out that when recursion
and invention are not allowed to interact, as in SNQL, queries can be evaluated
in PTIME [2]. In SNQL, we consider another formalism used to invent values,
namely, SO tgds [11]. SO tgds use existentially quantified function symbols (and
equalities) to specify the composition of schema mappings, where values in a
target schema (output) may need to be existentially quantified (invented). Since
SO tgds are not recursive, materializing the result of a schema mapping (which
corresponds to creating an SN in our setting) can be done in PTIME [11].
Our contributions are as follows:
1. A data structure capable of representing the informational richness and mal-
leability of social networks.
2. A transformation and query language satisfying the data management re-
quirements of social networks with good properties: adequate expressiveness,
and accessible to social networks field practitioners.
3. A query language that includes object creation but maintains low complexity.
4. A corresponding evaluation algorithm whose complexity scales adequately.
The outline of the rest of the paper is as follows. Section 2 briefly presents
the requirements and SNA use cases. Section 3 defines the syntax (via exam-
ples) and semantics of SNQL. Section 4 presents results on the expressiveness
and complexity of SNQL. We included an Appendix with complete syntax and
extended related work.
2 SNQL Requirements and Use Cases
To identify the general requirements and operations needed for SNQL, we col-
lected use cases and archetypical operations from standard SNA literature [20,
17, 9], surveyed relevant publications, mainly from the journal Social Networks,
and studied the operations available in SNA software tools such as Pajek [9].
Use Case Description Required Query
Features
1. Selecting Select a subnetwork of actors and relations that Pattern match-
Groups satisfy conditions on their attribute values and/or ing, filtering by
participation in certain relations. attributes values.
2. Pro- From an actor A1 and one of its attributes (att, v) Pattern matching,
moting produce a new actor A2 = f (v) and a new relation creation of new ob-
Attributes R = g(A1 , v) (f and g functions): all actors with jects, pattern pro-
to Actors value v for att will be connected to A2 . duction.
3. Identify- For each characteristic brokerage pattern found, Pattern matching,
ing Brokers label the broker in the output accordingly. Some negation, pattern
patterns require certain relations do not exist. production.
4. Counting Select all relations of a given type, group by par- Pattern matching,
Binary Rela- ticipant actors, count. Produce only one relation aggregation, pat-
tions per group with the new attribute count. tern production.
5. Ego- Select an actor along with all its direct neighbors, Pattern matching,
network and the relations between them. induced subgraphs.
selection
6. k- Same as above but to distance k instead of one. Pattern matching,
neighbor- transitive closure,
hood induced subgraphs.
Table 1. Selected social network data management use cases.
2.1 Data structure requirements and definition
The natural and traditional choice for representing social networks is to use
graphs where actors are nodes and relations are edges. However, doing so lim-
its the representation power to that of binary relations and forbids attributes
on relations. Thus, our logical data structure, the social networks data model
(SNDM), is a graph where actors, relations and attributes are all modeled as
nodes, and edges associate attributes with the actors or relations they describe,
and actors with the relations in which they participate. This structure is imple-
mented using three sets of triples:
– A typing set N. Each triple (oid, [isa|isr], family) ∈ N indicates that the
actor (isa) or relation (isr) oid belongs to (is of type) family.
– A set R indicating roles. Each triple (a oid, role, r oid) ∈ R indicates that
the actor a oid participates with role in the relation r oid.
– A set M describing attributes. Each triple(oid, pred, v) ∈ M indicates that
the actor or relation oid has the attribute pred with value v.
2.2 Requirements of SNQL
Table 1 summarizes the use cases from the list gathered from the sources men-
tioned above. Each use case is selected for its relevance and justifies the inclusion
of a query language feature.
Social networks operations can be divided into two groups: data management
operations that return a social network, and measure operations that return
values or sets of values, such as centrality. Today there are various tools that
deal with measure operations (e.g. Pajek, R, and UCINET) and clearly belong
more to the SNA tools field than to the data management one.
SNQL focuses on the first type, data management operations, which produce
networks from networks. Through the use of aggregate functions, we assume
that node, group and network measures are available to the language but do not
need to be expressible in the language itself. Brandes [4] offers a survey of such
measures and the corresponding evaluation algorithms.
Each SNQL query must be able to filter and/or transform a given SN into a
new SN. Filter queries are used to reduce the size of a SN and to focus on rele-
vant groups. Transformations produce a new SN where some implicit structural
element has been made explicit.
3 The SNQL Query Language
The design of SNQL addresses the following issues: to be expressive enough, and
keep the evaluation cost under practical bounds. The challenge was to identify a
few generic operations to cover the required expressiveness, and to be closed, that
is, to be able to construct (and transform) social networks into social networks.
3.1 Query Syntax
The language should be friendly enough for both the lay-user and the program-
mer. For the former, a visual language close to the SN graphical representation
is ideal: in the simpler cases, one extraction pattern and one construction pat-
tern cover many use cases; in the general case, the extraction pattern should
resemble the DAG of query graphs that exists in GraphLog [8]. For the latter,
an SQL-like language would be familiar to developers and advanced database
users (for searching text, writing, pasting, debugging, etc.) Thus, our language
has both syntaxes.
At the abstract level, and for the purpose of formally studying and analyzing
its semantics and complexity, we use a translation to a more formal representa-
tion, based on Datalog/GraphLog [1, 8], and the data exchange formalism called
second-order tuple generating dependencies [11].
An SNQL construct query Q follows the standard SELECT|CONSTRUCT –
WHERE – FROM structure of languages like SQL and SPARQL. It receives as
input social networks (the FROM clause), extracts information using patterns
(the WHERE clause), and outputs a new social network, possibly with new val-
ues, using the CONSTRUCT clause. For space reasons, we present the syntax of
SNQL by example2 , using the SN of friendship relations among people presented
in Example 1 (see Fig. 1).
2
The complete syntax can be found in the tech report TR/DCC-2011-5 at
http://www.dcc.uchile.cl/reportes
Extraction Pattern 1: EP1 Extraction Pattern 2: EP2 Contruction Pattern 1: CP1 Contruction Pattern 2: CP2 L5
L1 L2 L3 L1
city
A1 name number
person city city A4
A5 friend R2 friend A6
L4
A2 friend R1 friend A3 city inhabitants city friendship city
person friendship person between cities
Fig. 3. Grouping and Aggregation. Patterns EP1 and CP1 group people by ‘city’ and
count the number of inhabitants in each city. Patterns EP2 and CP2 group and count
friendship relations between pairs of cities.
Example 3. (Promoting Attributes to Actors) Recall the query from Example 2,
where the patterns EP and CP were depicted in Fig. 2. Expanding the patterns
as lists of triples, the query is as follows:
CONSTRUCT {(A1, isa, person), (A2, isa, city), (R1, isr, friendship),
(R2, isr, lives-in), (A1, inhabitant, R2), (A1, P1, R1),
(A1, name, L2), (A2, place, R2), (A2, name, L1)}
IF R2=f(A1, A2) AND A2=g(L1)
WHERE {(A1, isa, person), (R1, isr, friendship),
(A1, city, L1), (A1, P1, R1), (A1, name, L2)}
FROM FriendshipNetwork
Example 4. (Grouping and Aggregation) The following SNQL query produces
the network depicted in Fig. 1(c) from that in Fig. 1(a), grouping people by city
and counting friendship relations between cities.
CONSTRUCT CP1 IF A4 = f(L1) AS SN1
WHERE AGG({L1}, COUNT AS L4, EP1)
FROM FriendshipNetwork
UNION
CONSTRUCT CP2 IF A5 = f(L2) AND A6 = f(L3) AND R2 = g(A5, A6) AS SN2
WHERE AGG({L2,L3}, COUNT AS L5, EP2 FILTER (L2 != L3))
FROM FriendshipNetwork
Patterns EP1, EP2, CP1, and CP2 are depicted in Fig. 3. Note that each new
group (actor) requires a new oid functionally produced from the value of at-
tribute ‘city’. Also the number of inhabitants bound to L4, and the number
of friendship-between-cities bound to L5, must be computed with the aggre-
gate function COUNT. The first argument of AGG is the set of grouping variables,
the second is the aggregation function required, and the third is an extraction
pattern. The results of the two construct queries are combined using UNION to
produce the desired result. t
u
Example 5. (Transitive Closure) The following SNQL query (whose patterns are
shown in Fig. 4) produces the network comprising an actor that meets a given
criterion (his name is ‘John’), along with all other actors that can be reached
transitively by matching the given pattern (of friendship relations). Additionally
all induced relations between pairs of reachable actors are included in the result.
Extraction Pattern 1: EP1 (TC) Extraction Pattern 2: EP2 Extraction Pattern 3: EP3 (TC) Contruction Pattern 1: CP1
L1 L2 L3 L4 L1 L5 L3 L4
name name name name name name name
name
A1 friend R1 friend A2 R2 friend A4 A1 friend R3 friend A5 R2 friend A4
A3 friend A3 friend
person friendship person person friendship person person friendship person person friendship person
Fig. 4. Transitive Closure. Patterns EP1 and EP3 identify all transitively reachable
actors from person named ‘John’ through ‘friendship’ relations. Patterns EP2 and CP1
produce the relations induced by pairs of actors in the set of reachable actors.
CONSTRUCT CP1
WHERE EP2 FILTER ((A3 != A4) AND (A3 = A1 OR A3 = A2) AND
(A4 = A1 OR A4 = A5))
AND (TC(A1, A2, EP1) WITH L1=’John’)
AND (TC(A1, A5, EP3) WITH L1=’John’)
FROM FriendshipNetwork
TC returns the transitive closure of the binary relation formed by all instantia-
tions of the variables appearing as its first and second arguments when matching
the extraction pattern of its third argument. A starting condition is specified af-
ter WITH. Hence variables A2 and A5 are bound to the people reachable from John
through transitive friendship relations. Pattern EP2, along with the FILTER con-
ditions above, is then used to match all the induced friendship relations between
distinct pairs of people reachable from ‘John’ (along with ‘John’ himself). t
u
3.2 Query Semantics
An SNQL query Q of the form CONSTRUCT WHERE FROM trans-
forms social networks into social networks.
The formal semantics can be expressed in standard formalisms (Datalog and
tuple-generating dependencies) as follows. Let D be a social network, Q an SNQL
query, and Q(D) the result of applying Q to D. For set of variables X, let x be
the tuple comprising all variables in X.
An extraction pattern is recursively decomposed and simulated by a Data-
log program as follows. Let PATT be the pattern to be simulated by predicate p
and assume that patterns PATT1 and PATT2 are simulated by p1 and p2 , respec-
tively. Let z, x and y contain the projected variables of PATT, PATT1 and PATT2,
respectively. The translation, based on the structure of PATT, is shown in Fig. 5.
The predicate p obtained from pattern PATT in the previous translation, is
now used to produce the query result. Here the list of triples of the CONSTRUCT
clause along with the corresponding lists of equalities play a central role. The
equalities are of two types. One type defines each variable: vi = termi , 1 ≤ i ≤ k;
the other is of the form termi = terml , where each term may contain variables
(from p), constants and functions.
For a given CONSTRUCT trList IF eqList the construction process takes
the result of the extraction process, the p(z) predicate, plus the list of equalities
eqList translated as ∧j eqj to produce the following rule:
0. Each triple t of the form (A,B,C) is translated as t(A, B, C), where t is n, r or
m according to the type of the triple. V
1. A list of triples (basic pattern) { t1, ..., tn }: p(z) ← i∈1..n ti (Ai , Bi , Ci ).
2. PATT1 AND PATT2: p(z) ← p1 (x), p2 (y)
3. PATT1 OR PATT2: p(z) ← p1 (x)
p(z) ← p2 (y)
4. PATT1 AND-NOT PATT2: p(z) ← p1 (x), ¬p2 (y).
5. PATT1 FILTER C: p(z) ← p1 (x), c(x)
(assuming condition C is simulated by predicate c)
6. TC (Vs, Vt, PATT1) WITH :
p(U, V ) ← p1 (. . . U . . . V . . . ), start cond(. . . U . . . V . . . )
p(U, V ) ← p1 (. . . U . . . W . . . ), p(W, V )
(assuming variable Vs corresponds to variable U and Vt to variable V of p1 (x))
7. AGG(VList, AggF, PATT1) : p(z, A(y)) ← p1 (z, y)
(assuming Vlist is the set of variables Z, Y = X − Z and AggF is the aggregate
function A)
Fig. 5. Translation of Extraction Pattern to Datalog.
^
construct(v1 , . . . , vk ) ← p(z) ∧ eqj . (1)
j
Finally, the resulting social network SN is the set of instantiations of each
triple t in the list of triples trList in the CONSTRUCT using the values in the
construct predicate:
[n o
SN = t(u1 , u2 , u3 ) : ∃(..u1 ..u2 ..u3 ..) ∈ construct and t in trList (2)
Example 6. Consider the SNQL query Q of Example 4 that involved grouping
and aggregation. The translation of Q to Datalog is shown in Figure 6. t
u
4 Complexity and Expressiveness
The main goal of this paper was to introduce a sufficiently flexible data model
and expressive query language that meets the data manipulation requirements
of social networks. In this section we state—without proof due to space con-
straints3 —results that show the good behaviour of the language regarding com-
plexity and expressive power.
SNQL is composed of two modules: one for extraction of information and
one for construction of a new network. In the design, consideration has been
given to providing the maximum expressiveness possible while keeping the com-
plexity of processing within reasonable bounds. First, for extraction, we consid-
ered GraphLog (possibly with summarization functions), which is designed to
3
Proofs can be found in the tech report TR/DCC-2011-5 at
http://www.dcc.uchile.cl/reportes
output-n(A4,isa,city) :- construct1(A4,L1,L4)
output-m(A4,name,L1) :- construct1(A4,L1,L4)
output-m(A4,inhabitants,L4) :- construct1(A4,L1,L4)
output-n(A5,isa,city) :- construct2(A5,R2,A6,L5)
output-n(R2,isr,friendship-between-cities) :- construct2(A5,R2,A6,L5)
output-n(A6,isa,city) :- construct2(A5,R2,A6,L5)
output-r(A5 friend,R2) :- construct2(A5,R2,A6,L5)
output-r(A6,friend,R2) :- construct2(A5,R2,A6,L5)
output-m(R2,number,L5) :- construct2(A5,R2,A6,L5)
construct1(A4,L1,L4) :- ag1(L1,N), A4=f(L1), L4=N
ag1(L1,count(A1)) :- ep1(A1,L1)
ep1(A1,L1) :- n(A1,isa,person), m(A1,city,L1)
construct2(A5,R2,A6,L5) :- ag2(L2,L3,M), A5=f(L2), A6=f(L3),
R2=g(A5,A6), L5=M
ag2(L2,L3,count(A2,R1,A3)) :- ep2(A2,A3,R1,L2,L3)
ep2(A2,A3,R1,L2,L3) :- n(A2,isa, person), n(R1,isr,friendship),
n(A3,isa,person), r(A2,friend,R1),
r(A3,friend,R1), m(A2,city,L2),
m(A3,city,L3), L2 != L3
Fig. 6. Translation of query in Example 4 to Datalog.
be as expressive as possible while staying within the LOGSPACE complexity
bound [8]. Second, for the construction module, the language is modeled after
SO tgds, which are known to be a family of transformations between tables of
tuples with the “right” expressiveness/complexity tradeoff [11].
It is therefore not surprising that SNQL covers all use cases of reasonable
complexity that we identified in current SN practice. (There are still some queries
which are not covered by SNQL, but it can be proved that they fall outside a
reasonably efficient complexity bound. A typical example is cohesive subgroups
such as k-cores.) Formally stated, this result can be presented as follows:
Claim. SNQL can express all use cases identified in SN practice that fall in the
NLOGSPACE complexity bound.
A formal proof of this claim relies on the list of use cases in current practice.
The column “Required Query Features” of Table 1 collects the features needed
for the classical use cases from the SN community. All of them, except induced
subgraph, are incorporated directly in the language. For the induced subgraph,
Example 5 gives the idea how this is done.
As for the expressive power as compared to classical databases languages, we
can prove the following two results:
Theorem 1. SNQL extraction has the same expressive power as GraphLog.
(a) For an expression
CONSTRUCT trList1 IF eqList1; ... trListn IF eqListn; V
translated in the form of eq. (3), define n clauses: auxj (xj ) ← p(x)∧ k eqk , j =
1, . . . , n.
(b) For each clause trListj IF eqListj; and each triple (x,y,z) in trListj, de-
fine a rule t(x, y, z) ← auxj (xj ).
(c) Add the clauses generated by (a) and (b) to the original Datalog program
generated from the extraction pattern (see Fig. 5).
(d) Obtain the values of the triples to be generated by running the new program.
Fig. 7. Evaluation Algorithm.
Theorem 2. SNQL construction can be specified by one SOtgd of the form:
∃f1 . . . fm (∀x1 (φ1 → ψ1 ) ∧ . . . ∧ ∀xn (φn → ψn )),
where each (φi → ψi ) has the form:
^
(p(x) ∧ eqk ) → (t1 ∧ . . . ∧ tr ), (3)
k
where p(x) and eqk follow the notation of (1) and (2), that is, predicate p(x) is
the result of the processing of the extraction pattern, and tj and eqk are predicates
resulting from the translation of the triples and equalities in the CONSTRUCT
clause, and each tuple xi includes all variables in p and in the eqk ’s.
A naive implementation of the semantics presented in Fig. 5 would materi-
alize intermediate results. This can be avoided by using the algorithm in Fig. 7.
Lemma 1. The Evaluation Algorithm is correct—it preserves SNQL semantics.
It is possible to show that the above evaluation computes queries efficiently
from a database perspective. As is customary when studying the complexity
of the evaluation problem for a query language [19], we consider its associated
decision problem. We denote this problem by Evaluation, defined as follows:
INPUT : A Social Network S, a query Q and a triple t = (a, b, c).
QUESTION : Is t ∈ [[Q]]D ?
Theorem 3. The complexity of Evaluation is in NLOGSPACE.
5 Conclusions
Based on social network practice, we have presented the design of a data model
and query language for SN. Of particular novelty, is the ability of our language to
transform one network into another, in the process creating new actors and new
attributes based on aggregation, features crucial for social network researchers.
We presented the syntax, semantics and complexity analysis. The language is
based on classical database results to obtain a good balance between expressive-
ness and complexity. Presenting both a graphical and SQL-like syntax, we de-
signed it to have the most encompassing expressiveness while staying tractable.
In fact, we show that it includes all tractable operations found in our survey
of SN data management practice, and we prove that the cost of transforming
networks can be done efficiently from a computational point of view.
Acknowledgements. C. Gutierrez thanks FONDECYT 1110287. P. T. Wood vis-
ited the U. of Chile funded by a Royal Society International Joint Project.
References
1. Abiteboul, S., Hull, R., Vianu, V.: Foundations of Databases. Addison-Wesley
(1995)
2. Abiteboul, S., Kanellakis, P.C.: Object identity as a query language primitive. J.
ACM 45(5), 798–842 (1998)
3. Amer-Yahia, S., Lakshmanan, L.V.S., Yu, C.: SocialScope: Enabling information
discovery on social content sites. In: CIDR. www.crdrdb.org (2009)
4. Brandes, U., Erlebach, T. (eds.): Network Analysis: Methodological Foundations,
LNCS, vol. 3418. Springer (2005)
5. Breiger, R., Carley, K., Pattison, P. (eds.): Dynamic Social Network Modeling and
Analysis: Workshop Summary and Papers. The National Academies Press (2003)
6. Cabibbo, L.: The expressive power of stratified logic programs with value invention.
Inf. Comput. 147(1), 22–56 (1998)
7. Carley, K.M.: Linking capabilities to needs. In: Breiger et al. [5], pp. 363–370
8. Consens, M.P., Mendelzon, A.O.: Graphlog: a visual formalism for real life recur-
sion. In: PODS. pp. 404–416. ACM Press (1990)
9. de Nooy, W., Mrvar, A., Batagelj, V.: Exploratory Social Network Analysis with
Pajek. Cambridge University Press (2005)
10. Dries, A., Nijssen, S., Raedt, L.D.: A query language for analyzing networks. In:
Cheung, D.W.L., et al. (eds.) CIKM. pp. 485–494. ACM (2009)
11. Fagin, R., Kolaitis, P.G., Popa, L., Tan, W.C.: Composing schema mappings:
Second-order dependencies to the rescue. ACM TODS 30(4), 994–1055 (2005)
12. Freeman, L., Romney, A.K., White, D.R. (eds.): Research Methods in Social Net-
work Analysis. Transaction Publishers (1992)
13. Güting, R.: Graphdb: modeling and querying graphs in databases. In: 20th VLDB
Conference. pp. 297–308 (1994)
14. Jagadish, H.V., Olken, F.: Database management for life science research. OMICS
7(1), 131–137 (2003)
15. Mika, P.: Social Networks and the Semantic Web, Semantic Web And Beyond
Computing for Human Experience, vol. 5. Springer (2007)
16. Ronen, R., Shmueli, O.: SoQL: A language for querying and creating data in social
networks. In: ICDE. pp. 1595–1602. IEEE (2009)
17. Scott, J.: Social Network Analysis. SAGE Publications, second edn. (2000)
18. Topaloglou, T., et al. Tyers, M.: Biological data management: Research, practice
and opportunities. In: VLDB. pp. 1233–1236 (2004)
19. Vardi, M.Y.: The complexity of relational query languages (extended abstract). In:
STOC. pp. 137–146. ACM (1982)
20. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications.
Structural Analysis in the Social Sciences, Cambridge University Press (1994)