An approach to efficiently storing property graphs in relational databases
Work in progress paper

Matthias Schmid
University of Passau, Germany
Matthias.Schmid@uni-passau.de

ABSTRACT
Graph-structured data can be found in an increasing number of use-cases. While a considerable number of solutions exist for storing graphs in NoSQL databases, the combined storage of relational data and graph-structured data within the same system is not well researched. We present a relational approach to storing and processing huge property graphs that is optimized for read-only queries. This way, all the advantages of full-fledged relational database systems can be used, and the seamless integration of classical relational data with graph-structured data becomes possible.

1. INTRODUCTION
The increasing appearance of data graphs reinforces the need for adequate graph data management. To meet this requirement of property graph storage, several graph database engines have been developed. The most popular examples of databases with graph storage capability include Neo4j (https://neo4j.com/), Microsoft Azure Cosmos DB (https://azure.microsoft.com/de-de/services/cosmos-db/) and OrientDB (https://orientdb.com/), according to https://db-engines.com/de/ranking/graph+dbms. (Unless explicitly stated otherwise, all websites were accessed in February 2019.) While these graph databases offer good performance as long as the data fits into main memory, one cannot always assume that this requirement is fulfilled. In addition, the seamless integration of those solutions with relational-based information systems remains a problem.

Motivated by our own use-case, in which we need to combine huge graphs that represent buildings, RDF data that semantically describes those buildings, and relational data in a single information system, we were looking for a relational-based solution that offers efficient read-focused performance. In "SQLGraph: An Efficient Relational-Based Property Graph Store" [9], Sun et al. introduce an efficient approach to store property graph data in relational databases named SQLGraph. While the original approach seems to offer good performance and promises to also perform well on datasets that do not fit into main memory, we were able to increase its performance for read-only queries by adapting the adjacency list concept.

The contributions of this paper are:

1. We propose a new adaption of the relational schema presented in [9],
2. we show that the adapted schema performs better on read-only queries,
3. and we show that the new schema requires less disk space.

2. RELATED WORK
In this work we focus on the relational-based storage of property graph data. Bornea et al. first proposed the approach of shredding graph edges into adjacency tables in [1]. Sun et al. base their work in [9] on the previously mentioned work of Bornea et al. and outline a novel schema layout for the storage of property graph data in relational databases, which is generalized from an approach to store RDF data in relational databases. They combine the shredding of edges into adjacency tables with the use of JSON-based attribute storage to overcome the limitations of fixed columns. To make the retrieval of data more convenient, they propose a translation mechanism that converts Gremlin [8] queries to SQL queries, which can be run on the proposed relational schema.

In the field of RDF there exist numerous benchmarking efforts.
But to the best of our knowledge, the benchmarks provided by the Linked Data Benchmark Council (LDBC) are the only benchmarks that directly address the problem of benchmarking property graph stores. We do not consider graph processing frameworks like Apache Giraph (http://giraph.apache.org/) in our work, since we focus on the storage and retrieval of graph data, not on its parallel processing. The LDBC is an EU project with the goal of developing benchmarks for graph-structured data. It strives to reach the level of acceptance that benchmarks like those of the TPC [7] have achieved. The Linked Data Benchmark Council - Social Network Benchmark (LDBC-SNB) is one approach being developed by the LDBC; it uses a generated social network graph as its data set and represents the data as a property graph. The benchmark suite also provides a data set generator that can create test data at different scale factors. An overview of the scale factors and the corresponding numbers of vertices and edges is given in Table 1. Work on the Social Network Benchmark is not finished yet, but the Interactive Workload has been released in draft stage [4].

SF | Approx. #Vertices | Approx. #Edges
 1 |         3,200,000 |     17,300,000
 3 |         9,300,000 |     52,700,000
10 |        30,000,000 |    176,600,000
30 |        99,400,000 |    655,400,000

Table 1: Approximate number of vertices and edges by LDBC-SNB scale factor (SF)

Our approach differs from the SQLGraph schema in respect to the definition of the adjacency tables. In their work, Sun et al. [9] show that shredding vertex adjacency lists into a relational schema provides a significant advantage over other mechanisms, for example mechanisms that store all edge information in a single table. To this end, a hash function has to be defined that maps each edge label to a column triple of the adjacency table. We directly apply the approach of computing such a hash function through coloring heuristics, as described in [1]. Note that, since the LDBC-SNB data set includes a data model, it is possible to compute a conflict-free hash function for this specific data set.

In the original approach in [9] the edges are split into two tables: if only one edge of a label exists for a vertex, its edge id, label and target vertex id are stored in the outgoing primary adjacency (OPA) or incoming primary adjacency (IPA) table respectively. By studying available data sets and use cases (e.g. the LDBC-SNB) one can see that it is usually not the case that only one edge of a specific label exists for a single vertex. If a vertex has multiple outgoing edges of the same label, the edge ids and target vertices are instead stored in the outgoing secondary adjacency (OSA) or incoming secondary adjacency (ISA) table respectively, while the target vertex id in the OPA or IPA is set to an artificial value that serves as the join partner for the OSA or ISA table. A query that retrieves all outgoing or incoming neighbours can then be written with the use of an outer join: if only a single edge exists, the outer join finds no corresponding partner in the secondary table and the data in the primary table is used; if more edges exist, the join partners are the resulting edges. This query works independently of the number of edges per label and of the hash function. We will present an example of this query structure for our schema adaption, which omits the outer join, in Section 3.1.

Nevertheless, this means that every edge hop of a complex query requires an (outer) join operation between the primary and secondary adjacency tables. This is the case even if only a single edge of the required label is present, since the construction of the query has to be independent of the hash function. Because the retrieval of the direct neighbours of a vertex is one of the most important operations in any graph application, this join poses a major bottleneck for most queries.
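To make this access path concrete, the following is a minimal sketch of such an outgoing-neighbour query against the original layout; all table and column names (opa, osa, spill_id, ...) are our own illustration, not the exact schema of [9]:

    -- Sketch: outgoing 'knows' neighbours of vertex 1 in the original
    -- OPA/OSA layout, assuming the label hashes to column triple 1 and
    -- assuming spill keys and vertex ids come from disjoint id spaces.
    SELECT COALESCE(s.eid, p.eid_1)       AS edge_id,
           COALESCE(s.target, p.target_1) AS neighbour
    FROM   opa p
    LEFT OUTER JOIN osa s ON s.spill_id = p.target_1
    WHERE  p.vid = 1
      AND  p.label_1 = 'knows';
    -- With a single edge, the outer join finds no osa partner and the
    -- primary entry (p.eid_1, p.target_1) is returned; with several
    -- edges, p.target_1 is an artificial spill key and each matching
    -- osa row contributes one edge.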
3. SQLGRAPH SCHEMA ADAPTATION
For this paper we use the following definition of a property graph.

Definition 1. A property graph consists of a set of vertices and directed edges. Every edge has a label assigned to it. Each vertex or edge can additionally have multiple key-value pairs assigned that serve as attributes.

[Figure 1 omitted: example property graph.]
Figure 1: A property graph example

Figure 1 shows an example of a simple property graph that describes two people who are connected by an edge with the label knows.

3.1 Proposed Schema
Our approach utilizes the combined relational and JSON-based storage developed in [9]. We adapted the schema of the adjacency lists, and as a result our proposed schema reduces the number of required tables from six (as in the original SQLGraph schema) to four.

The vertex table stores the internal vertex id and the attributes of each vertex. In our prototypical implementation the attributes are stored using the jsonb data type provided by PostgreSQL.

VID | Attributes
  1 | {"lastName": "Mueller", "firstName": "David"}
  2 | {"lastName": "Choi", "firstName": "Jae-Jin"}

Table 2: The vertex attributes table

In our approach we eliminate the need for a join operation between OPA and OSA by storing all edges of one label in the corresponding columns using arrays. Table 3 shows an example of our adjacency tables: in the graph there are two knows-edges that point from the vertex with id 1 to the vertices 2 and 3. Since in this example the knows and the likes labels are hashed to the same column triple, a second row for vertex 1 has to be inserted. Note that this can be avoided with a better hash function or by providing more columns. An evaluation of the optimal number of columns, for the case that no hash function with a low number of conflicts can be found for a given number of columns, has not been part of our research yet and will have to be addressed in the future.

VID | EID1   | Label1 | Targets1 | ... | EIDk | Labelk     | Targetsk
  1 | [4, 5] | knows  | [2, 3]   | ... | [11] | creator of | [13]
  1 | [12]   | likes  | [7]      | ... | null | null       | null
  2 | [6]    | likes  | [7]      | ... | null | null       | null

Table 3: The adapted outgoing adjacency table
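For concreteness, the following DDL is a minimal sketch of how the four tables of the adapted schema might be declared in PostgreSQL; all table and column names, and the choice of showing k = 2 column triples, are our own illustration rather than the exact prototype schema.

    -- Illustrative DDL for the adapted four-table schema (k = 2 shown).
    CREATE TABLE vertex_attributes (
        vid        bigint PRIMARY KEY,  -- internal vertex id (VID)
        attributes jsonb                -- key-value attributes
    );

    CREATE TABLE edge_attributes (
        eid        bigint PRIMARY KEY,  -- edge id
        sid        bigint NOT NULL,     -- source vertex
        tid        bigint NOT NULL,     -- target vertex
        label      text   NOT NULL,
        attributes jsonb
    );

    CREATE TABLE outgoing_adjacency (
        vid       bigint,
        eid_1     bigint[],  -- ids of all edges whose label hashes to triple 1
        label_1   text,
        targets_1 bigint[],  -- target vertex ids, aligned with eid_1
        eid_2     bigint[],
        label_2   text,
        targets_2 bigint[]
    );

    -- The incoming adjacency table mirrors the outgoing one; its target
    -- columns hold source vertex ids instead.
    CREATE TABLE incoming_adjacency (LIKE outgoing_adjacency INCLUDING ALL);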
Since we can provide a conflict-free hash function for the LDBC-SNB data set, an outgoing neighbourhood query for a specific vertex can be answered by loading a single tuple from the database. By eliminating the need for an outer join for every edge hop, a major drawback of the original schema is overcome.

    WITH unshred_edges AS (
        SELECT vertex_id AS source_id,
               UNNEST(ARRAY[label_1, ..., label_k])     AS label,
               UNNEST(ARRAY[targets_1, ..., targets_k]) AS tmp
        FROM   outgoing_adjacency
        WHERE  vertex_id = ?
    ),
    gather_edges AS (
        SELECT source_id, label, UNNEST(tmp) AS target_id
        FROM   unshred_edges
        WHERE  label = ?
    )
    SELECT source_id, label, target_id FROM gather_edges;

Listing 1: Example neighbourhood query for a single node

An example of the general query structure, implemented in the PostgreSQL dialect, is depicted in Listing 1. The first common table expression, unshred_edges, converts the list of tuples belonging to one vertex, of which every row contains k edge types, into a list of rows that contain one edge type per row. The second common table expression, gather_edges, then splits those tuples into a list of edges that represents the well-known edge structure.

Analogously to the storage of vertex attributes, we store edge attributes. As in the original schema, we do not only store attributes in this table, but also additional information about the edges: namely, the source vertex, the target vertex and the label of the edge are recorded. In [9] it is claimed that queries that have to trace along a variable number of edges perform much more efficiently when the edge table is used instead of the adjacency tables. Our preliminary evaluations support this claim.

EID | SID | TID | Label | Attributes
  4 |   1 |   2 | knows | {"since": "14.03.2016"}
  5 |   1 |   3 | knows | {"since": "20.04.2016"}
  6 |   2 |   7 | likes | null

Table 4: The edge attributes table
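As an illustration of the variable-length case, a recursive query over the edge attributes table might look as follows; this is our own sketch using the illustrative edge_attributes table from above, with an arbitrary label and depth bound.

    -- Sketch: all vertices reachable from vertex 1 over 1 to 3 'knows'
    -- edges, walking the edge table instead of the adjacency tables.
    WITH RECURSIVE reachable(vid, depth) AS (
        SELECT tid, 1
        FROM   edge_attributes
        WHERE  sid = 1 AND label = 'knows'
      UNION                               -- UNION removes duplicate rows
        SELECT e.tid, r.depth + 1
        FROM   reachable r
        JOIN   edge_attributes e ON e.sid = r.vid AND e.label = 'knows'
        WHERE  r.depth < 3
    )
    SELECT DISTINCT vid FROM reachable;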
3.2 Preliminary Schema Evaluation
In order to evaluate and test our concepts we have created a prototypical implementation of the concepts in PostgreSQL.

3.2.1 Methodology
Our goal was to compare the read-only query performance of our adaptation with that of the original SQLGraph schema. To this end we used part of the Interactive Workload of the LDBC-SNB [2, 4, 10].

We generated different data set sizes using the generator and performed tests with data sets that fit into the main memory of the system as well as data set sizes that are too huge to fit into main memory. To confirm the requirement of a redundant representation of edges, we defined path queries of different fixed lengths and also path queries of variable length. After we confirmed the necessity of the edge tables, we also chose several queries provided by the Interactive Workload of the LDBC-SNB to evaluate the performance of our adjacency implementation against the original schema. We chose the test queries applying the following criteria:

• The main pattern of the query uses paths of fixed length and edges of known labels. This criterion is desired because preliminary evaluations (see [6]) have shown that in other cases the use of the attribute table performs better than the adjacency tables.

• The set of queries requires the use of incoming, outgoing and undirected edges, to cover all combinations of the adjacency tables.

• The set of queries contains paths of different lengths, to evaluate whether the difference in path length makes one of the two schema versions preferable.

• Queries with big intermediate result sets are part of the query set, since [9] identifies this type of query as a potential bottleneck of the original approach.

The parameters required by the queries were randomly chosen from the generated data set files and the generated parameters. The same parameters were used for both approaches.

To make the performance of the two schemas as comparable as possible, we first implemented all queries for the adapted schema. Then we replaced only the necessary subqueries that concern the differences between the schemas, changing as little of each query as possible. We confirmed the correctness of our query implementations by comparing their results with the results returned by a reference implementation provided for Neo4j, whose correctness has been validated (https://github.com/PlatformLab/ldbc-snb-impls/tree/master/snb-interactive-neo4j).

We evaluated the previously chosen queries on the same hardware using the same data set. The evaluation was conducted on a dedicated server with two Intel Xeon 2.4GHz CPUs (16 cores in total), 64GB memory and a single 240GB SSD running 64-bit Linux. We used PostgreSQL 10 on the aforementioned server, while we ran the client program of the benchmark on a standard laptop that connected to the database over LAN. All queries that were performed for performance evaluation purposes were preceded by several warm-up queries, as in most state-of-the-art benchmarks and as advised in [3].

All examples previously shown in this paper are a simplified version of the test data provided by the LDBC-SNB data generator.

3.2.2 Redundant Edge Data
Our findings support the need for redundant storage of edge data in the edge attributes table as well as the adjacency tables, as claimed by [1, 9]. We find a general tendency for queries to perform significantly better with the use of the adjacency tables if the queried path is of fixed length. For paths of variable length the opposite holds: this type of query requires recursive SQL queries, and recursive queries that use the edge attributes table outperform any recursive query that uses the adjacency tables. In her Master's thesis, Kornev [6] conducted an extensive evaluation that shows when to use the edge attributes table in favor of the adjacency lists.

3.2.3 Required Disk Space
Since we reduced the number of tables, we expected a reduction in the required storage capacity. To this end we imported data sets of different sizes provided by the LDBC-SNB data set generator. The generator creates data sets according to an input scale factor (1, 3, 10, 30, ...). We then sampled the required storage capacity using PostgreSQL functions. The numbers shown in Figure 2 represent the complete graph schema, including all indexes and constraints. Due to hardware constraints we could not yet import data sets with a scale factor greater than 10. We will address higher scale factors in future work. Nevertheless, our findings show that our adaption of the schema reduces the required storage space by over 10%. This is mainly due to the reduced overhead from removing one table and the associated index structure that is needed to efficiently join the OPA/IPA and OSA/ISA tables.

[Figure 2 omitted: required disk space in GB at SNB scale factors 1, 3 and 10 for the SQLGraph schema and the adapted schema.]
Figure 2: Required storage capacity

3.2.4 Query Performance
We expected a reduction in query execution time for retrieval queries that use the adjacency lists. To confirm this, we evaluated our approach with queries defined by the Interactive Workload. First we chose four queries of the short query set. This type of query usually requires the system to evaluate a relatively low number of vertices and edges to compute the answer, typically the neighbours of one entity of the data set [4]. The results shown here were measured against the generated data set with scale factor 10. This data set size does not fit into the main memory of the server.

[Figure 3 omitted: box plots of the execution times in ms of short queries SQ1, SQ3, SQ5 and SQ7 at scale factor 10 for the SQLGraph schema and the adapted schema.]
Figure 3: Runtime of the Interactive Workload Short Queries

Figure 3 shows the results of the conducted tests as box plots. The upper bound of a box shows the 75th percentile, the line within the box the 50th percentile and the lower bound the 25th percentile, while the whiskers show the maximal and minimal execution times respectively. The results of the simpler queries confirm our expectation regarding execution time: the schema adaption improved query performance across the board for these queries. As we expected, the more edge hops a query performs, the bigger the difference in execution time we observe. Note that short query 3 (SQ3) uses an undirected edge. Since our schema does not directly support undirected edges, this query results in a SQL query that uses both the incoming and the outgoing adjacency tables and is therefore similar to a 2-edge-hop.
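To sketch how SQ3's undirected hop maps onto the two adjacency tables (again with the illustrative names from Section 3.1, assuming knows is hashed to the first column triple):

    -- Sketch: undirected 'knows' neighbours of vertex 1, as the union of
    -- an outgoing and an incoming hop; each side is a single-tuple lookup.
    SELECT UNNEST(targets_1) AS neighbour
    FROM   outgoing_adjacency
    WHERE  vid = 1 AND label_1 = 'knows'
    UNION
    SELECT UNNEST(targets_1)  -- for incoming edges this column holds source ids
    FROM   incoming_adjacency
    WHERE  vid = 1 AND label_1 = 'knows';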
We then additionally conducted experiments on four complex read-only queries of the Interactive Workload. Complex queries touch a significant amount of data and often include aggregation [4]. The chosen queries differ in the number of required edge hops, with CQ 12 containing the highest number of hops.

[Figure 4 omitted: box plots of the execution times in s of complex queries CQ2, CQ5, CQ8 and CQ12 at scale factor 10 for the SQLGraph schema and the adapted schema.]
Figure 4: Runtime of the Interactive Workload Complex Queries

Once again our findings confirm an increase in the read performance of our schema adaptation. For the complex queries the difference in execution time becomes even more apparent. This confirms the findings in [9], which state that big intermediate join results, a point of focus for the set of complex queries, can be a bottleneck for their approach. Our approach significantly increases query performance for these types of queries.

4. CONCLUSION
In this paper we have described our approach to storing property graph data in a relational database. Our approach reduces the number of joins that are required for queries that contain paths of fixed length compared to earlier approaches. We have conducted a preliminary evaluation of our approach using part of a standardized benchmark for property graphs, namely the LDBC-SNB. The evaluation results show that the adaption is a more efficient version of the database schema with regard to read-only queries. Due to the reduction in the number of required tables, and therefore also of index structures, we were also able to reduce the required disk space.

We have therefore found an approach that enables us to efficiently store the property graph data from our use-case in a relational database and to link it with the already existing relational data. In addition, [9] show an approach to store RDF data in a property graph model. Thus, we can efficiently integrate all data of our use-case in a single relational database.

5. FUTURE WORK
Future evaluations will have to show the validity of the approach compared to native graph database systems like Neo4j. Since most NoSQL and graph database systems focus on main-memory computation, we expect those to perform comparably to or better than our approach on data sets that can be handled in main memory. On the other hand, we expect our approach to outperform non-relational systems on data sets that require regular disk accesses due to their size. These points will be addressed in future evaluations.

Additionally, we will evaluate the approach more extensively using the complete LDBC-SNB Interactive Workload on bigger data sets and more realistic server hardware. The workload also contains queries that insert data. We expect updates to perform efficiently as well.

One major drawback of our approach is the complexity of the queries required to retrieve the data. Systems like Neo4j offer a very abstract and convenient way to query data with graph query languages like Cypher [5]. We therefore propose a translation mechanism that converts Cypher queries to SQL queries that can be evaluated by our implementation on PostgreSQL. Kornev [6] has already shown that a translation of Cypher queries to SQL is possible. Unfortunately, the translated queries do not in all cases achieve the efficiency of queries written by an expert, so more optimization possibilities for the translation mechanism need to be explored. This will be part of our future research.
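As a purely hypothetical illustration of such a translation (hand-written, not the output of an existing translator), a one-hop Cypher pattern could map onto the adapted schema as follows:

    -- Cypher (hypothetical input to the translator):
    --   MATCH (n:Person {id: 1})-[:KNOWS]->(friend)
    --   RETURN friend.firstName
    -- A conceivable SQL translation, reusing the illustrative table
    -- names from Section 3.1:
    SELECT v.attributes ->> 'firstName' AS firstName
    FROM   outgoing_adjacency a
    CROSS  JOIN LATERAL UNNEST(a.targets_1) AS t(target_id)
    JOIN   vertex_attributes v ON v.vid = t.target_id
    WHERE  a.vid = 1 AND a.label_1 = 'knows';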
6. ACKNOWLEDGMENTS
This research was partially supported by the "Bayerisches Staatsministerium für Wirtschaft, Energie und Technologie" in the context of "BaBeDo" (IUK568/001), which we conduct in partnership with VEIT Energie GmbH.

7. REFERENCES
[1] M. A. Bornea, J. Dolby, A. Kementsietsidis, K. Srinivas, P. Dantressangle, O. Udrea, and B. Bhattacharjee. Building an efficient RDF store over a relational database. In Proceedings of the 2013 ACM SIGMOD International Conference on Management of Data, pages 121-132, 2013.
[2] P. A. Boncz. LDBC: Benchmarks for graph and RDF data management. In 17th International Database Engineering & Applications Symposium, IDEAS '13, Barcelona, Spain, October 09-11, 2013, pages 1-2, 2013.
[3] D. Dominguez-Sal, N. Martinez-Bazan, V. Muntes-Mulero, P. Baleta, and J. L. Larriba-Pey. A discussion on the design of graph database benchmarks. In Technology Conference on Performance Evaluation and Benchmarking, pages 25-40. Springer, 2010.
[4] O. Erling, A. Averbuch, J. Larriba-Pey, H. Chafi, A. Gubichev, A. Prat, M.-D. Pham, and P. Boncz. The LDBC social network benchmark: Interactive workload. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, SIGMOD '15, pages 619-630, New York, NY, USA, 2015. ACM.
[5] N. Francis, A. Green, P. Guagliardo, L. Libkin, T. Lindaaker, V. Marsault, S. Plantikow, M. Rydberg, P. Selmer, and A. Taylor. Cypher: An evolving query language for property graphs. In Proceedings of the 2018 International Conference on Management of Data, SIGMOD '18, pages 1433-1445, New York, NY, USA, 2018. ACM.
[6] L. S. Kornev. Cypher für SQLGraph. Master's thesis, University of Passau, 2017.
[7] M. Poess and C. Floyd. New TPC benchmarks for decision support and web commerce. ACM SIGMOD Record, 29(4):64-71, 2000.
[8] M. A. Rodriguez. The Gremlin graph traversal machine and language (invited talk). In Proceedings of the 15th Symposium on Database Programming Languages, DBPL 2015, pages 1-10, New York, NY, USA, 2015. ACM.
[9] W. Sun, A. Fokoue, K. Srinivas, A. Kementsietsidis, G. Hu, and G. Xie. SQLGraph: An efficient relational-based property graph store. In Proceedings of the 2015 ACM SIGMOD International Conference on Management of Data, pages 1887-1901. ACM, 2015.
[10] G. Szárnyas, A. Prat-Pérez, A. Averbuch, J. Marton, M. Paradies, M. Kaufmann, O. Erling, P. A. Boncz, V. Haprian, and J. B. Antal. An early look at the LDBC social network benchmark's business intelligence workload. In Proceedings of the 1st ACM SIGMOD Joint International Workshop on Graph Data Management Experiences & Systems (GRADES) and Network Data Analytics (NDA), Houston, TX, USA, June 10, 2018, pages 9:1-9:11, 2018.
APPENDIX
A. EVALUATION QUERIES: INTERACTIVE WORKLOAD

SQ 1:
MATCH (n:Person {id: {id}})-[:IS_LOCATED_IN]-(p:Place)
RETURN n.firstName AS firstName, n.lastName AS lastName,
       n.birthday AS birthday, n.locationIP AS locationIp,
       n.browserUsed AS browserUsed, n.gender AS gender,
       n.creationDate AS creationDate, p.id AS cityId

SQ 3:
MATCH (n:Person {id: {id}})-[r:KNOWS]-(friend)
RETURN friend.id AS personId, friend.firstName AS firstName,
       friend.lastName AS lastName,
       r.creationDate AS friendshipCreationDate
ORDER BY friendshipCreationDate DESC, toInt(personId) ASC;

SQ 5:
MATCH (m:Message {id: {id}})-[:HAS_CREATOR]->(p:Person)
RETURN p.id AS personId, p.firstName AS firstName,
       p.lastName AS lastName;

SQ 7:
MATCH (m:Message {id: {id}})<-[:REPLY_OF]-(c:Comment)-[:HAS_CREATOR]->(p:Person)
OPTIONAL MATCH (m)-[:HAS_CREATOR]->(a:Person)-[r:KNOWS]-(p)
RETURN c.id AS commentId, c.content AS commentContent,
       c.creationDate AS commentCreationDate,
       p.id AS replyAuthorId, p.firstName AS replyAuthorFirstName,
       p.lastName AS replyAuthorLastName,
       CASE r WHEN null THEN false ELSE true END
         AS replyAuthorKnowsOriginalMessageAuthor
ORDER BY commentCreationDate DESC, toInt(replyAuthorId) ASC;

CQ 2:
MATCH (:Person {id: {1}})-[:KNOWS]-(friend:Person)<-[:HAS_CREATOR]-(message)
WHERE message.creationDate <= {2}
  AND (message:Post OR message:Comment)
RETURN friend.id AS personId, friend.firstName AS personFirstName,
       friend.lastName AS personLastName, message.id AS messageId,
       CASE exists(message.content)
         WHEN true THEN message.content
         ELSE message.imageFile
       END AS messageContent,
       message.creationDate AS messageDate
ORDER BY messageDate DESC, toInt(messageId) ASC
LIMIT {3};

CQ 5:
MATCH (person:Person {id: {1}})-[:KNOWS*1..2]-(friend:Person)
      <-[membership:HAS_MEMBER]-(forum:Forum)
WHERE membership.joinDate > {2} AND NOT (person = friend)
WITH DISTINCT friend, forum
OPTIONAL MATCH (friend)<-[:HAS_CREATOR]-(post:Post)<-[:CONTAINER_OF]-(forum)
WITH forum, count(post) AS postCount
RETURN forum.title AS forumName, postCount
ORDER BY postCount DESC, toInt(forum.id) ASC
LIMIT {3};
CQ 8:
MATCH (start:Person {id: {1}})<-[:HAS_CREATOR]-()<-[:REPLY_OF]-(comment:Comment)-[:HAS_CREATOR]->(person:Person)
RETURN person.id AS personId, person.firstName AS personFirstName,
       person.lastName AS personLastName,
       comment.creationDate AS commentCreationDate,
       comment.id AS commentId, comment.content AS commentContent
ORDER BY commentCreationDate DESC, toInt(commentId) ASC
LIMIT {2};

CQ 12:
MATCH (:Person {id: {1}})-[:KNOWS]-(friend:Person)
OPTIONAL MATCH (friend)<-[:HAS_CREATOR]-(comment:Comment)-[:REPLY_OF]->(:Post)-[:HAS_TAG]->(tag:Tag),
               (tag)-[:HAS_TYPE]->(tagClass:TagClass)-[:IS_SUBCLASS_OF*0..]->(baseTagClass:TagClass)
WHERE tagClass.name = {2} OR baseTagClass.name = {2}
RETURN friend.id AS friendId, friend.firstName AS friendFirstName,
       friend.lastName AS friendLastName,
       collect(DISTINCT tag.name),
       count(DISTINCT comment) AS count
ORDER BY count DESC, toInt(friendId) ASC
LIMIT {3};

Table 5: Chosen queries for the preliminary evaluations of the adapted schema approach