=Paper=
{{Paper
|id=None
|storemode=property
|title=Combined Structure-Weight Graph Similarity and its Application in E-Health
|pdfUrl=https://ceur-ws.org/Vol-1054/paper-04.pdf
|volume=Vol-1054
|dblpUrl=https://dblp.org/rec/conf/csws/KianiBB13
}}
==Combined Structure-Weight Graph Similarity and its Application in E-Health==
Combined Structure-Weight Graph Similarity and its Application in E-Health Mahsa Kiani, Virendrakumar C. Bhavsar, and Harold Boley Faculty of Computer Science University of New Brunswick Fredericton, NB, Canada {mahsa.kiani, bhavsar, harold.boley}[AT]unb.ca Abstract—A combined structure-weight similarity approach In this system, different aspects of a disease are weighted for comparing directed (vertex- and edge-)labeled (edge-) using regression estimation. Then, these calculated weights weighted graphs is presented. Vertex labels (as types) and are used as coefficients in a weighted distance measure. Note edge labels (as attributes) embody semantic information. Edge weights express assessments regarding the (percentage-)relative that each particular user group (e.g., profiles of all patients importance of the attributes, a kind of pragmatic information. having lung cancer) has the same values in the weight These graphs are uniformly represented and interchanged vector. This approach differs from the structural similarity using a weighted extension of Object Oriented RuleML. We algorithms [1], [2] which consider different set of weights propose semantic-pragmatic information retrieval and cluster- for each profile (even if they belong to the same group). The ing where a combination of structure and weight similarities between a query and stored graphs is calculated. The struc- similarity algorithms in [1], [2] compute the arithmetic mean ture and weight similarity values are used as primary and of the two weights on corresponding edges of compared secondary criteria, respectively, to rank the retrieved graphs. trees/wDAG in order to determine the weighted similarity. In The proposed weight similarity algorithm refines the ranking of this way, edge weights are used as scaling factors to ensure retrieved graphs that have identical or nearly identical query- that the overall similarity value is in the real interval [0, 1]. graph structure similarity but have different edge weights. It is shown that our approach leads to higher precision We have found that this approach cannot differentiate trees compared to earlier approaches that did not incorporate the nor wDAGs with different edge weights having identical similarity of edge weights. The proposed approach of semantic- or nearly identical structure similarity to the given query. pragmatic information retrieval and clustering can be applied, Therefore, we propose modifications to the original weighted for example, in e-Learning, e-Business, social networks, and similarity algorithm to address this issue. Health 3.0. In this paper, the application focus is in e-Health, specifically the retrieval of mental health records. In this paper, a combined structure-weight similarity algo- rithm is proposed based on two component algorithms: a Keywords-graph similarity; structure similarity; weight sim- version of the structure similarity algorithm in [2] and a new ilarity; weighted Object Oriented RuleML; e-Health. weight similarity algorithm. In our approach, we perform ranked retrieval over a set of (meta)data represented as I. I NTRODUCTION directed (vertex- and edge-)labeled (edge-)weighted graphs, Semantic information can be represented using hierarchi- each optionally associated with a data record. A special case cal structures, which express knowledge in multiple levels is that the ‘metadata’ already are the ‘data’ to be retrieved, of detail. In the e-Business domain, vertex-labeled, edge- with no need for a separate data record. Similar to [2], labeled and edge-weighted trees [1] are used in order to rep- graphs must be transformed to an internal representation resent attributes of products. In [2], these weighted trees are before computing their similarity. Such graphs are expressed generalized to weighted Directed Acyclic Graphs (wDAGs) using a weighted extension of Object Oriented RuleML in which substructures can be shared. Efficient similarity [5]. The XML parent-child structure reflects the hierarchical algorithms are required in many applications, such as for structure of the graphs, while the role elementex- schema matching in databases, buyer-seller matching in e- presses edge labels and the attribute weight expresses edge Business, and health record retrieval in e-Health. They can weights. Also, the sharing of a rooted subgraph by multiple also be used in social networks, e.g. to form similarity- parents can be represented using a RuleML element with an clustered wellness or patient groups [3]. Calculating similar- XML key referred to from multiple keyrefs. The graphs ities between patient profiles (i.e., health records) is difficult, could be expressed using other representation approaches as the various aspects of a disease should be weighted (e.g., Turtle [6] and RDF/XML [7]) as well. We assume that, differently, which entails that simple matching of attributes given a query graph, a ranked list of matching (meta)data is not adequate in e-Health [4]. Weights are already used graphs (and consequently corresponding records), which are in similarity algorithms [1], [2], [4]. In [4], similar patients stored in a dataset, is constructed. The structure similarity are identified based on similarity of symptoms and diseases. and the weight similarity algorithms match the query graph Copyright is held by the author(s) to each (meta)data graph and calculate their structure and a ranked list of the matching graphs is required to be edge weight similarity values, respectively. These pair values constructed. Subsequently, these ranked (meta)data graphs of structure and weight similarities (resulting from matching are used to look up corresponding records. The computed the query graph to each (meta)data graph) are considered weight similarity values should be comparable, therefore as ranking criteria to generate the ranking list of (meta)data (similar to [1]) our graphs have to conform to the same graphs. We demonstrate that this approach is able to differen- standard schema. tiate the graphs having identical or nearly identical structure Architecture: The proposed similarity approach has three similarity but different edge-weight similarity to the given modules: the structure similarity evaluation module, the query. weight similarity evaluation module, and the integration The proposed combined structure-weight similarity ap- and ranking module (see Figure 1). We have a set of proach is applied in e-Health domain. We represent graphs G = {G1 , G2 , G3 , · · · , Gn }, which represents the (meta)data of Electronic Medical Records (EMRs) using (meta)data for a set of records. Both number of vertices graphs which express disorders and treatment priorities and edges are assumed to be finite. Given a graph G0 , of patients. Then, our similarity approach is used to find the structure similarity of G0 with each member of G is mental health EMRs having similar (meta)data graphs to calculated using the recursive graph similarity algorithm a given query. To provide patient privacy and security for proposed in [2]; here G0 may represent a query. The struc- health records as well as (meta)data, different technological ture similarity algorithm is iterative. The given graphs are safeguards as well as policies could be used [8]. In addition, traversed from their roots to their leaves (top-down) and using (meta)data could act as an extra level of privacy, then their similarity is computed bottom-up. The structure as for extracting some statistics or trend, information in similarity values and weight similarity values are in the (meta)data itself is enough. Also, in retrieval applications, real interval [0, 1]. The weight similarity evaluation module only records related to the ranked results would be retrieved matches each member of G with G0 ; then it calculates the not all records. edge-weight similarity value. Figure 1 shows the architecture The rest of the paper is organized as follows. Section II of the similarity approach where G and G0 represent a explains our similarity approach. Section III focusses on an set of graphs and a given query graph being matched, application of the proposed approach in the e-Health domain. sSim(G, G0 ) denotes their structure similarity values, and Section IV concludes the paper. wSim(G, G0 ) expresses their weight similarity values. II. C OMBINED S TRUCTURE -W EIGHT G RAPH G input1 Structure Similarity sSim(G,G0 ) Evaluation Module S IMILARITY input2 Integration rankedList[G] input1 and Ranking In this section, graph representation and the architecture wSim(G,G0 ) Module input2 Weight Similarity of the combined structure-weight similarity approach are G0 Evaluation Module presented. The theoretical basis of the proposed weight Figure 1: Proposed Combined Structure-Weight Similarity similarity is explained and the characteristics of the weight Architecture similarity are mentioned. A recursive weight similarity al- gorithm and the computational experiments on a synthetic dataset are presented. The structure similarity values and weight similar- ity values of G and G0 are inputs to the integra- A. Approach tion and ranking module. After receiving the similarity Graph Representation: As stated earlier, we assume that pairs [sSim(Gi , G0 ), wSim(Gi , G0 )] for all graphs i = we are given a set of records, with each record having {1, 2, 3, · · · , n} in set G, the integration and ranking module an associated (meta)data represented as a graph. Note that ranks the graphs in G based on the structure similarity all graphs throughout this paper are single-rooted wDAGs. and weight similarity. Structure and weight similarity values All graphs are hierarchical as concepts can be represented could be combined with different approaches. Here, we using sub-concepts having different importance. The root consider weight similarity as the secondary criterion in vertex carries a class label, which types the main object. ranking of graphs. As a result, G1 could appear before This object is further described by the labeled weighted G2 (G1 G2 ) in the ranked list if and only if structure edges leading to other labeled vertices of the graph, etc. similarity value of G1 to G0 (the query) is greater than the Labels on outgoing edges from each given vertex are unique structure similarity value of G2 to G0 ; or the difference and appear in lexicographic (alphabetical) left-to-right order. between their structure similarity is less than or equal to Also, edge weights are values in the real interval [0, 1] and a threshold while the weight similarity value G1 to G0 is for each graph its edge weights normalized; therefore, the greater than the weight similarity value of G2 to G0 . Thus, sum of weights for all outgoing edges from each vertex (a) G1 G2 if and only if [sSim(G1 , G0 ) > equals 1. Further, we assume that given a query graph, sSim(G2 , G0 )], or [|sSim(G1 , G0 ) − sSim(G2 , G0 )| ≤ T hreshold and wSim(G1 , G0 ) > wSim(G2 , G0 )]. where dmax is the maximum possible depth of the source (b) G1 G2 or G2 G1 if [|sSim(G1 , G0 ) − vertex of corresponding edges in two graphs. sSim(G2 , G0 )| ≤ T hreshold and wSim(G1 , G0 ) = dX md wSim(G2 , G0 )] max X Sim = ( weSimp · Dd+1 ) (3) In this paper, we consider the threshold equal to 0. For d=1 p=1 each graph, we keep a count of the number of edges, assigning a unique integer j to each edge, starting from As the similarity of weights, numbers in the real interval 1 in top-down (root to leaf) and left-to-right order. As a [0, 1], related to two corresponding edges is calculated result, each edge is represented by ej , j {1, 2, 3, · · · , z}, using the Manhattan distance (Equation 1) or the Min/Max considering z as the total number of edges in a graph. As similarity measure (Equation 2), the similarity value of a all edges are directed, the source vertex u and the destination pair of weights weSimp , p {1, 2, 3, · · · , md } is in interval vertex v of each edge ej can be represented as an ordered [0, 1]. Also d, which is the depth of the source vertex related pair (u, v). Also, the weight of edge ej is represented to an edge, could be a value larger than or equal to 0. As a as w(ej ). The edge ej in graph G and the edge ej 0 in result, Dd+1 is a positive number. Thus, the summation of graph G0 are called corresponding edges if and only if they (weSimp · Dd+1 ) for all corresponding edges could result have identical edge labels as well as identical source vertex in a value larger than 1 and therefore Sim could be greater labels and destination vertex labels. The relation between than 1. In order to express the graph similarity as a value . in real interval [0, 1], the combined edge weight similarity corresponding edges ej and ej 0 is denoted as ej = ej 0 . Consider du as the depth of vertex u. In our graphs, du values (viz. Sim) is normalized by the sum of the Dd+1 and du0 are equal for two corresponding edges ej and ej 0 . used in various iterations of the recursive weight similarity algorithm. Starting from the first level in graphs, each time B. Weight Similarity a pair of weights is compared, the related depth factor is In the proposed weight similarity approach, the similar- added and this process is repeated for all levels of graphs. ity of weights related to two corresponding edges can be The normalization factor denoted by F is expressed as, calculated based on two similarity measures [9], viz. Man- dX md max X hattan distance, Equation 1, or Min/Max similarity measure, Equation 2, as given below: F = ( Dd+1 ) (4) d=1 p=1 weSim1 = 1 − |w(ej ) − w(ej 0 )| (1) Thus, the normalized weight similarity of two graphs (wSim) is given as, min(w(ej ), w(ej 0 )) weSim2 = (2) Sim max(w(ej ), w(ej 0 )) wSim = , (5) F The importance of each edge can be considered to be a which lies in real interval [0, 1]. The global depth degrada- function of the depth of its source vertex. As stated earlier, tion factor (D) could be equal to 1. In this case, the proposed the root vertex carries a class label, which types the main similarity approach gives the same importance to the weight object; therefore, the outgoing edges from the root have similarities of various levels of the graphs and the arithmetic the highest importance. This importance decreases as the mean of the weight similarity values is calculated. Therefore, depth of the source vertex of the edge increases. Similarly, the result of such a calculation is identical to considering contribution of the weight similarity of two corresponding the weight similarity of all attributes having the same effect edges in weight similarity of two graphs depends on the on the weight similarity of two graphs. This approach depth of the source vertex related to corresponding edges. results in a linear trend of similarity values. In Equation The coefficient for adjusting the contribution of edge weight 6, mtotal denotes the number of corresponding edges in similarity needs to decreases as the depth of the source total. weSim1(w(ej ), w(ej 0 )) is the similarity of weights vertex of corresponding edges increases. One approach for related to two corresponding edges based on the Manhattan defining this coefficient is using an exponential function with distance, while wSim is the global weight similarity of two D as the fixed base and d + 1 as the variable exponent. graphs based on the Manhattan distance. The same relation Therefore, in this paper, the adjustment coefficient is ex- holds when the weight similarity is calculated based on the pressed as Dd+1 . If p enumerates the pairs of corresponding Min/Max similarity measure as well. edges in depth d and md (md ≥ 0) denotes the number of mX corresponding edges in depth d, the weight similarity value total of graphs is expressed using Equation 3. In this equation, wSim = (1/mtotal ) · (weSim1(w(ej ), w(ej 0 ))) (6) each edge similarity value is multiplied by Dd+1 , in which k=1 D is the global depth degradation factor (D ≤ 0.5) and d is The weight similarity also has the following characteristics: the depth of the source vertex of the edge. 0 ≤ d ≤ dmax , (a) The similarity value generated by the weight similarity approach is a non-negative number. The minimum similar- learning component could be used to adjust the parameter. ity value equals 0. (b) The weight similarity of a graph Considering graphs G and G0 as the inputs of the algorithm, to itself is 1.0. The similarity of each pair of weights G.subgraph(ej ) denotes the sub-graph rooted at destination weSim(w(ej ), w(ej 0 )) is 1.0. Therefore, Sim has the same vertex of ej in graph G. G.root.label, G.root.inDegree, value as F and as a result the weight similarity of two and G.root.outDegree represent vertex label, in-degree, graphs (i.e., wSim) is equal to 1.0. (c) The weight similarity and out-degree of the root of graph G, respectively. Also, measure is a symmetric function, as the order of pair of ej ej 0 represents that ej could appear before ej 0 in graphs does not affect the result of the computation of a lexicographic ordered list. weSim is the similarity of weight similarity. (d) The weight similarity like many other weights related to two corresponding edges. root(G).depth similarities does not obey triangular inequality. The weight is a function which gives the depth for root of graph G similarity measure is a partial matching approach as only the relative to the root of the original graph. The output, wSim, weights related to the corresponding edges are compared. is the weight similarity value of G and G0 . The proposed weight similarity algorithm traverses two C. Algorithm given graphs in a top-down (root-leaf) order to compute the Algorithm 1, which calculates the weight similarity of two edge-weight similarity of the graphs. If two edges being graphs based on Manhattan distance, is represented in Figure traversed are corresponding edges, their weight similarity is 2. calculated using Equation 1 or 2. Two pointer variables, k and k 0 , indicate the positions of two outgoing edges being 1: procedure W S IMILARITY (G, G0 ) matched. If ej ej 0 , k is set to point to the next outgoing 2: if G or G0 only contains a single vertex then edge in G, while if ej 0 ej , k 0 would be increased to 3: return 0 . 4: end if point to the next outgoing edge in G0 . If ej = ej 0 , k and 5: if G.root.label 6= G0 .root.label then k are set to point to the next outgoing edges in G and G0 , 0 6: return 0 respectively. The loop is terminated as soon as any one of 7: else the following conditions is met: k > G.root.outDegree or 8: d ← root(G).depth k 0 > G0 .root.outDegree. 9: k←1 10: k0 ← 1 The algorithm is recursive, so the base case and recursive 11: while k ≤ G.root.outDegree case should be defined. The base case is where the problem ∧ k 0 ≤ G0 .root.outDegree do can be solved directly, while in the recursive case the prob- 12: ej ← G[k].root.edge lem is expressed as subproblems that are closer to the base 13: e0j ← G0 [k 0 ].root.edge case [10, pp. 228]. In this algorithm the base of the recursion . 14: if ej = ej 0 then is where G or G0 only contains a single vertex (Algorithm 15: F ← F + Dd+1 1, lines 2-4) or if G.root.label 6= G0 .root.label (Algorithm 16: weSim ← (1 − |w(ej ) − w(ej 0 )|) 17: Sim ← Sim + weSim · Dd+1 1, lines 5-6). In both cases, their weight similarity is 0. The + wSimilarity(G.subgraph(ej ), algorithm is tail recursive, i.e., the recursive invocation is G0 .subgraph(ej 0 )) the very last thing which is performed [10, pp. 245]. In 18: k ←k+1 the recursive case, the algorithm recursively invokes itself 19: k0 ← k0 + 1 using the roots of two sub-graphs of G and G0 as arguments 20: else if ej ej 0 then 21: k ←k+1 (Algorithm 1, line 17). 22: else As stated earlier, the labels of outgoing edges from each 23: k0 ← k0 + 1 vertex are arranged in the lexicographic order. Also, two 24: end if pointers indicate the positions of two edges being matched. 25: end while Using these features, the time complexity of the algo- 26: wSim ← Sim/F 27: return wSim rithm is improved. If G or G0 only contains a single 28: end if vertex or G.root.label 6= G0 .root.label for the roots 29: end procedure of two graphs, then the algorithm sets the weight sim- ilarity directly to 0 without any further computation; If Figure 2: Algorithm 1. Weight Similarity of two Graphs G.root.label = G0 .root.label, the algorithm uses one loop based on Manhattan Distance (Algorithm 1, line 11) to find the corresponding edges. For two graphs, consider t, t {1, 2, 3, · · · , r}, in which Algorithm 1 (see Figure 2) gives the weight similarity r equals to the total number of pairs of matched non-leaf algorithm, which traverses two input graphs G and G0 in vertices. When matching all outgoing edges of a pair of a left-right depth-first strategy. The parameter of the algo- vertices, three cases should be considered: (i) If ej ej 0 . rithm is D, which represents the global depth degradation or ej = ej 0 , for all values of k and k 0 , the number factor. Here we assume that D is equal to 0.5; however, a of iterations equals to IG t = G.root.outDegree, (ii) If ej 0 ej , for all values of k and k 0 , the number of Table I: Edge Weights of a Subset (G1 to G8 ) of 29 iterations is equal to IG0 t = G0 .root.outDegree, and Graphs (G1 to G29 ) with the Structure given in Figure 3 (iii) If only for some values of k and k 0 , ej ej 0 or ((a)) Weights of Edges e1 to e7 . ej = ej 0 , the number of iterations to find the corresponding Graph w(e1 ) w(e2 ) w(e3 ) w(e4 ) w(e5 ) w(e6 ) w(e7 ) edges is in the interval [min(IG t , IG0 t ), max(IG t , IG0 t )]. G1 0.01 0.99 0.01 0.99 0.01 0.99 0.01 The number of iterations for finding all corresponding G2 0.01 0.99 0.01 0.99 0.01 0.99 0.01 G3 0.01 0.99 0.01 0.99 0.01 0.99 0.01 edges in graphs, I, equals to the summation of itera- G4 0.01 0.99 0.01 0.99 0.01 0.99 0.01 tions Pr performedt for teach Prpair of vertices; I is in interval G5 0.01 0.99 0.01 0.99 0.01 0.99 0.25 t t G6 0.01 0.99 0.01 0.99 0.25 0.75 0.25 [ t = 1 min(I PG , IG 0 ), t=1 max(I G , IG )]. In the worst 0 r t t G7 0.01 0.99 0.25 0.75 0.25 0.75 0.25 case, I = t = 1 max(IG , IG 0 )], and therefore the com- G8 0.25 0.75 0.25 0.75 0.25 0.75 0.25 r plexity of the algorithm is Θ( t = 1 max(IG t , IG0 t )). P ((b)) Weights of Edges e8 to e14 D. Computational Experiments Graph w(e8 ) w(e9 ) w(e10 ) w(e11 ) w(e12 ) w(e13 ) w(e14 ) Now, we test the proposed weight similarity algorithm on G1 0.99 0.01 0.99 0.01 0.99 0.01 0.99 a synthetic dataset, in which weights are changed systemati- G2 0.99 0.01 0.99 0.01 0.99 0.25 0.75 cally to understand the effects of structure and weights on the G3 0.99 0.01 0.99 0.25 0.75 0.25 0.75 G4 0.99 0.25 0.75 0.25 0.75 0.25 0.75 similarity. The dataset contains graphs structurally identical G5 0.75 0.25 0.75 0.25 0.75 0.25 0.75 to the graphs given in Figure 3, but with different weights. G6 0.01 0.25 0.75 0.25 0.75 0.25 0.75 The graphs are balanced with maximum breadth assuming G7 0.01 0.25 0.75 0.25 0.75 0.25 0.75 G8 0.25 0.75 0.25 0.75 0.25 0.75 0.25 branching factor of 2. The dataset contains 29 graphs. A e1 e2 Figure 4 depicts the similarity values of G1 for the B C synthetic dataset with the remaining graphs using the graph similarity algorithm in [2] as well as our combined structure- e3 e4 e5 e6 D E F I weight similarity algorithm (using the similarity measure e7 e8 e9 e10 e11 e12 e13 e14 based on the Manhattan distance). 1.0 J K L O P 0.9 0.8 0.7 Figure 3: Graph Structure of Metadata in Dataset 0.6 Similarity 0.5 0.4 0.3 Graph Similarity in [2] 0.2 In this dataset, we have five possible values for a 0.1 Manhattan Approach Min/Max Approach 0 pair of edge weights: [0.01, 0.99], [0.25, 0.25], [0.5, 0.5], 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 [0.75, 0.25], or [0.99, 0.01]. In G1 of dataset, weights of all Graph(Gi ) edges having the same source vertex are [0.01, 0.99]. Now, we change the edge weights from right to left in a level Figure 4: Similarity of G1 to 29 Graphs in the Dataset and then bottom-up for various levels, exhausting the five possible sets of edge weight pairs. This results in 29 graphs While similarity values based on the previous graph in the dataset, of which eight graphs, G1 to G8 , are shown similarity algorithm [2] are always equal to 1, our com- in Table I, where each row represents the weights related to bined structure-weight similarity approach differentiates the a graph 1 . Enabling a compact specification and description structurally identical graphs with different weights. Also, of the weights, this notation is used to illustrate different Figure 4 gives a comparison of the two similarity measures, weight values for one graph structure. the similarity measure based on the Manhattan distance and Considering this systematic changes in weights, the the Min/Max similarity measure. Here again we compute weight similarities of G1 in the dataset with respect to the similarity w.r.t. G1 . For the depth degradation factor the remaining graphs are expected to decrease gradually. equal to 0.5 and for the same set of weights for a dataset, Therefore, the synthetic dataset provides a starting point for both similarity measures generate similarity values with a an evaluation of our weight similarity algorithm. decreasing trend. It is important to note that for Figure 4, the 1 The complete dataset is available from authors. similarity decreases as a result of the systematic change of weights of edges (having the same source): gradual increase of the edge weight for the left vertex and gradual decrease of the edge weight for the right vertex. The bumps in the similarity plots (e.g. at G8 , G15 , and G22 ) are observed as the result of level transitions, i.e., the systematic changes of weights in each level of the graph. We have given above the computational results for the always related to the last psychological evaluation of patients similarities of members of a dataset. We can generalize (available in the mental health records). Figure 5 illustrates the behavior of the similarity computation to other possible the generic structure of (meta)data of mental health EMRs graph structures such as trees [1], and generalized trees [11], in the database as well as a query having the same structure. and conclude that the proposed weight-similarity algorithm, Mental Health EMR with any one of the similarity measures, is effective in differentiating graphs having identical or nearly identical e1 e2 e3 Affective Set Cognitive Set structure similarity values (but different weights). Weight Behavioral Set e4 e7 similarity considers only weight of common subgraphs of e6 e8 two graphs being compared, while structure similarity takes e5 Aggression Set Panic Disorder Cocaine Intoxication into account common as well as uncommon subgraphs. Anorexia Nervosa Therefore, two graphs could be similar from weight sim- e9 e10 ilarity perspective, while their uncommon sub-graphs are large (i.e. small structure similarity). Note that although the Physical numerical similarity values of the two similarity measures are different, they result in the same relative ranking of Figure 5: Graph Structure of a Query and Metadata of the graphs for the given query. Since there is no universal Mental Health EMRs benchmark for evaluating similarity [12], it is not possible to select or recommend one of the similarity measures over Table II represents the edge labels of the generic structure the other and both similarity measures could be used for the (in Figure 5), in which l(ej ) denotes the label of edge ej , purposes of relative ranking. j {1, 2, 3, · · · , 10}. The patients have panic disorder and III. E-H EALTH A PPLICATION : M ENTAL H EALTH also delirium due to cocaine intoxication. Other disorders E LECTRONIC M EDICAL R ECORD of the patients are anorexia nervosa and physical aggression including fantasies and real acts [15, pp. 421]. Group therapy is used as a treatment option for drug abusers [13, pp. 577-620]. Newcomers should be placed in Table II: Edge Labels of a Query and Metadata Graphs groups with at least one or two similar members. Open group (having the Structure in Figure 5) for Mental Health EMRs membership in which new members are allowed to enter as others leave is the norm [14, pp. 262-273]. Therefore, l(e1 ) Affective disorders l(e6 ) Aggression l(e2 ) Behavioral disorders l(e7 ) Delirium retrieving similar mental health EMRs to select patients for l(e3 ) Cognitive disorders l(e8 ) Substance induced panic group therapy is a challenging task. This selection should be l(e4 ) Anxiety l(e9 ) Real act based on the gathered dynamic, behavioral, and diagnostic l(e5 ) Appetite disorder l(e10 ) Fantasies information in a screening interview [15, pp. 934]. Consider the scenario where the user (e.g., a psychologist) wants to Edge weights of four EMR (meta)data, representing the find an appropriate group for a new patient in order to diagnosis segment of a mental health EMR, and a query are schedule group therapy sessions. In this case, mental health illustrated in Table III. Note the different last subscripts for EMRs that describe similar disorders as well as treatment the two edges emanating from the Aggression vertex and priorities should be found. Each (meta)data expresses the terminating at the same Physical destination vertex. Further, individualized treatment plan about patient’s disorders and there are three edges from the root vertex. the treatment priorities based on the last psychological evaluation. Similar to [2], we represent the attributes of each Table III: Edge Weights of a Query (G01 ) and four (meta)data using a graph based on a standard schema. The Metadata Graphs (having the Structure in Figure 5) for attributes of this schema are extracted from [15], [16], and Mental Health EMRs the terms representing the (meta)data are based on DSM IV Graph w(e1 ) w(e2 ) w(e3 ) w(e4 ) w(e5 ) w(e6 ) w(e7 ) w(e8 ) w(e9 ) w(e10 ) [16]. The attributes express possible affective, behavioral, G01 0.01 0.01 0.98 1.0 0.01 0.99 1.0 1.0 0.01 0.99 and cognitive problems of a patient. The edge weights in G1 0.01 0.01 0.98 1.0 0.01 0.99 1.0 1.0 0.01 0.99 G2 0.5 0.25 0.25 1.0 0.25 0.75 1.0 1.0 0.25 0.75 graphs represent the relative priority regarding treatment G3 0.4 0.3 0.3 1.0 0.5 0.5 1.0 1.0 0.5 0.5 of each disorder in the group therapy session. Therefore, G4 0.3 0.35 0.35 1.0 0.75 0.25 1.0 1.0 0.75 0.25 severe, influential, and dangerous disorders as well as the items for which treatments have the greatest benefit have Now we compare the similarity of query with the four higher priority (i.e., higher weight) in our treatment-oriented (meta)data graphs G1 , G2 , G3 , and G4 of the EMRs given (meta)data. As treatment priorities change over time, edge in Tables II and III using the combined structure-weight weights could be different in each evaluation phase by the similarity algorithm. The computed similarity values are psychologist. In order to select patients for group therapy, given in Table IV. The structure similarity values between in the proposed system the edge weights of (meta)data are query G0 and any of four (meta)data graphs are identical; [4] Fritz, P. and Klenk, S. and Dippon, J. and Heidemann, G., therefore, we cannot distinguish between them using the “Determining patient similarity in medical social networks,” structure similarity alone. The edge weight similarity results in Proc. MedEx Workshop, 2010, pp. 6–13. using the proposed algorithm are also shown in Table IV. [5] H. Boley, “Object-Oriented RuleML: User-Level Roles, URI- Grounded Clauses, and Order-Sorted Terms,” in Proc. Rules Table IV: Computational Results for the Metadata of and Rule Markup Languages for the Semantic Web (RuleML- Mental Health EMRs and the Query in Table III 2003). LNCS 2876, Springer, Oct. 2003, pp. 1–16. Graph Graph Structure Manhattan Min/Max Rank Similarity Approach Approach [6] D. Beckett and T. Berners-Lee. (2011) Turtle A Readable RDF Syntax. [Online]. Available: http: G0 G1 1.0 1.0 1.0 1 //www.w3.org/TeamSubmission/turtle/ G0 G2 1.0 0.6834 0.3762 2 G0 G3 1.0 0.6356 0.3492 3 [7] Cyganiak, R. and Wood, D., Ed., Resource Description G0 G4 1.0 0.5878 0.3249 4 Framework (RDF): Concepts and Abstract Syntax. World Wide Web Consortium, Jan. 2013. [Online]. Available: http://www.w3.org/TR/2013/WD-rdf11-concepts-20130115/ We can clearly see that the similarities are different and they can be used to rank four (meta)data graphs. Further, [8] Jacques, L.B., “Electronic Health Records and Respect for both similarity measures (see columns 4 and 5 in Table IV) Patient Privacy: A Prescription for Compatibility,” Vand. J. are equally acceptable as they result in the same relative Ent. & Tech. L., vol. 13, pp. 441–462, 2011. ranks. Instead of ranked graphs based on their similarity to a [9] Boriah, S. and Chandola, V. and Kumar, V., “Similarity given query, the proposed approach could cluster the mental Measures for Categorical Data: A Comparative Evaluation,” health EMRs based on a threshold to facilitate creation of in Proc. the 8th SIAM International Conference on Data supportive virtual communities, which is one of the main Mining, 2008, pp. 243–254. goals of Health 3.0 [17]. [10] Drake, P., Data Structures and Algorithms in Java. Upper IV. C ONCLUSION Saddle River, NJ, USA: Prentice-Hall, Inc., 2005. Our combined structure-weight similarity approach is able to distinguish graphs having identical or nearly identical [11] Dehmer, M. and Emmert-Streib, F. and Kilian, J., “A Similar- ity Measure for Graphs with Low Computational Complex- structure but different weights. By considering the weight ity,” Applied Mathematics and Computation, vol. 182, no. 1, similarity in addition to the structure similarity, preferences pp. 447 – 459, 2006. of user are compared with the preferences expressed as edge weights of graphs stored in dataset. The similarity [12] Janowicz, K. and Raubal, M. and Schwering, A. and Kuhn, of edge weights is calculated in a recursive way, giving W., “Semantic Similarity Measurement and Geospatial Ap- plications,” T. GIS, vol. 12, no. 6, pp. 651–659, 2008. more importance to weights of edges in higher levels of a graph. The combined structure-weight similarity algorithm [13] Carr, A., The Handbook of Child and Adolescent Clinical has been implemented in Java and it has been applied to Psychology: A Contextual Approach. Rouledge, New York: retrieve mental health electronic medical records (EMRs). Taylor and Francis Group, 1999. ACKNOWLEDGMENTS [14] Ruiz, P. and Strain, E.C. and Langrod, J., The Substance The authors would like to thank Mehrdad Kiani, M.D., Abuse Handbook, ser. Doody’s all reviewed collection. who provided help in the health domain. This research Philadelphia: Wolters Kluwer Health/Lippincott Williams and Wilkins, 2007. is partially funded by a Discovery Grant from the Nat- ural Sciences and Engineering Council of Canada and a [15] Sadock, B.J. and Sadock, V.A., Kaplan and Sadock’s Syn- Ph.D. Fellowship Grant from the Atlantic Computational opsis of Psychiatry: Behavioral Sciences/Clinical Psychiatry, Excellence Network (ACEnet) awarded to the second author. 10th ed. Philadelphia: Lippincott Williams and Wilkins, 2007. R EFERENCES [1] Bhavsar, V.C. and Boley, H. and Yang, L., “A Weighted-Tree [16] A. P. Association and A. P. A. T. F. on DSM-IV., Diagnostic Similarity Algorithm for Multi-Agent Systems in E-Business and Statistical Manual of Mental Disorders: DSM-IV-TR., Environments,” Computational Intelligence, vol. 20, no. 4, pp. 4th ed., ser. Diagnostic and Statistical Manual of Mental Dis- 584–602, 2004. orders. Washington, DC: American Psychiatric Association, 2000. [2] Jin, J., “Similarity of Weighted Directed Acyclic Graphs,” MSc Thesis, Faculty of Computer Science, University of New [17] Kiani, M. and Bhavsar, V.C. and Boley, H., “Clustering Using Brunswick, Canada, Sep. 2006. Combined Structure-Weight Graph Similarity,” University of New Brunswick, Canada, Internal Report (In Preparation). [3] Boley, H. and Shafiq, O., and Smith, D. and Osmun, T., “The Social Semantic Subweb of Virtual Patient Support Groups,” in Proc. the 3rd Canadian Semantic Web Symposium (CSWS2011), Vancouver, British Columbia, Canada. CEUR, Aug. 2011, pp. 1–18.