Unilateral Jaccard Similarity Coefficient

                            Julio Santisteban                                        Javier L. Tejada Carcamo
               Universidad Católica San Pablo                                     Universidad Católica San Pablo
         Campus Campiña Paisajista s/n Quinta Vivanco,                      Campus Campiña Paisajista s/n Quinta Vivanco,
                    Barrio de San Lázaro                                               Barrio de San Lázaro
                       Arequipa, Peru                                                     Arequipa, Peru
                    jsantisteban@ucsp.edu.pe                                            jtejadac@ucsp.edu.pe


ABSTRACT                                                                                             d(i, j) = d(j, i)                  (2)
Similarity measures are essential to solve many pattern recog-
nition problems such as classification, clustering, and re-                                    d(i, j) + d(j, k) ≥ d(i, k)              (3)
trieval problems. Various similarity measures are catego-                    Here for objects i, j and k, where d() is the distance between
rized in both syntactic and semantic relationships. In this                  objects i and j. Bridge [1] argues that there exists empiri-
paper we present a novel similarity, Unilateral Jaccard Sim-                 cal evidence of violations against each of the three axioms.
ilarity Coefficient (uJaccard), which doesn’t only take into                 Yet, there also exists geometric models of similarity which
consideration the space among two points but also the se-                    take asymmetry into account [10]. Nosofsky points out that
mantics among them.                                                          a number of well-known models for asymmetric proximity
                                                                             data are closely related to the additive similarity and bias
Categories and Subject Descriptors                                           model [5]. Tversky [13] has proposed a different model in or-
                                                                             der to overcome the metric assumption of geometric models.
E.1 [Data Structures]: Graphs and networks; G.2.2 [Graph
                                                                             One of the strengths of contrast models is its capability to
Theory]: Graph algorithms
                                                                             explain asymmetric similarity judgments. Tversky’s asym-
                                                                             metry may often be characterized in terms of stimulus bias
General Terms                                                                and determined by the relative prominence of the stimuli.
Theory                                                                                                        |A ∩ B|
                                                                                     sim(a, b) =                                 ,
                                                                                                   |A ∩ B| + α|A − B| + β|B − A|        (4)
Keywords                                                                                                                     α, β ≥ 0
Jaccard, distance, similarity
                                                                             Here A and B represent feature sets for the objects a and b
                                                                             respectively; the term in the numerator is a function of the
1.    INTRODUCTION                                                           set of shared features, a measure of similarity, and the last
   Since Euclid to today many similarity measures have been                  two terms in the denominator measure dissimilarity: α and
developed to consider many scenarios in different areas, par-                β are real-number weights; when α != β. Jimenez et al. [6],
ticularly in the last century. Similarity measures are used                  Weeds and Weir [14] and Lee [7] also propose an asymmet-
to compare different kind of data which is fundamentally                     ric similarity measure based on Tversky’s work. However
important for pattern classification, clustering, and infor-                 all proposals include a stimulus bias, asymmetric similar-
mation retrieval problems [3]. Similarity relations have gen-                ity judgments, which Tversky refers to as human judgment.
erally been dominated by geometric models in which objects                   Today, similarity measure is deeply embedded into many of
are represented by points in a Euclidean space [12]. Simi-                   the algorithms used for graph classification, clustering and
larity is defined as “Having the same or nearly the same                     other tasks. Those techniques are leaving aside the seman-
characteristics” [4], while the metric distance is defined as                tic of each vertex and it’s relation among other vertices and
“The property created by the space between two objects or                    edges.
points”. All metric distance functions must satisfy three ba-                In a direct graph, the similarity from U to Z is not the same
sic axioms: minimality and equal self-similarity, symmetry,                  as the distance from Z to U, this due to the intrinsic features
and triangle inequality.                                                     of a direct graph. The similarities are different because the
                      d(i, i) = d(j, j) ≤ d(i, j)                    (1)     channels are dissimilar. According to Shannon’s informa-
                                                                             tion theory we could argue that each vertex is a source of
                                                                             energy with an average entropy which is shared among it’s
                                                                             channels, and while that information flow among the ver-
                                                                             tex’s channels, we need to be consider it in the similarity. A
                                                                             similarity does not fit all tasks or cases.
                                                                             In Natural Langue Processing, where the similarity between
Copyright c 2015 for the individual papers by the papers’ authors. Copy-     two words is not symmetric sim(word a,word b) != sim(word
ing permitted for private and academic purposes. This volume is published    b,word a). WordNet [4] presents 28 different types of rela-
and copyrighted by its editors.
SIGIR Workshop on Graph Search and Beyond ’15 Santiago, Chile                tions; those relations have direction but are not symmetrical,
Published on CEUR-WS: http://ceur-ws.org/Vol-1393/.                          they are not even synonyms because each synonym word has
                                                                   same problem. On the other hand we propose a measure
                   1           2           3
                                                                   that does not include this bias. We propose a modified ver-
                                                                   sion of Jaccard Similarity coefficient (1), unilateral Jaccard
                                                                   Similarity coefficient (uJaccard) (2)(3), used to identify the
                                     y
                                                                   similarity coefficient of Va to Vc With respect to vertex Va,
                          x
                                                                   and to also identify the similarity coefficient of Vc to Va
                                                                   With respect to vertex Vc.

           Figure 1: Structural Equivalence.                                                              |a ∩ c|
                                                                                    Jaccard(Va , Vc ) =                        (5)
                                                                                                          |a ∪ c|

a particular semantic, meaning and usage, but are similar.                                                 |a ∩ c|
Hence if two words have symmetric distance or similarity,                         uJaccard(Va , Vc ) =                         (6)
                                                                                                         |edges(a)|
those two words are the same. Paradigmatic is an intrinsic
feature in language, It lets the utterer exchange words with                                               |c ∩ a|
other words, words with similar semantics [11]. In this paper                     uJaccard(Vc , Va ) =                         (7)
                                                                                                         |edges(c)|
we focus on paradigmatic analysis to support our unilateral
Jaccard Similarity coefficient (uJaccard).                         Here Va and Vc are the number of edges in vertex a and
The rest of the paper is organized as follows. In section 2 we     c, likewise the edges(Vc) are the number of edges in ver-
will show the unilateral Jaccard Similarity coefficient (uJac-     tex c. if uJaccard is close to 0, it means that they are not
card). In section 3 we will consider some cases; finally in        similar at all. The objective of using uJaccard is to iden-
section 4 we conclude this work.                                   tify how similar a vertex is to other vertices in relation to
                                                                   itself. uJaccard could be calculated among two connected
2.    PARADIGMATIC SIMILARITY DEFINI-                              vertices, uJaccard could also be calculated among vertices
                                                                   that are not connected directly, but which are connected
      TION                                                         by in-between vertices. The number of in-between vertices
                                                                   could be from 1 to n, we do not recommend a deep compar-
2.1     Basics Of Paradigmatic Structures                          ison since the semantics of the vertex loosest its meaning.
   Paradigmatic analysis is a process that identifies entities     Hence max(n)=3, it is suggested for NLP. For the calcula-
which are not related directly but are related by their prop-      tions we do not consider the number of in-between vertices
erties, relatedness among other entities and interchangeabil-      since we focus on the information flow and not the informa-
ity [2]. In language the reason why we tend to use mor-            tion transformation carried out on the intermediate vertices.
phologically unrelated forms in comparative oppositions is
to emphasize the semantics, this is done by substitution and       3.    EXPERIMENTAL EVALUATION
transposition of words with a similar signifier. Similarity is
not defined by a syntactic set of rules but rather by the use
of the language. In some cases this use is not grammatically                        2                               8
or syntactically correct but it is commonly used. We defined                                                        9
the signifier as being the degree of relation among entities
                                                                                    3            7                        12
of the same group, where not all members of the group have               1          4                           10
the same degree of relatedness. This is due to the fact that
a member of a group might belong to more than one group.                            5                           11
2.2     Extended Paradigmatic                                                       6
  Two vertices in a graph are structurally equivalent if they
share many of the same network neighbours. Figure 1 de-                              Figure 2: Toy graph.
picts a structural equivalence between two vertices y and x
who have the same neighbours. Regular equivalence is more
subtle, two regularly equivalent vertices do not necessarily       3.1       Toy Testing
share the same neighbours, but they do have neighbours                Using similarity uJaccard (6),(7) we can build a paradig-
who are themselves similar [8] [15]. We will use structural        matic approach to group vertices. Figure 2 shows a toy
equivalence as the bases of uJaccard.                              graph with 12 vertices and 16 edges, following the paradig-
                                                                   matic analysis, we can determine that vertex 12 and 7 belong
2.2.1    Unilateral Jaccard Similarity                             to group P because they have the same number of edges to
   To calculate a paradigmatic similarity we start with a          a same set of vertices. Vertex 1 also belongs to group P
question, is the similarity coefficient from vertex Va to Vc       because vertex 1 has 3 of the 5 edges, the same as vertex 7,
the same to the similarity coefficient from vertex Vc to Va        the degree of membership of vertex 1 is lower than vertices 7
?. If we argue that both similarity coefficients are the same,     and 12 because vertex 1 has other edges that are not shared
we are arguing that the edges from the vertices Va and Vc          by vertices 7 or 12. In the same manner we can determine
are the same, and it is clear that that is not usually the case.   that vertex 8, 9, 10 and 11 belong to group Q because they
Thus both vertices have different sets of edges. One problem       have an equal number of edges to the same set of vertices.
with Tversky [13] similarity is the estimation for α and β         Similarly vertices 2, 3 and 4 belong to group R, and vertices
which are stimulus bias, generally a human factor. Similarly,      5 and 6 belong to group O. In this example we can eas-
other similarities which are based on Tversky idea, have the       ily identify the paradigmatic approach, where two or more
vertices belong to the same group if they have the same or              It is shown in figure 3.2 that node 1 might belong to clus-
similar neighbours, but the neighbours in turn belong to an-            ter {2,3,4} or cluster {5,6}; to resolve this problem we use
other group.                                                            uJaccard similarity measure to find the similarity of node 1
  Following the uJaccard similarity and the paradigmatic                to other nodes. Table 3 shows that similarities from node 1
                                                                        to other nodes 1 level deep are the same, so we could not
                                                                        allocate node 1 to a particular cluster. Table 3 also shows
      Table 1: uJaccard calculation from figure 3                       that similarities from node 1 to other nodes 2 levels deep,
            uJaccard (V1,V7) = 3/5 = 0.600                              in which uJaccard(1,3) has a strong similarity over the rest.
            uJaccard (V7,V1) = 3/7 = 0.428                              We could conclude that node 1 belong to cluster {2,3,4}.
            uJaccard (V7,V12) = 4/7 = 0.571                              In figure 3.2 also node 1 might belong to cluster {2,3,4} or
            uJaccard (V12,V7) = 4/4 = 1.000
            Jaccard (V1,V7) = 3/9 = 0.333                                                          4                    5
            Jaccard (V7,V1) = 3/9 = 0.333
            Jaccard (V7,V12) = 4/7 = 0.571                                                 3               1
            Jaccard (V12,V7) = 4/7 = 0.571
                                                                                                   2                    6
approach, the results of the graph in figure 3 are shown in
table 3.1. we notice that uJaccard similarity provides bet-                                Figure 3: Toy graph.
ter information of similarity than Jaccard, this is because
uJaccard considers the notion of unilateral similarity. Table
3.1 shows three toy graphs, in which we present a compar-
                                                                              Table 3: Cut a graph 3.2 using uJaccard
ison between Jaccard and uJaccard. As show in table 3.1
                                                                                   1 level deep       2 levels deep
uJaccard provides a unilateral similarity improving the sym-
                                                                                uJaccard(1,4) 1/4 uJaccard(1,2) 1/4
metric similarity Jaccard.
                                                                                uJaccard(1,2) 1/4 uJaccard(1,3) 2/4
  Table 3.1 shows three toy graphs, in which we present a
                                                                                uJaccard(1,5) 1/4 uJaccard(1,4) 1/4
                                                                                uJaccard(1,6) 1/4 uJaccard(1,5) 1/4
         Table 2: Test uJaccard in toy graphs                                   -               - uJaccard(1,6) 1/4
                                   uJaccard
                                   sim(2,5)=4/4                         cluster {5,6,7} or cluster {8,9,10,11}; this is where uJaccard
           2
                                   sim(5,2)=4/6                         comes in, being able to solve this problem. Table 4 shows
                                                                        result of similarities from node 1 to all other nodes on the
                     1   3           4                   Jaccard        network in different levels deep. cluster {8,9,10,11} presents
                                              7
                                                         sim(2,5)=4/6   the highest number of strong similarities, therefor we can
                 5                                       sim(5,2)=4/6   conclude that node 1 belongs to cluster {8,9,10,11}.
             6
                                                         uJaccard                                  4       2
                                                         sim(7,5)=1/3
                 6               7                       sim(5,7)=1/1                                  3
                 5       10              8               Jaccard
                                                         sim(7,5)=1/3                                  1
                                 9                       sim(5,7)=1/3
                                                                                           5                        8
                                                         uJaccard                      6       7               11           10
                                                         sim(2,4)=3/4
                     5
                             8                 11        sim(4,2)=3/5                                               9
                                 12
                         6               13
         2       1                            10         Jaccard                           Figure 4: Toy graph.
                                                    14   sim(2,4)=3/6
             3
                         4
         7
                                                         sim(4,2)=3/6
                                 9                                      3.3   Social Network
                                                                           We tested uJaccard against two social network graphs; the
comparison among Jaccard and uJaccard. As show in table                 first is the coauthorship network of scientists [9] the second
3.1 uJaccard provide an unilateral similarity improving the             is the network of Hollywood’s actors1 .
symmetric similarity Jaccard.                                             The first network is the coauthorship network of scientists
                                                                        working on network theory and experiments, compiled by
3.2    Cut a graph                                                      M. Newman [9]. We want to find the top scientists that
   In graph theory, a cut is a partition of the vertices of a           Newman is similar to or that have paradigmatic similarity.
graph into two disjoint subsets. There are many techniques              As shown in table 5 the 3 most of Newman’s paradigmatic
and algorithms to cut a graph, but in some cases there are              similar scientists are Callaway, Strogatz and Holme. On the
graphs that are difficult to cut, due to their symmetric dis-           1
                                                                          The    Internet  Movie    Database:                    ftp://ftp.fu-
tribution of vertices.                                                  berlin.de/pub/misc/movies/database/
      Table 4: Cut a graph 3.2 using uJaccard                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   Table 5: uJaccard calculation from 3.3, in search
           2 levels deep         3 levels deep                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  paradigmatic scientists to Newman
       uJaccard(1,4)     1/3 uJaccard(1,4)     1/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  2 levels deep         3 levels deep      4 levels deep
       uJaccard(1,2)     1/3 uJaccard(1,2)     1/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             Scientist Newman is similar to:
       uJaccard(1,6)     1/3 uJaccard(1,6)     1/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Callaway        0.15 Strogatz       1.63 Strogatz      8.25
       uJaccard(1,7)     1/3 uJaccard(1,7)     2/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Strogatz        0.15 Holme          1.59 Callaway 7.85
       uJaccard(1,9)     1/3 uJaccard(1,9)     2/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Watts           0.15 Kleinberg 1.59 Watts              7.81
       uJaccard(1,11) 1/3 uJaccard(1,11) 2/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     Hopcroft        0.11 Sole           1.59 Kleinberg 7.18
       uJaccard(1,10) 1/3 uJaccard(1,10) 1/3                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               Scientists that are similar to Newman:
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Adler           0.33 Aberg          0.50 Aberg         2.50
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Aharony         0.33 Adler          0.66 Adler         14.0
                                                                                                                                                                                                                                                                          BAIESI, M


                                                                                                                                                                                                                                                                     PACZUSKI, M


                                                                                                                                                                                                                                                     CORRAL, A


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Aleksiejuk      0.50 Aharony        0.66 Aharony       14.0
                                                             JEDYNAK, M

                                                                                                                                                                                                                                                                                                                             DEFRAYSSEIX, H
                                                                                                                                                                                                                                                                BASSLER, K
                                                                     SIENKIEWICZ, J
                                                                                               DIAMBRA, L
                                                       FRONCZAK, A


                                                                                                                                                                                                                                                     KOZMA, B              HENGARTNER, N                                                                                                                                                                                                                                                                                                                                                      HARI, R
                                               FRONCZAK, P                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               KNUUTILA, J
                                                                     HOLYST, J                                                                                                                                                                                   KORNISS, G
                                                                                                                                                                                                                                                                                                                            KUNTZ, P
                                                                                       DAFONTOURACOSTA, L                                                                                                                                                                                                                            SALAZARCIUDAD, I                                                                                                                                                                                                                                                                               ILMONIEMI, R
                                                                                                                                                                                                                                                                                                                 THERAULAZ, G
                                                                                                                                     ROTHMAN, D                   SANWALANI, V                                                                                                                                                                                                                                                                                                                                                                                                                                                                          LOUNASMAA, O
                                 LAHTINEN, J                                                                                                                                                                                                                                                               GAUTRAIS, J                         GARCIAFERNANDEZ, J
                                                                                                                                                                                                                                                                                                                                 BUHL, J                                                                                              MENDES, J
                                                                            ALEKSIEJUK, A                                                                                                                                                                                                                                                                                                                         GOLTSEV, A


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Ancelmeyers 0.66 Alava              0.50 Alava         1.00
                                                                                                     AHARONY, A                                        KALAPALA, V                                                                                             TOROCZKAI, Z                                                          FERRERICANCHO, R                                                                                                                                                                                                       DIGARBO, A
                  ONNELA, J                                                            ADLER, J                                                                                 PORTER, M                               BAK, P                                                                                                                                                                                                                                                                                                                                                                                                            HAMALAINEN, M
                                                                                                                                                                                                                                                                                                                     VALVERDE, S
                                                                                                                                                                                                                                     ZALIZNYAK, A                                                            DENEUBOURG, J
                                                                                                                                                                                           GIRVAN, M                                                                                                                                                                                                                                           SAMUKHIN, A
                                                                                                                                                                         CLAUSET, A                                                          SIMONSEN, I                                                                                      MONTOYA,KOHLER,
                                                                                                                                                                                                                                                                                                                                                       J      R
                              KASKI, K                                    ARAUJO, A
                                                                                                                                                                                                   MARTIN, M                                                                                                                                                                                                                                                                                                                                      MEUCCI, R                                                                  SALMELIN, R
                                                                                                                                 MUHAMAD, R                                                                                   ERIKSEN, K                                                                                                                                                                                                                                                                                                                               ALLARIA, E
          CHAKRABORTI, A                                                                                                                                   WARMBRAND, C          MUCHA, P                                                                                                                                                                                                                                                                                                                                                                                                                                                       TIMMERMANN, L
                                          KERTESZ, J                                                                                                 DODDS, P
                                                             BERNARDES, A                                                                                                                                                                                                                                    CANCHO, R          SOLE, R
                                                                                            STAUFFER, D                                                                                                     JIN, E                                                                                                                             JANSSEN, C
                                                                                                                                                                                         PARK, J                                                                                                                                                                                                                      DOROGOVTSEV, S                                                                                                                                                                                                KUJALA, J
             JARISARAMAKI, J                                                                                                                                                                                                                    MASLOV, S
                       KANTO, A                                                                                                                                          MOORE,ANCELMEYERS,
                                                                                                                                                                               C            L
                                                                            COSTA, U                                              SABEL, C
                                                                                                                                                                                                                                                               FRAUENFELDER, H                                                                                                                                                                                                                                                                                                                                                                  GROSS, J
                                                                                                                                                             GASTNER, M                                SCHRAG, S                                                                                                                                                        KEPLER, T                                                                                                                                                                         ARECCHI, F
                                                                                                                                                                                                                            SNEPPEN, K                                                                                                                                                        DANON,
                                                                                                                                                                                                                                                                                                                                                                                     VEGAREDONDO, F L
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         SCHNITZLER, A
                      ALAVA, M                                                                                                                       WATTS, D                                                                                                                                                                                                                                                    RUBI, M                                                                                                                                                                                  FREUND, H
                                                                          MEYERORTMANNS, H                                                                                         NEWMAN, M                                                                                                                                                                                                                                                                                 VAZQUEZPRADA, M
                                         SZABO, G                                                                                                                                                                                                                                                                                                                                   GLEISER, P                                 RAMASCO, J
                                                                                                                                                                                                         LUSSEAU, D                                                                                                                                                                                                                                                                                                                         MENDOZA, C                                         PARK, E             ZAKS, M
                                                                                                                                                              BALTHROP, J                                                                                                                                                                                       SMITH, E
                                                                                                                                                                                                                                           ROSVALL, M                                                                                                                                                                 VRAGOVIC, I


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 Araujo          0.33 Albert         0.10 Albert        0.50
                                                                                                                                                                                                                                                                                                                                                                                                        ARENAS, A                                                                        FLORIA, L                                 VANNUCCHI, F
                                                                                                                                                                                                                                                                                                                                                                                      GIRALT, F                                                                                                                                                         BRAGARD, J                                       TASS, P
                                                                                                                                                                                                                                                                                                                                                                               CABRALES, A                                                                                                                                                                                                                           WEULE, M
                                                                                                                                                                                                                                                                                                                                                                                                                                                              GOMEZGARDENES, J                                                                                                                                            VOLKMANN, J
                                                                                                                                                                             FORREST, S            GHOSHAL, G                                                                                                                                                                                                                                        SOFFER, S                                                                                                   MANCINI, H
                                                                                                                                                                                                                                                                                                                          LILLO, F                                                                 DIAZGUILERA, A                                                        PACHECO, A
                                                                                                               SPATA, A                                                                                                                                     BENNAIM, E                                                                                                                                                                                                                              GOMEZ, J
                                                                                                                                              CALLAWAY, D                                                                                                                                                                                   ANDRADE, J                                                                                                                                                                                  HENTSCHEL, H                          MAZA, D
                                                                                                                                                                                                                                                                                                                                                                           GUIMERA, R                                                                                                                            LOPEZRUIZ, R   CHAVEZ, M                                                              ROSENBLUM, M                            RUDZICK, O
                                                                                                                                                       STROGATZ, S                                                                                                                                                                                                                                                         LOUIS, E
                                                                                                                                                                                                               LEICHT, E               MINNHAGEN, P                                                                                                                                                                                                                                                                                                                                             KURTHS, J       PIKOVSKY, A
                                                                                                                                                                                                                                                                                                            BONANNO, G                                  TURTSCHI, A                                  PASTORSATORRAS, R                                      ECHENIQUE, P
                                                                                                                                                                                YEUNG, M                                         TRUSINA, A                                                                                                                                                                                       BOGUNA, M                                                                                                                                             OSIPOV, G
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                MORENO, Y                                                       BOCCALETTI, S
                                                                                               BUCOLO, M          FORTUNA, L                                                                                                                                                                                                  MOREIRA, A                                                  PEREZ, C
                                                                                                       FORTUNATO, S
                                                                                                                                           USAI, L     HOPCROFT, J                                                                                     CHUNG, J                                                                                                                                                                                                                                                         HWANG, D                                     VALLADARES, D
                                                                                                                                                                                                                                                                                                                                                             CAMACHO, J                                      GUARDIOLA, X
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                VALLONE, A                                             ABEL, H
                                                                                                                               FRASCA, M                                                                                                                                                                                                                                                                                                                         NEKOVEE, M
                                                                                                                                                                         MIROLLO, R                                                                                                                                                                                                                                                                                                                                                  AMANN, A                                                                                                   CHATE, H
                                                                                                                   LAROSA, M                                                                                                                                                                                             MANTEGNA, R                                                                                                               VAZQUEZ, A
                                                                                                                                                                                                                                                                                                                                                                                                   LLAS, M                                                                                                                                                                                           SCHAFER, C
                                                                                                                                                                                                       ERGUN, G                             LEYVRAZ, F                                                                                      AMARAL, L
                                                                                                                                        COSENZA, S MATTHEWS, P                                                                                       CHOI, M                                             EDLING, C                                            MOSSA, S                                                          WEIGT, M                                                                                                                                            ZHOU, C
                                                                                                                                                                                                                                      HOLME, P                             ANTAL, T                                                                                                                                                                                                                                                                                                                                                                               GREGOIRE, G
                                                                                                                        STAGNI, C                                                                                                                                                                                                                                                                            BARRAT, A                                             LEONE, M                                                                        PELAEZ, A
                                                                                                            PLUCHINO, A
                                                                                                                                                                                                                                                                                                                         ABERG, Y                                              BARTHELEMY, M
                                                                                                                                 RAPISARDA, A                                    DARBYDOWMAN, K                                                                                                                                                                                                                                        VESPIGNANI, A                                            RADICCHI, F
                                                                                                     REMONDINI, D                                                                                                                                                                                    LILJEROS, F                                            SCALA, A
                                                                                                                 MARCHIORI, M
                                                                                                                       KLEINBERG, J                                                                                    VAZQUEZ, F          KIM, B                    PARK, H                                                                                                                                                                           ZECCHINA, R
                                                                                                                                                                                                                                                KRAPIVSKY, P                                                                                                                                                                                                                  CASTELLANO, C                CECCONI, F
                                                                                                                                              CRUCITTI, P                                               RODGERS, G                                                                                                                                                                                                                                                                                                                                                                                          MOTTER, A
                                                                                                                                                                                                                                                                                                                                       STANLEY, H                                                      GONDRAN, B
                                                                                                    TIERI, P                                                         TADIC, B                                                                                                                                                                                     PROVERO, P
                                                                                                               FRANCESCHI, C                                                                                                             REDNER, S                                                                                                                                                                                                              VILONE, D
                                                                                                                                                                                                                                                                                                                                                                                                                                                  FLAMMINI, A                                                                                             PECORA, L
                                                                                                                                   LATORA, V                                                                            HAN, S                         HONG, H                                                                                                                                                                                                                                                                                                                                                                   NISHIKAWA, T
                                                                            UPFAL, E                                                                                                                                                                                                                                                                                                                                                                                                          PARISI, D
                                                                                                                                                                                      THURNER, S                                                                                                                                                                                         GUICHARD, E
                                                                                       TOMKINS, A                                                    CARUSO, F                                                                                                       KIM, J              KIM, S                                                                                                                                                                                                           LORETO, V
                                                                                                         VALENSIN, CASTELLANI,
                                                                                                                   S                                                                                                                  YOON, C                                                                                                                                                                                                                                                                                                                                                                          DEMOURA, A
                                                                                                                               G                                                                                                                                                                                                                                             HERRMANN, C                                                                                                                                                                                  BARAHONA, M
                                                                                                                                                                                                                           HUSS, M
                                                                                             RAGHAVAN, P                                                                                                                                                                                                                                                                                                                                                                                                                                                                                            GREBOGI, C
                                                                           BRODER, A                                                                                                                                                                                                                                                                                                                                                                                                                                                                        CARROLL, T
                                                                                                                                                                                                                                                                                                                                                                                                                                                                  MARITAN, A
                                                                                                                 KUMAR, S                                   KINNEY, R                                                                                          KAHNG, B                                                                                                                                                                                                                                                                      HEAGY, J
                                                                  SIVAKUMAR, D                                                                                                                                                             OH, E
                                                                                            RAJAGOPALAN, S                             PORTA, S                                                                                                                                                                                                                                                                                                                                                                                                                                                                 HOPPENSTEADT, FLAI,
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             MITRA, M                                                                                                                               Y
                                                                                                                                                                                                                                                                                 GHIM, C                                                   HONG, D                                                                                                                                                                                                                                                          DASGUPTA, P
                                                                                 KUMAR, R                                                                                                                                                                                                      JUNG, S                                                                                                                                                                          CIEPLAK, M
                                                                                                                                                                                                                                                          GOH, K                                                                                                                                                                                                                                                                                                         FINK, K
                                                                       WIENER, J                                                                                                                                                                RHO, K                                                                                                                                                                                                                                                                                               JOHNSON, G
                                                                                                                                                                                                                                                                                                                                                                                                 CALDARELLI, G                                      RIGON, R
                                                                                                                                                                                                                                                                                                                                                                                                                                                            RODRIGUEZITURBE, I
                                                                                            MAGHOUL, F                                                                                                                                                                                                                                                                   COLAIORI, F
                                                                                                                                                                                                                                                                 KIM, D             LEE, D                                                                                                                                                                                                    HOLTER, N
                                                                                 STATA, R                                                                         ALBERT, I                ALBERT, R                                                                                                                                                                                                                                                                        BANAVAR, J
                                                                                                                                                                                                                              JEONG, H
                                                                                                                                                                                                                                                       PARK, Y                                                                                                                                                                                                                                                                                                                                                                                      YE, N
                                                                                                                                                                                                                                                                                                                                                                                                                 COCCETTI, F                                                                                                                                                                                                     LIU, Z
                                                                                                                                                                                                                                                                                                                                                                                                     CASTRI, M                                          RINALDO, A              FEDROFF, N
                                                                                                    LAWRENCE, S                                                                                                                                                                                                                                                                 SERVEDIO, V                                                                       GIACOMETTI, A
                                                                                                                                                                                                            TOMBOR, B                   ALMAAS, E                                                                                       HERRMANN, H
                                                                                                                                                                                                                                                                                                                                                           CAPOCCI, A                     DELOSRIOS, P
                                                                                                                                                                     NAKARADO, G                       MASON, S FARKAS, I                                     BIANCONI,STROUD,
                                                                                                                                                                                                                                                                        G      D

                                                                                       GILES, C                                                                                                                        SZATHMARY, E                                                                                                                                                                        PETERMANNN,
                                                                                                                                                                                                                                                                                                                                                                                                GARLASCHELLI, D        T
                                                                                                                                                                                                                                                    TU, Y
                                                                                                                                                                                                     NEDA, Z KOVACS, B                                                                                                                                                               BATTISTON, S
                                                                                                                                                                                                         PODANI, J SCHUBERT, A
                                                                                                   FLAKE, G                                                                                                                                                                                                              DEARCANGELIS, L

                                                                               COETZEE, F                                                                                                                                        YOOK, S                                                                                                                                                               MUNOZ, M
                                                                                                                                                                                                          DERENYI, I
                                                                                                                                                                                                                OLTVAI, ZRAVASZ, E                  KULKARNI, R                                                                           ROUX, S GONZALES, M
                                                                                                                                                                          DOVIDIO, F                                                   MACDONALD, P                                                                                                                                                                               DONETTI, L
                                                                                                          GLOVER, E
                                                                                                                                                                                                                                                                                                                                                                                                                    LOFFREDO, M
                                                                                        PENNOCK, D
                                                                                                                                                                                                   VICSEK, T
                                                                                                                                                                                                                                                                                                                                                                                             PIETRONERO, L


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                The results of the search on the network of top 250 actors
                                                                                                                                                                                                                           BARABASI, A
                                                                                                                                                                                                                                              SOMERA, A                                                                                                                    CATANZARO, M
                                                                                                                                                                                                                                                                                                                                                                                                                             TORRES, J
                                                                                                                                                                       MARODI, M                                                                                                             GORMAN, S
                                                                                                                                                                                                WUCHTY, S
                                                                                                                                                                                                                                                                                                                                               SOUSA, A                                                      GARRIDO, P
                                                                                                                                                                                                         BEG, Q                 MONGRU, D
                                                                                                                                                                                  CZIROK, A                            DOBRIN, R
                                                                                                                                                                  SHOCHET, O                                                                    SCHWARTZ, N                                                                                                                            DELUCIA, M
                                                                                                                                                                                                                                                                                                                                                                                                                            MARRO, J
                                                                                                                                                                                                           WILLIAMS, R
                                                                                                                                                                                                                                 MARTINEZ, N                                                                                                                                                            BOTTACCIO, M
                                                                                                                                                                                                                                                            BENAVRAHAM, D
                                                                                                                                                                                        COHEN, I                       DEZSO, Z                                                                                                                           YUSONG, T
                                                                                                                                                                                                                                                                                                                                     LINGJIANG, K
                                                                                                                                                                                                                                                COHEN, R                                                                                                                                    MONTUORI, M
                                                                                                                                                                                                               BERLOW, E
                                                                                                                                                                     BENJACOB, E                                               DUNNE, J
                                                                                                                                                                                                                                                                                                                                                MUREN, L                                                                               DICKMAN, R
                                                                                                                                                                                                                                                HAVLIN, S                 EREZ, K
                                                                                                                                                      SHEFI, O


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                and top 1000 actors, using uJaccard and the paradigmatic
                                                                                                                                                                                                                DEMENEZES, M
                                                                                                                                                                                                                                                              ROZENFELD, A

                                                                                                                                                                     SEGEV, R
                                                                                                                                                 AYALI, A


                                                                                                                                                            GOLDING, I                                                                                           MAKSE, H
                                                                                                                                                                                                     PENNA, T                                   SONG, C

                                                                                                                                                                                                                     MOUKARZEL, C


                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                approach are presented in tables 6 and 7. In table 6 we focus
Figure 5: Coauthorship network of scientists, se-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               in Tom Cruise, we found that Tom Cruise is most similar
lected nodes belong to scientists Newman, Callaway,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             to Julia Roberts but Julia Roberts is most similar to John
Strogatz, Holme.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Travolta, Tom Cruise is third in Julia Roberts’ similarity
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                list. clearly there is not a symmetric similarity among Julia
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Roberts and Tom Cruise. Moreover Julia Roberts is not the
other hand the top 3 scientists that are similar to Newman                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      most similar toward Tom Cruise, the most similar towards
are Adler, Aberg and Aharony. uJaccard has been calcu-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          Tom Cruise is Heath Ledger. Hence this confirm that uJac-
lated in 2,3 and 4 levels deep away from Newman. Newman                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         card helps to identify similarities, particularly asymmetric
is more similar to Strogatz but the most similar scientist to                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   similarities. Table 6 also shows similar scenario among Tom
new Newman is Adler and not Strogatz, even that Strogatz                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        Cruise, Tom Hanks and Joan Allen in the network of top
most similarity is toward Newman.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                               250 actors and actresses, this confirm the usability of uJac-
For the second network, we created the second social net-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                       card.
work of Hollywood’s actors, we based on The Internet Movie                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        In Table 7 we use the network of top 250 actors and ac-
Database (note). We download actors and actresses data,
which includes title of movies in which they worked, we also
download a list of top 1000 (nota) and top 250 (nota) actors                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    Table 6: uJaccard similarity among top 1000 and
and actresses. The network is composed of nodes represent-                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      top 250 actor and actresses, searching paradigmatic
ing actors and actresses, and vertices are the movies in which                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  similar actor
those actors worked together. A node is created for every                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                Top 1000 actors, cruise tom is similar to:
person, with their names as the key, when two people are                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             roberts julia
in the same movie; a vertex is created between their nodes.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         roberts julia     0.405 travolta john          0.418
The first network presents 1000 top actors and actresses who                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        hanks tom         0.401 hanks tom              0.412
also work in 41,719 movies with a total 113,478 edges. The                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          jackson samuel    0.399 jackson samuel         0.407
second network presents 250 actors and actresses who work                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           douglas michael 0.397 cruise tom               0.399
in 15,831 movies with a total of 14,096 edges. For this test                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                        eastwood clint    0.393 spacey kevin           0.399
we remove duplicated edges.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                              Top 250 actors, cruise tom is similar to:
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      hanks tom
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    hanks tom         0.430 douglas michael 0.425
   • From a given actor A
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    douglas michael 0.420 cruise tom               0.421
   • We search for actors that actor A is similar to                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                eastwood clint    0.420 jackson samuel         0.418
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    spacey kevin      0.413 travolta john          0.414
   • From the actor A’s similar actor list we get the most                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          jackson samuel    0.410 spacey kevin           0.411
     similar actor B                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           Who is similar to cruise tom:
   • We search for actors that actor B is similar to                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    top 1000 actors            top 250 actors
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    ledger heath      0.490 allen joan             0.496
   • This is done to analyse if actor A and actor B are                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             bacon kevin       0.488 balk fairuza           0.495
     reciprocally similar                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           crowe russell     0.482 bello maria            0.488
                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    gibson mel        0.482 collins pauline        0.487
   • Then we look for actors that are most similar to actor                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         benigni roberto 0.482 aiello danny             0.487
     A

   • We do this on the network top 250 actors and top 1000                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                      tresses and we focus on Anthony Quinn and Jack Nicholson.
     actors.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                    We start by searching for actors that Anthony Quinn is sim-
ilar to, then we search for actors that are most similar to An-   5.   ACKNOWLEDGMENTS
thony Quinn in 1 and 2 levels deep. We notice that Anthony          This research was funded in part by ”Fondo para la Inno-
Quinn is most similar to Tom Hanks but most similar ac-           vación, Ciencia y Tecnologı́a - Peru” (FINCyT). We thank
tor to Anthony Quinn is Antonio Banderas, while Anthony           the Universidad Católica San Pablo for supporting this re-
Quinn is the 153th most similar for Tom Hanks. Antonio            search.
Banderas is most similar to Samuel Jackson and not to An-
thony Quinn, while Anthony Quinn is the 53th most similar
for Antonio Banderas. Therefore we could conclude that
                                                                  6.   REFERENCES
Anthony Quinn and Tom Hanks are not symmetric simi-                [1] D. Bridge. Defining and combining symmetric and
lar rather they are asymmetric similar. Table 7 also shows             asymmetric similarity measures. In B. Smyth and
similar scenario for Jack Nicholson.                                   P. Cunningham, editors, Advances in Case-Based
                                                                       Reasoning (Procs. of the 4th European Workshop on
                                                                       Case-Based Reasoning), LNAI 1488, pages 52–63.
Table 7: uJaccard similarity among top 250 actor                       Springer, 1998.
and actresses, searching paradigmatic actor, in 2 and              [2] F. De Saussure and W. Baskin. Course in general
3 levels deep                                                          linguistics. Columbia University Press, 2011.
               Top 250 actors, are similar to:                     [3] R. Duda, P. Hart, and D. Stork. Pattern classification
        quinn anthony                nicholson jack                    2nd ed., 2001.
  hanks tom           0.451 hanks tom              0.436           [4] C. Fellbaum. Wordnet and wordnets. In A. Barber,
  jackson samuel      0.443 eastwood clint         0.429               editor, Encyclopedia of Language and Linguistics,
  lemmon jack         0.443 travolta john          0.417               pages 2–665. Elsevier, 2005.
  cruise tom          0.435 williams robin         0.417           [5] E. W. Holman. Monotonic models for asymmetric
  de niro robert      0.435 douglas michael        0.414               proximities. Journal of Mathematical Psychology,
             Who are similar to quinn anthony:                         20(1):1–15, 1979.
         1 level deep                 2 levels deep                [6] S. Jimenez, C. Becerra, and A. Gelbukh. Soft
  banderas antonio 0.366 banderas antonio 21.866                       cardinality: A parameterized similarity function for
  bardem javier       0.350 benigni roberto        20.758              text comparison. In Proceedings of the First Joint
  martin steve        0.333 burns george           20.565              Conference on Lexical and Computational
  goodman john        0.320 baldwin alec           20.413              Semantics-Volume 1: Proceedings of the main
  allen woody         0.285 mcqueen steve          20.244              conference and the shared task, and Volume 2:
             Who are similar to nicholson jack:                        Proceedings of the Sixth International Workshop on
         1 level deep                 2 levels deep                    Semantic Evaluation, pages 449–453. Association for
  greene graham       0.484 banderas antonio 41.477                    Computational Linguistics, 2012.
  bronson charles     0.482 bacon kevin            41.133          [7] L. Lee, F. C. Pereira, C. Cardie, and R. Mooney.
  brody adrien        0.480 baldwin alec           41.0862             Similarity-based models of word cooccurrence
  bale christian      0.476 bronson charles        40.982              probabilities. In Machine Learning. Citeseer, 1999.
  baldwin alec        0.474 benigni roberto        40.948          [8] F. Lorrain and H. C. White. Structural equivalence of
                                                                       individuals in social networks. The Journal of
                                                                       mathematical sociology, 1(1):49–80, 1971.
                                                                   [9] M. E. Newman. Finding community structure in
4.   CONCLUSION                                                        networks using the eigenvectors of matrices. Physical
   A key assumption of most models of similarity is that a             review E, 74(3):036104, 2006.
similarity relation is symmetric. The symmetry assumption         [10] R. M. Nosofsky. Stimulus bias, asymmetric similarity,
is not universal, and it is not essential to all applications          and classification. Cognitive Psychology, 23(1):94–140,
of similarity. The need for asymmetric similarity is impor-            1991.
tant and central in Information Retrieval and Graph Data          [11] M. Sahlgren. The word-space model: Using
Networks. It can improve current methods and provide an                distributional analysis to represent syntagmatic and
alternative point of view.                                             paradigmatic relations between words in
We present a novel asymmetric similarity, Unilateral Jaccard           high-dimensional vector spaces. 2006.
Similarity (uJaccard), where the similarity among A and B         [12] R. N. Shepard. Representation of structure in
is not same to the similarity among B and C, uJaccard(A,B)             similarity data: Problems and prospects.
!= uJaccard(B,A); this is based on the idea of paradigmatic            Psychometrika, 39(4):373–421, 1974.
association. In comparison to Tversky [13] our approach
                                                                  [13] A. Tversky. Features of Similarity. In Psychological
uJaccard does not need a stimulus bias, whereas in the case
                                                                       Review, volume 84, pages 327–352, 1977.
of Tversky human judgement is needed.
                                                                  [14] J. Weeds and D. Weir. Co-occurrence retrieval: A
We present a series of cases in which we confirmed its use-
                                                                       flexible framework for lexical distributional similarity.
fulness and we validated uJaccard. We could extend uJac-
                                                                       Computational Linguistics, 31(4):439–475, 2005.
card to include weights to improve the asymmetry, we could
also use uJaccard and the paradigmatic approach to clus-          [15] D. R. White and K. P. Reitz. Graph and semigroup
ter Graph data Networks. These are tasks in which we are               homomorphisms on networks of relations. Social
working on.                                                            Networks, 5(2):193–234, 1983.
In conclusion, the proposed uJaccard similarity proved to be
useful despite its simplicity and the few resources used.