<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Equidistant Nodes Clustering: a Soft Clustering Algorithm Applied for Synset Induction</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Mikhail Chernoskutov</string-name>
          <email>mach@imm.uran.ru</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Dmitry Ustalov</string-name>
          <email>dmitry@informatik.uni-mannheim.de</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Data and Web Science Group, University of Mannheim</institution>
          ,
          <addr-line>Mannheim</addr-line>
          ,
          <country country="DE">Germany</country>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Ural Federal University, Krasovskii Institute of Mathematics and Mechanics</institution>
          ,
          <addr-line>Yekaterinburg</addr-line>
          ,
          <country country="RU">Russia</country>
        </aff>
        <aff id="aff2">
          <label>2</label>
          <institution>XX International Conference “Data Analytics and Management in Data Intensive Domains” (DAMDID/RCDL'2018)</institution>
          ,
          <addr-line>Moscow, Russia, October 9-12, 2018</addr-line>
        </aff>
      </contrib-group>
      <fpage>57</fpage>
      <lpage>62</lpage>
      <abstract>
        <p>In this paper, we present an algorithm for fuzzy graph clustering called Equidistant Nodes Clustering (ENC). At the core of the algorithm is the assumption that the nodes in a community can reach each other within a limited number of steps. We describe the algorithm and apply it to the natural language processing task of synset induction. The experimental results show that ENC outperforms popular clustering algorithms according to a gold standard evaluation.</p>
      </abstract>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>1 Introduction</title>
      <p>The problem of graph clustering arises in social
network analysis, natural language processing,
bioinformatics, and other fields. Given an undirected graph G =
(V, E), a clustering algorithm outputs a covering of the
set of nodes V that reasonably groups the nodes into
communities or clusters.</p>
      <p>
        The vast diversity of application-specific
requirements for the cluster structure urges the
development of scalable clustering algorithms for
extremely large and complex graphs. In particular, we
focus on the soft clustering of graphs, in which the clusters
may overlap. This is very useful for handling such
phenomena as protein-protein interactions [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] and lexical
ambiguity [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. The contribution of this paper is
ENC, a new graph-based algorithm for soft clustering.
We also evaluate ENC on a synset induction task. Our gold
standard-based experiments show that ENC outperforms
or performs on par with popular graph clustering
algorithms. In some papers, soft clustering is also
referred to as fuzzy clustering; these terms are synonyms.
      </p>
      <p>The rest of the paper is organized as follows.
Section 2 surveys the related work. Section 3 describes
ENC, a soft graph clustering algorithm that uses the
equidistant node finding approach. Section 4 shows the
experimental setup and compares ENC to other
clustering algorithms on the synset induction task.
Section 5 concludes the paper with the final remarks.</p>
    </sec>
    <sec id="sec-2">
      <title>2 Related Work</title>
      <p>
        Perhaps the most well-known soft graph clustering
algorithm is the clique percolation method (CPM) [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ].
CPM finds the largest communities (or clusters) that
consist of fixed-size adjacent cliques. In
terms of CPM, two cliques are considered adjacent if
they share (k − 1) nodes, where k is the number of nodes in
the clique. This algorithm is well-known in
bioinformatics and social network analysis; for most
real-world tasks the value of k is usually 2, 3 or 4.
      </p>
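      <p>The CPM adjacency test described above can be sketched in a few lines of pure Python (a toy illustration, not the reference implementation):</p>

```python
def cpm_adjacent(clique_a, clique_b, k):
    """In CPM, two k-cliques are adjacent iff they share exactly k - 1 nodes."""
    return len(set(clique_a).intersection(clique_b)) == k - 1

# Two triangles (k = 3) sharing the edge {b, c} are adjacent.
print(cpm_adjacent({"a", "b", "c"}, {"b", "c", "d"}, 3))  # → True
```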
      <p>
        MaxMax [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] is a non-parametric graph clustering
algorithm, designed especially for word sense induction
problems, that outputs overlapping clusters. The
algorithm constructs an intermediate representation of
the input undirected graph as a directed graph according
to the maximal affinity of the adjacent nodes. Then, it
extracts quasi-strongly connected subgraphs from the
intermediate graph.
      </p>
      <p>
        WATSET is a state-of-the-art overlapping community
detection meta-algorithm [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Like MaxMax, it is
designed especially for addressing the word sense
induction task. WATSET internally uses non-overlapping
community detection algorithms like Markov clustering
(MCL) [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] and Chinese Whispers (CW) [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] by building
and clustering an intermediate sparser undirected graph
representation.
      </p>
    </sec>
    <sec id="sec-3">
      <title>3 Equidistant Node Finding for Fuzzy Graph Clustering</title>
      <p>
        At the core of the proposed algorithm, Equidistant
Nodes Clustering (ENC), there is an assumption that the
nodes inside clusters are more strongly connected to
each other than to nodes outside the clusters. As opposed to
Blondel et al. [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ], who treat the connectivity of nodes in a
cluster as the ratio between the number of edges falling
inside the cluster and the number of edges crossing the
cluster border, we consider connectivity in close
connection with the length of the shortest path between
the nodes in a cluster (the shorter the path, the stronger
the connectivity between the nodes). Therefore, clusters of
nodes can be seen as k-cliques (a clique in which the
distance between every pair of nodes does not exceed k
hops). This type of cluster structure is chosen to
distinguish groups of nodes with short pairwise path
lengths from all other nodes in the network. Hence, we put
forward the following assumption: a subset of nodes is a
cluster if every pair of its nodes is reachable within a
limited, predefined number of steps.
      </p>
      <p>The idea of ENC can be illustrated using the word
synonymy data from the English Wiktionary in Fig. 1. A
synonymy graph is an undirected graph whose nodes are
lexical units (words); a pair of nodes is connected by an
edge iff these words are synonyms. Due to the data
sparsity, some edges are missing, e.g., an edge between
the words “make” and “build”, or an edge between the
words “produce” and “assemble”. However, the sets of
words from Fig. 1 can be treated in terms of k-cliques
with k = 2. This means that each word pair in such a set is
connected by one edge ({make, construct}) or two edges
({produce, engender}, {engender, assemble}).</p>
      <p>In this formulation, cluster overlap is possible due to
node sharing between several equidistant node sets
(Fig. 2). In this case, the clusters of words are indicated by
nodes incident to edges of the same color. As seen, the word
“break” falls into three different clusters simultaneously:
two of them are cliques and one is a k-clique with k = 2.
We design our algorithm, ENC, to detect such
fuzzy k-cliques. Considering the task of synset
induction, we assume that k = 2.</p>
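      <p>The k-clique condition above reduces to a pairwise hop-distance test. A minimal pure-Python sketch (the toy synonymy graph below is an illustrative assumption, not the paper's data):</p>

```python
from collections import deque

def hop_distance(graph, source, target):
    """Breadth-first search distance between two nodes; None if unreachable."""
    seen = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            return seen[node]
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen[neighbor] = seen[node] + 1
                queue.append(neighbor)
    return None

def is_k_clique(graph, nodes, k):
    """True iff every pair of nodes in the set is reachable within k hops."""
    nodes = list(nodes)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            d = hop_distance(graph, u, v)
            if d is None or d > k:
                return False
    return True

# Toy synonymy graph inspired by the "make" example (adjacency sets, undirected).
graph = {
    "make": {"construct", "produce"},
    "construct": {"make", "build", "assemble"},
    "build": {"construct", "produce"},
    "produce": {"make", "build", "engender"},
    "engender": {"produce", "assemble"},
    "assemble": {"construct", "engender"},
}
print(is_k_clique(graph, {"make", "construct", "build"}, 2))  # → True
```

<p>Even though “make” and “build” are not directly connected, the set forms a 2-clique because every pair is within two hops.</p>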
      <sec id="sec-4-1">
        <title>3.1 Overview of the ENC Algorithm</title>
        <p>The ENC algorithm has four steps. At the first step, we
extract all the regions of interest from the graph, i.e., the
two-hop neighborhood of each node in the graph. This
neighborhood size is chosen to cover all possible k-cliques
involving the corresponding node. This is crucial for
reducing the problem size by discarding the
non-informative parts of the graph (outside the two-hop
neighborhood of the corresponding node), which is useful
when processing very large graphs. The aforementioned
regions of interest are used in the subsequent steps to
extract k-cliques.</p>
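        <p>The region-of-interest extraction can be sketched as follows (a hypothetical helper, not the authors' implementation):</p>

```python
def two_hop_region(graph, center):
    """Return the set of nodes within two hops of center (the region of interest)."""
    one_hop = set(graph.get(center, ()))
    two_hop = set()
    for neighbor in one_hop:
        two_hop.update(graph.get(neighbor, ()))
    return {center} | one_hop | two_hop

# On a path a-b-c-d, node "d" is three hops from "a" and falls outside the region.
graph = {"a": {"b"}, "b": {"a", "c"}, "c": {"b", "d"}, "d": {"c"}}
print(sorted(two_hop_region(graph, "a")))  # → ['a', 'b', 'c']
```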
        <p>
          At the second step, we extract all the k-cliques from
each ego network (i.e., a node and its entire two-hop
neighborhood). To do this, the task of finding k-cliques can
be reduced to the well-known task of
finding simple cliques [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ] by modifying the topology of
each initial ego network. This topology modification
reduces all two-hop paths to single-hop paths
by applying a transitive closure procedure. After that, the
cliques in the transitive closure-based graph are
at the same time k-cliques in the corresponding ego network.
At this point, we have a large number of k-cliques with
small structural differences in the adjacent ego networks,
since these ego networks can share most of their nodes.
        </p>
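        <p>The reduction above can be sketched in pure Python: a two-hop transitive closure followed by plain Bron-Kerbosch maximal-clique enumeration (a simplified illustration; the paper's implementation may differ):</p>

```python
def closure_2(graph):
    """Connect every pair of nodes joined by a two-hop path
    (the transitive closure restricted to paths of length 2)."""
    closed = {u: set(vs) for u, vs in graph.items()}
    for u, neighbors in graph.items():
        for v in neighbors:
            for w in graph.get(v, ()):
                if w != u:
                    closed[u].add(w)
                    closed.setdefault(w, set()).add(u)
    return closed

def maximal_cliques(graph):
    """Plain Bron-Kerbosch enumeration of maximal cliques (no pivoting)."""
    found = []

    def expand(r, p, x):
        if not p and not x:
            found.append(set(r))
            return
        for v in list(p):
            expand(r.union({v}), p.intersection(graph[v]), x.intersection(graph[v]))
            p = p.difference({v})
            x = x.union({v})

    expand(set(), set(graph), set())
    return found

# A path a-b-c: the closure adds the edge a-c, so {a, b, c} becomes a plain
# clique in the closed graph, i.e., a 2-clique of the original ego network.
ego = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
print(maximal_cliques(closure_2(ego)))  # one clique: {'a', 'b', 'c'}
```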
        <p>At the third step, we rank all the k-cliques by
mapping them to real numbers. Given a k-clique C and
a ranking function in the general form r : C → ℝ, we
denote by r ∈ ℝ the score of the k-clique. In particular,
r assesses whether the k-clique looks like a synset or not,
and its output is the degree of such similarity.
There are a number of ways to determine r depending on
what type of clusters the algorithm should detect.
Considering the task of synset induction, we chose the
following options for the r function (the list is not limited
to these functions and may be augmented in further
research):
• Average edge weight in the k-clique (k-cliques with
higher average edge weights are more likely to be
synsets). In the synset induction task, an edge weight
corresponds to the strength of the connection between
two nodes; hence, the higher the edge weight, the more
similar the two nodes are. We refer to this implementation
of the function r as rank_w in the results section.
• Transitivity of the k-clique (the relative number of
triangles in the k-clique). It shows how dense the
corresponding k-clique is: the denser the graph, the
higher its transitivity. We refer to this function as rank_t.
• Density of the k-clique (the ratio of the number of edges
in the k-clique to the total number of possible edges).
This is another way to assess how dense the graph is.
We refer to this function as rank_d.
• We also test combinations of the aforementioned
metrics: the product of the average edge weight and the
transitivity coefficient (rank_wt) and the product of the
average edge weight and the graph density (rank_wd).
We consider these combinations because we believe that
k-cliques with both high edge weights and a dense inner
structure are much more likely to be “successful” synset
candidates.</p>
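        <p>The three basic ranking functions can be sketched as follows (the names rank_w, rank_t and rank_d follow the paper; the dict-based graph representations are assumptions for illustration):</p>

```python
from itertools import combinations

def rank_w(weights, clique):
    """rank_w: average weight of the edges present inside the k-clique
    (weights is an assumed symmetric dict-of-dicts representation)."""
    present = [weights[u][v] for u, v in combinations(sorted(clique), 2)
               if v in weights.get(u, {})]
    return sum(present) / len(present) if present else 0.0

def rank_t(graph, clique):
    """rank_t: transitivity of the induced subgraph, i.e., the share of
    connected node triples that are closed into triangles."""
    nodes = set(clique)
    triples, closed = 0, 0
    for v in nodes:
        inside = [u for u in graph.get(v, ()) if u in nodes]
        for a, b in combinations(inside, 2):
            triples += 1
            if b in graph.get(a, ()):
                closed += 1
    return closed / triples if triples else 0.0

def rank_d(graph, clique):
    """rank_d: edges present inside the k-clique divided by edges possible."""
    nodes = sorted(set(clique))
    possible = len(nodes) * (len(nodes) - 1) / 2
    edges = sum(1 for a, b in combinations(nodes, 2) if b in graph.get(a, ()))
    return edges / possible if possible else 0.0

# A full triangle has transitivity and density 1; a path a-b-c does not.
triangle = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(rank_t(triangle, {"a", "b", "c"}), rank_d(triangle, {"a", "b", "c"}))
```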
        <p>The last operation at this step is sorting all the
k-cliques by their r values in decreasing order, thus
obtaining a list of k-cliques sorted by their likelihood of
being a proper community.</p>
        <p>At the fourth step, we successively map the k-cliques
from the sorted list onto the topology of the input graph
according to their r values. We take the k-clique with the
highest r value from the list and mark all the nodes in the
input graph that correspond to nodes in the selected
k-clique with the id of that clique. Then we take the
next k-clique (according to its r value) and mark the
corresponding nodes in the input graph again. At some
point, the nodes in the selected k-clique may already have
been marked with other clique ids in one of the previous
steps (i.e., different k-cliques overlap). It is important to
consider the overlapping degree (the number of nodes in a
k-clique that are marked by other k-cliques) of each
k-clique with the other already mapped k-cliques, since it
may affect the correctness of the algorithm's results. For
instance, if a candidate k-clique overlaps in 50 % of its
nodes with another k-clique that has already been mapped
onto the input graph, then it is highly likely to be excluded
(depending on the k-clique size). The reason is that
different synsets tend to overlap in only a small part of
their nodes (as in the example depicted in Fig. 2).</p>
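        <p>The greedy mapping with an overlap check can be sketched as follows (the 0.5 threshold mirrors the 50 % example above and is purely illustrative):</p>

```python
def greedy_map(candidates, max_overlap=0.5):
    """Greedily accept (score, clique) candidates in decreasing score order;
    reject any candidate whose share of already-covered nodes exceeds
    max_overlap (an illustrative threshold, not the paper's exact rule)."""
    accepted, covered = [], set()
    for score, clique in sorted(candidates, key=lambda c: c[0], reverse=True):
        share = len(covered.intersection(clique)) / len(clique)
        if share > max_overlap:
            continue  # overlaps too much with already mapped k-cliques
        accepted.append(set(clique))
        covered.update(clique)
    return accepted

candidates = [(0.9, {"a", "b", "c"}), (0.5, {"a", "b", "d"}), (0.8, {"d", "e", "f"})]
print(greedy_map(candidates))  # keeps {a, b, c} and {d, e, f}, drops {a, b, d}
```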
        <p>After mapping all the k-cliques from the sorted list onto
the topology of the initial graph, we treat the successful
ones as the output set of soft clusters (the so-called
synsets in the synset induction task).</p>
      </sec>
      <sec id="sec-4-2">
        <title>3.2 Pseudocode</title>
        <p>The pseudocode for the proposed overlapping
community detection algorithm, ENC, is presented
below.</p>
        <sec id="sec-4-2-1">
          <title>Input: an undirected graph G</title>
          <p>Output: a set of clusters clusters
1  for each v in G do
2    Gs ← ExtractSubgraph(G, v, 2)
3    T ← TransitiveClosure(Gs, 2)
4    cliques_local ← {Ci : Ci ∈ T, i ∈ ℕ}
5    cliques_global ← cliques_global ∪ cliques_local</p>
          <p>ENC starts with the procedure of determining
k-cliques in the neighborhood of each node. Line 2
describes the extraction of a subgraph of G centered at
node v together with its two-hop neighborhood. Line 3
describes the transitive closure for the nodes whose path
length is less than or equal to two steps. In line 4, simple
cliques are extracted from the transitive closure graph.
All the nodes forming cliques are collected in the shared
list in line 5. In lines 1–6 we extract all possible k-clique
candidates.</p>
          <p>The second step ranks all the k-cliques formed after
the transitive closures. In line 7, each k-clique is assigned
a real number 0 ≤ r ≤ 1 using the GetRank function. The
GetRank function represents one of the procedures
described in Section 3.1 that takes a k-clique as input and
returns its corresponding r value; it may be, for instance,
the density or transitivity of the considered k-clique.
Then, after forming the list of all possible k-cliques, we
sort it in descending order of r in line 8 using the
SortCliques function.</p>
          <p>Lines 9–11 initialize the k-clique counter for each
involved node. In real-world graphs, a node should not
participate in a very large number of k-cliques, so we aim
at preventing this behavior by counting them.</p>
          <p>Finally, in the last loop we determine all the
communities by choosing the most important k-cliques.
For that, we apply the CheckOverlap function in line 14,
which determines whether the given k-clique is suitable
to be a community or should be discarded. For the
synset induction task, we check compliance with the
conditions from Section 3.1: we discard k-cliques that
overlap in almost all of their nodes and, vice versa,
accept k-cliques that have only a small part of their
nodes participating in other k-cliques. If a k-clique
is suitable, we put it into the final list of clusters in line
15 and mark its nodes in lines 16–18 to prevent
unnecessary overlap with the k-cliques that have
lower values of r. The reasoning behind choosing
appropriate conditions for the CheckOverlap function
is described in Section 3.1.</p>
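          <p>The four steps can be tied together in a compact end-to-end sketch (with k = 2, density ranking, and closed two-hop neighborhoods standing in for full k-clique enumeration; all of these are simplifications of the described algorithm, not the authors' code):</p>

```python
from itertools import combinations

def enc(graph, max_overlap=0.5):
    """Minimal end-to-end illustration of ENC with k = 2."""
    # Steps 1-2: two-hop transitive closure; for brevity, the closed
    # neighborhood of each node acts as a k-clique candidate here.
    closed = {u: set(vs) for u, vs in graph.items()}
    for u in graph:
        for v in graph[u]:
            closed[u].update(w for w in graph[v] if w != u)
    candidates = {frozenset(closed[u] | {u}) for u in graph}

    # Step 3: rank candidates by edge density in the original graph (rank_d).
    def rank_d(nodes):
        nodes = sorted(nodes)
        possible = len(nodes) * (len(nodes) - 1) / 2
        edges = sum(1 for a, b in combinations(nodes, 2) if b in graph.get(a, ()))
        return edges / possible if possible else 0.0

    # Step 4: greedy mapping with an overlap check (illustrative threshold).
    clusters, covered = [], set()
    for cand in sorted(candidates, key=rank_d, reverse=True):
        if len(covered.intersection(cand)) / len(cand) > max_overlap:
            continue
        clusters.append(set(cand))
        covered.update(cand)
    return clusters

# Two triangles bridged by the edge c-d yield two overlapping clusters.
g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"},
     "d": {"c", "e", "f"}, "e": {"d", "f"}, "f": {"d", "e"}}
print(enc(g))  # {a, b, c, d} and {c, d, e, f}, sharing the nodes c and d
```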
        </sec>
      </sec>
    </sec>
    <sec id="sec-5">
      <title>4 Evaluation</title>
      <p>We evaluate ENC by comparing its performance to
several state-of-the-art graph clustering algorithms on a
synset induction task. The synset induction task is a natural
language processing task of clustering a synonymy graph
so as to group the words having the same meaning into sets
of synonyms, called synsets. In this task, the input is a
synonymy graph G = (V, E), where V is the vocabulary and
E ⊆ V² is a reflexive symmetric relation (synonymy) defined
on the vocabulary. The performance of the algorithms is
compared to a pre-defined gold standard clustering.</p>
      <p>
        We compare ENC to the following algorithms:
CPM [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ], MaxMax [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and WATSET [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. Our evaluation
approach is the same as in [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ] and [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]. We run the clustering
algorithms on the same input graph and compare the
result to a gold standard clustering. For that, we
decompose every synset of n words into a set of n(n − 1)/2
word pairs and compute pairwise precision, recall and
their harmonic mean, also known as the F1-score. We used
the same input graph as in [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ], which has 77 906 nodes and
71 880 edges, and is based on the English Wiktionary. The
edges were weighted using cosine similarity between the
Skip-gram vectors trained on the Google News
corpus [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]. As the gold standard, we use the same synsets
from
      </p>
      <p>
        WordNet [
        <xref ref-type="bibr" rid="ref5">5</xref>
        ] and BabelNet [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] for the English
language. We found that edges with a weight smaller than
0.3 may be useless, because it is highly unlikely that the
nodes incident to such edges are really synonyms. When
this filter is enabled, we denote it as filt_on; otherwise, we
denote it as filt_off.
      </p>
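      <p>The pairwise gold-standard evaluation described above can be sketched as follows (a toy example with invented synsets, not the actual evaluation script):</p>

```python
from itertools import combinations

def synset_pairs(clusters):
    """Decompose each synset of n words into its n(n - 1)/2 word pairs."""
    out = set()
    for cluster in clusters:
        out.update(combinations(sorted(cluster), 2))
    return out

def pairwise_scores(induced, gold):
    """Pairwise precision, recall and their harmonic mean (F1)."""
    p_ind, p_gold = synset_pairs(induced), synset_pairs(gold)
    tp = len(p_ind.intersection(p_gold))
    precision = tp / len(p_ind) if p_ind else 0.0
    recall = tp / len(p_gold) if p_gold else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

induced = [{"car", "auto"}, {"auto", "machine"}]
gold = [{"car", "auto", "machine"}]
print(pairwise_scores(induced, gold))  # precision 1.0, recall ≈ 0.667, F1 ≈ 0.8
```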
      <p>Another way to tune the ENC algorithm is to
normalize the k-clique ranks according to their sizes.
Normalization should help keep the balance between the
k-clique size and the density of edges inside it when
computing the k-clique rank. When this option is enabled,
we refer to it as norm_on; otherwise, we denote it as
norm_off.</p>
      <p>Also, we selected four conditions for k-clique
overlapping, where n is the number of nodes in a k-clique
and m is the number of overlapped nodes, i.e., nodes that
participate in other k-cliques as well (see Section 3.1 for
details). Each condition is a piecewise rule that limits m
depending on the k-clique size n; for instance, one rule
allows at most m = 1 overlapped node for k-cliques of size
2 ≤ n ≤ 3, with larger overlaps permitted only for larger
k-cliques. We refer to these conditions as cond_1, cond_2,
cond_3 and cond_4 in the results section, ordered from
stricter to softer overlap limits.</p>
      <p>The aforementioned conditions are selected only to
agree with the synset induction task (i.e., the assumption
that different synsets can share only a small part of their
nodes). These conditions may be the subject of tuning in
further research.</p>
      <p>We conclude this subsection with a summary of all the
tunable parameters of the proposed algorithm and the
reference algorithms. Table 1 shows the combinations of
the parameters used in our study.</p>
      <table-wrap id="table1">
        <label>Table 1</label>
        <caption>
          <p>Tunable parameters of the evaluated algorithms</p>
        </caption>
        <table>
          <thead>
            <tr><th>Method</th><th>Parameters</th></tr>
          </thead>
          <tbody>
            <tr><td>ENC</td><td>Edge filtering ∈ {filt_on, filt_off}; k-clique ranking function ∈ {rank_w, rank_t, rank_d, rank_wt, rank_wd}; k-clique rank normalization ∈ {norm_on, norm_off}; overlapping condition ∈ {cond_1, cond_2, cond_3, cond_4}</td></tr>
            <tr><td>WATSET</td><td>ClusterLocal ∈ {MCL, CWtop, CWlog, CWnolog}; ClusterGlobal ∈ {MCL, CWtop, CWlog, CWnolog}</td></tr>
            <tr><td>CPM</td><td>k ∈ {2, 3, 4}</td></tr>
          </tbody>
        </table>
      </table-wrap>
      <sec id="sec-5-2">
        <title>4.2 Results</title>
        <p>We carried out 100 experiments for different
configurations of the evaluated algorithms. The results
obtained on WordNet are presented in Table 2; the results
obtained on BabelNet are presented in Table 3. The values
of precision, recall and F1-score are provided. For brevity,
each table contains the results of only ten experiments.
Also, only the best achieved results of WATSET and CPM
are provided. All rows in the tables are sorted in
decreasing order of F1.</p>
      </sec>
    </sec>
    <sec id="sec-6">
      <title>5 Conclusion</title>
      <p>According to the experimental results, the
assumptions behind our algorithm, ENC, make it possible
to successfully capture the structure of synonymy graphs
for English. ENC showed performance comparable to
that of the state-of-the-art algorithm WATSET, although
using a different treatment of the overlapping community
detection problem. Further studies include scaling the
algorithm to larger graphs and applying ENC to synset
induction in other languages.</p>
      <p>Acknowledgments. The reported study was funded
by RFBR according to the research projects No.
16-37-00203 мол_а and No. 16-37-00354 мол_а. The
research was supported by the Ministry of Education and
Science of the Russian Federation, Agreement
no. 02.A03.21.0006. We used the “Uran” supercomputer
of the Krasovskii Institute of Mathematics and
Mechanics in this study.</p>
      <p>On WordNet, the ENC algorithm showed lower values
of F1 compared to WATSET. However, ENC shows high
precision when normalization is turned off and graph
density-based ranking is combined with the filtering of
low-weight edges and the softest conditions of
k-clique overlapping (cliques can overlap in more
nodes). Although CPM shows the highest precision and
MaxMax shows the highest recall, both algorithms
demonstrate an imbalance between precision and recall,
which results in lower values of F1.</p>
      <p>On BabelNet, the ENC algorithm shows the best
overall results. The only stable parameter is filtering,
which is turned on for every test instance. The softer
conditions (cond_3 and cond_4) appear much more often
than the stricter ones (cond_1 and cond_2). As seen from
Table 3, ENC shows nearly the same precision of
overlapping community detection with and without
normalization, as well as for the different k-clique ranking
functions. The results for CPM and MaxMax on BabelNet
demonstrate an imbalance between precision and recall,
which leads to lower values of F1, as in the case of
WordNet.</p>
      <p>According to the experimental results obtained on
WordNet and BabelNet, ENC shows generally better
results on the synset induction task when the algorithm
parameters are oriented towards detecting clusters with a
dense edge structure (rank_d, rank_w and their
combination rank_wd). The reason is that a set of words
tends to be a synset if there are many connections between
its words. ENC also works well with preprocessing in the
form of discarding low-weight edges before the
computation; this helps filter out useless connections
between clusters, thus highlighting the clusters' dense
inner structure. Another important feature for ENC-based
synset induction is soft overlapping conditions (cond_3 or
cond_4), which allow the induced clusters to share more
of their nodes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          [1]
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Chinese Whispers: an Efficient Graph Clustering Algorithm and Its Application to Natural Language Processing Problems</article-title>
          .
          <source>In: Proceedings of the First Workshop on Graph Based Methods for Natural Language Processing</source>
          , pp.
          <fpage>73</fpage>
          -
          <lpage>80</lpage>
          . TextGraphs-1. Association for Computational Linguistics, New York, NY, USA (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          [2]
          <string-name>
            <surname>Blondel</surname>
            ,
            <given-names>V.D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Guillaume</surname>
            ,
            <given-names>J.-L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lambiotte</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Lefebvre</surname>
          </string-name>
          , E.:
          <article-title>Fast unfolding of communities in large networks</article-title>
          .
          <source>Journal of Statistical Mechanics: Theory and Experiment</source>
          .
          <year>2008</year>
          (
          <volume>10</volume>
          ): P10008. doi:10.1088/1742-5468/2008/10/P10008
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          [3]
          <string-name>
            <surname>Bomze</surname>
            <given-names>I.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Budinich</surname>
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pardalos</surname>
            <given-names>P.M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pelillo</surname>
            <given-names>M.</given-names>
          </string-name>
          (
          <year>1999</year>
          )
          <article-title>The Maximum Clique Problem</article-title>
          . In: Du, D.Z., Pardalos, P.M. (eds.) Handbook of Combinatorial Optimization. Springer, Boston, MA, USA. doi:10.1007/978-1-4757-3023-4_1
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          [4]
          <string-name>
            <surname>van Dongen</surname>
            ,
            <given-names>S.</given-names>
          </string-name>
          :
          <article-title>Graph Clustering by Flow Simulation</article-title>
          .
          <source>Ph.D. thesis</source>
          , University of Utrecht (
          <year>2000</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          [5]
          <string-name>
            <surname>Fellbaum</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>WordNet: An Electronic Lexical Database</article-title>
          . MIT Press, Cambridge (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          [6]
          <string-name>
            <surname>Hope</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          , Keller, B.:
          <article-title>MaxMax: A Graph-Based Soft Clustering Algorithm Applied to Word Sense Induction</article-title>
          . In: Gelbukh, A. (ed.)
          <article-title>CICLing 2013</article-title>
          . LNCS, vol.
          <volume>7816</volume>
          , pp.
          <fpage>368</fpage>
          -
          <lpage>381</lpage>
          . Springer, Heidelberg (
          <year>2013</year>
          ) doi:10.1007/978-3-642-37247-6_30
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          [7]
          <string-name>
            <surname>Mikolov</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sutskever</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Chen</surname>
            ,
            <given-names>K.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Corrado</surname>
            ,
            <given-names>G.S.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dean</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          :
          <article-title>Distributed representations of words and phrases and their compositionality</article-title>
          .
          <source>In: Advances in Neural Information Processing Systems</source>
          , vol.
          <volume>26</volume>
          , pp.
          <fpage>3111</fpage>
          -
          <lpage>3119</lpage>
          . Curran Associates Inc., Harrahs and Harveys, NV, USA (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          [8]
          <string-name>
            <surname>Navigli</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Ponzetto</surname>
            ,
            <given-names>S.P.:</given-names>
          </string-name>
          <article-title>BabelNet: The automatic construction, evaluation and application of a wide-coverage multilingual semantic network</article-title>
          .
          <source>Artificial Intelligence</source>
          <volume>193</volume>
          ,
          <fpage>217</fpage>
          -
          <lpage>250</lpage>
          (
          <year>2012</year>
          ) doi:10.1016/j.artint.2012.07.001
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          [9]
          <string-name>
            <surname>Palla</surname>
            ,
            <given-names>G.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Derenyi</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Farkas</surname>
            ,
            <given-names>I.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Vicsek</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Uncovering the overlapping community structure of complex networks in nature and society</article-title>
          .
          <source>Nature</source>
          <volume>435</volume>
          ,
          <fpage>814</fpage>
          -
          <lpage>818</lpage>
          (
          <year>2005</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          [10]
          <string-name>
            <surname>Ustalov</surname>
            ,
            <given-names>D.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Panchenko</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Biemann</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          :
          <article-title>Watset: Automatic Induction of Synsets from a Graph of Synonyms</article-title>
          .
          <source>In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)</source>
          , pp.
          <fpage>1579</fpage>
          -
          <lpage>1590</lpage>
          . ACL 2017. Association for Computational Linguistics, Vancouver, BC, Canada (
          <year>2017</year>
          ). doi:10.18653/v1/P17-1145
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>