<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Heterogeneous Data Co-Clustering by Pseudo-Semantic Affinity Functions</article-title>
      </title-group>
      <contrib-group>
        <aff id="aff0">
          <label>0</label>
          <institution>RAI Centre for Research and Technological Innovation</institution>
          ,
          <addr-line>Corso E. Giambone 68, I10135 Torino</addr-line>
          ,
          <country country="IT">Italy</country>
        </aff>
      </contrib-group>
      <abstract>
        <p>The convergence between Web technology and multimedia production is enabling the distribution of content through dynamic media platforms such as RSS feeds and hybrid digital television. Heterogeneous data clustering is needed to analyse, manage and access desired information from this variety of information sources. This paper defines a new class of pseudo-semantic affinity functions that allows for a compact representation of cross-modal document relations.</p>
      </abstract>
      <kwd-group>
        <kwd>Co-clustering</kwd>
        <kwd>Pseudo-semantic Model</kwd>
        <kwd>Syntactic Affinity</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>The proliferation of multimedia production tools is enabling the convergence of
different media technologies. The result of this technological breakthrough is the
creation of new relationships between different media objects and the modalities
in which they are generated and consumed.</p>
      <p>Heterogeneous data clustering can be defined as the set of methods that
combine data assets that are different in nature, for example audiovisual and
textual material, and use the obtained information to present additional
knowledge, potentially not discoverable by the analysis of the individual sources. A
type of multi-dimensional heterogeneous data clustering is co-clustering, which
allows simultaneous clustering of the rows and columns of a matrix. Given a set
of M rows in N columns, a co-clustering algorithm generates co-clusters i.e. a
subset of rows which exhibit similar behaviour across a subset of columns, or
vice-versa. A pioneer work is described in [1], where a collection of text data
is modelled as a bipartite graph between documents and words, thus reducing
the clustering problem to a graph partitioning problem. A similar approach
applied to multimedia collections is presented in [7]. The work in [5] describes a
co-clustering algorithm that monotonically increases the preserved mutual
information by intertwining both the row and column clustering at all stages. As
clustering of high-dimensional data can suffer from the curse of
dimensionality problem, Domeniconi et al. [2] proposed a technique to generate clusters in
subspaces spanned by different combinations of dimensions.</p>
      <p>Another problem related to co-clustering is how to set the optimum
similarity function for maximising the cluster separability [3]. This paper presents
an asymmetric vector affinity function on which pseudo-semantic dependency
graphs among data objects are built based on the cross-projection of two sets of
heterogeneous data spaces. Under this framework we implemented a novel
coclustering algorithm for the seamless fusion of multimedia content coming from
various sources. Because of asymmetric nature of the adopted model,
hierarchical relations, i.e. syntactic entailment and equivalence, between data objects, as
well as representativeness degrees of the found groupings can be expressed.</p>
      <p>The paper is organised as follows. §2 describes the mathematical principles
on which the pseudo-semantic affinity measure is built. §3 outlines the functional
blocks performing the proposed co-clustering algorithm. §4 presents the
experimental evaluations conducted to demonstrate the effectiveness and potential
of the framework and the algorithms. §5 provides an example of a real-world
application based on the proposed technology. §6 concludes the paper.
</p>
    </sec>
    <sec id="sec-2">
      <title>Pseudo-semantic Affinity</title>
      <p>In our work co-clustering is based on the concept of pseudo-semantic
affinity (PSA), a generalisation of the concept of semantic similarity originally
proposed in [4]. By this generalisation we aim at increasing completeness without
losing accuracy. The PSA is loosely inspired by the way common language describes
relationships among people: we call affine those people who are related by some
kinship path. Some affinity concepts are symmetric, e.g. being a sibling of; some
symmetric affinities are similarities, e.g. DNA sequence alignment measurements;
but there are also generic affinities for which neither property holds, e.g. a
nephew is not his uncle's uncle.</p>
      <p>Definition 1. Let a, b be two non-negative real vectors of size N. A
pseudo-semantic affinity is a function Sm,n(a, b) : ℝ₀₊ᴺ × ℝ₀₊ᴺ → ℝ₀₊ of the form:</p>
      <p>Sm,n(a, b) = ⟨a, b⟩ / (‖a‖^m ‖b‖^n)   (1)
where m, n ∈ [0, +∞), and ⟨·,·⟩ is the inner product.</p>
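      <p>Eq. (1) can be sketched directly in code; the following minimal Python implementation (an illustrative sketch, not part of the original framework) also shows the asymmetry of the measure:</p>
      <preformat>
```python
import math

def psa(a, b, m, n):
    """Pseudo-semantic affinity S_{m,n}(a, b) = <a, b> / (||a||^m ||b||^n)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na ** m * nb ** n)

# S_{1,1} reduces to the Cosine similarity, and the function is
# asymmetric in general: S_{m,n}(a, b) differs from S_{m,n}(b, a).
a = [1.0, 0.0]
b = [1.0, 1.0]
print(psa(a, b, 1, 1))                  # cosine of 45 degrees, about 0.7071
print(psa(a, b, 2, 0), psa(b, a, 2, 0)) # 1.0 and 0.5
```
      </preformat>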
      <p>
        It can be demonstrated that the PSA is a pseudoquasimetric1 (or hemimetric)
for the similarity of a and b in the considered vector space. Also, if a and b are
two feature vectors representing two data objects, then Eq. (
        <xref ref-type="bibr" rid="ref1">1</xref>
        ) models syntactic
properties of the relations between a and b, such as equivalence, entailment (or,
conversely, dependency), and inclusion, thus providing hierarchical relationships
among the considered data sets.
      </p>
      <p>
        Let Δ = m − n and k ∈ ℝ. Then S(·) has the following analytical properties:
Auto-affinity (2), Co-symmetry (3), Symmetrisation (4), Index Shift/Scaling (5)-(6),
Argument Scaling (7), and Cosine Generalisation (8):
      </p>
      <p>Sm,n(a, a) = ‖a‖^(1−m) ‖a‖^(1−n) = ‖a‖^(2−n−m)   (2)</p>
      <p>Sm,n(a, b) = Sn,m(b, a)   (3)</p>
      <p>Sm,m(a, b) = Sn,n(a, b) ‖a‖^(−Δ) ‖b‖^(−Δ)   (4)</p>
      <p>Sm,n(a, b) = Sn,n(a, b) ‖a‖^(−Δ) = Sm,m(a, b) ‖b‖^Δ   (5)</p>
      <p>Sm+k,n(a, b) = Sm,n(a, b) ‖a‖^(−k) = Sm+k,n+k(a, b) ‖b‖^k   (6)</p>
      <p>Sm,n(a, ka) = k^(1−n) Sm,n(a, a)   (7)</p>
      <p>Sm,n(a, b) = cos(a, b) ‖a‖^(1−n−Δ) ‖b‖^(1−n)   (8)</p>
      <p>
        Eq. (2) implies that the affinity of a vector with itself depends on its norm.
However, it is possible to demonstrate that independence from ‖a‖ is achieved
iff Sm,n(a, a) = γ, where γ is a constant value. This implies that ‖a‖^(2−m−n) = γ,
which is true if and only if 2 − m − n = 0. In this case the PSA is said to be
well-defined. Eq. (8) states that any PSA is a generalisation of the Cosine similarity
measure, where cos(a, b) = ⟨a, b⟩ / (‖a‖ ‖b‖) = S1,1(a, b).
      </p>
      <p>1 See http://en.wikipedia.org/wiki/Metric_(mathematics)#Pseudoquasimetrics</p>
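      <p>The analytical properties above can be verified numerically; in the following self-contained sketch the vectors and exponents are arbitrary illustrative choices, not values from the paper:</p>
      <preformat>
```python
import math

def psa(a, b, m, n):
    # S_{m,n}(a, b) = <a, b> / (||a||^m ||b||^n), Eq. (1)
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na ** m * nb ** n)

a, b = [3.0, 1.0], [2.0, 2.0]
m, n, k = 1.7, 0.4, 0.9
delta = m - n
na = math.sqrt(sum(x * x for x in a))
nb = math.sqrt(sum(x * x for x in b))

# Co-symmetry (3)
assert math.isclose(psa(a, b, m, n), psa(b, a, n, m))
# Symmetrisation (4)
assert math.isclose(psa(a, b, m, m), psa(a, b, n, n) * na ** -delta * nb ** -delta)
# Index shift (6)
assert math.isclose(psa(a, b, m + k, n), psa(a, b, m, n) * na ** -k)
# Argument scaling (7)
ka = [k * x for x in a]
assert math.isclose(psa(a, ka, m, n), k ** (1 - n) * psa(a, a, m, n))
# Well-definedness: m + n = 2 makes auto-affinity (2) equal to 1
assert math.isclose(psa(a, a, 1.5, 0.5), 1.0)
print("all properties verified")
```
      </preformat>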
      <sec id="sec-2-1">
        <title>Geometrical Properties of PSA</title>
        <p>Let A = {x1, . . . , xM} ⊆ ℝ₀₊ᴺ be a set of non-negative real-valued vectors
on which we want to solve a constraining problem on the values of the PSA
measurement. A particular constraining problem is represented by thresholding,
i.e. finding couples (xi, xj), with xi, xj ∈ A, such that Sm,n(xi, xj) ≥ α. One way in
which this problem can be solved is by analysing the adjacency
matrix</p>
        <p>A = Acos • F^(n,Δ) = Acos • Φ^(1−n−Δ) • (Φ^(1−n))ᵀ   (9)
where aij = Sm,n(xi, xj), • is the Hadamard product, Acos is the Cosine
similarity adjacency matrix, and φij = ‖xi‖. This re-formulation allows for the
interpretation that an adjacency matrix corresponding to a PSA Sn+Δ,n(xi, xj) is a
distortion of the Cosine similarity adjacency matrix, obtained by
Hadamard-multiplying Acos by the term Φ^(1−n−Δ) • (Φ^(1−n))ᵀ. We observe that the matrix Φ
is a static feature for A, i.e. it can be calculated once and for all, and that the level
of distortion of Acos depends only on n and Δ. To give empirical examples of
these features, let us consider as A a set of randomly generated vectors in the
space [0, 1] × [0, 1] and let us add the element x∗ = (1/2, 1/2) to A. We then
calculate the matrix A of Eq. (9) and find the following subsets of A:
Eqx∗ = {x ∈ A : Sm,n(x∗, x) ≥ α ∧ Sm,n(x, x∗) ≥ α}   (10)
Disx∗ = {x ∈ A : Sm,n(x∗, x) &lt; α ∧ Sm,n(x, x∗) &lt; α}   (11)
Enx∗ = {x ∈ A : Sm,n(x∗, x) ≥ α ∧ Sm,n(x, x∗) &lt; α}   (12)</p>
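        <p>The thresholding experiment described here can be reproduced in a few lines; in the following sketch the sample size and random seed are hypothetical choices, and each random point is classified against x∗ = (1/2, 1/2) into the four subsets of Eqs. (10)-(13):</p>
        <preformat>
```python
import math
import random

def norm(v): return math.sqrt(sum(x * x for x in v))
def dot(u, v): return sum(x * y for x, y in zip(u, v))
def psa(u, v, m, n): return dot(u, v) / (norm(u) ** m * norm(v) ** n)

random.seed(0)
# Random points in [0, 1] x [0, 1], plus the probe x* = (1/2, 1/2)
A = [(random.random(), random.random()) for _ in range(200)]
x_star = (0.5, 0.5)
m, n, alpha = 2.0, 0.0, 0.87

# Classify each x w.r.t. x* using the two asymmetric PSA values,
# mirroring the subsets Eq, Dis, En, Ien of Eqs. (10)-(13)
subsets = {"Eq": [], "Dis": [], "En": [], "Ien": []}
for x in A:
    fwd = psa(x_star, x, m, n) >= alpha   # S(x*, x) >= alpha
    bwd = psa(x, x_star, m, n) >= alpha   # S(x, x*) >= alpha
    key = {(True, True): "Eq", (False, False): "Dis",
           (True, False): "En", (False, True): "Ien"}[(fwd, bwd)]
    subsets[key].append(x)

print({k: len(v) for k, v in subsets.items()})
```
        </preformat>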
        <p>Ienx∗ = {x ∈ A : Sm,n(x∗, x) &lt; α ∧ Sm,n(x, x∗) ≥ α}   (13)
Eqs. (10) and (11) denote, respectively, syntactic equivalence and syntactic
disjointness between vectors x∗ and x. Eq. (12) denotes a syntactic entailment from
x∗ to x. Eq. (13) denotes an (inverse) syntactic entailment to x∗ from x (see [6]
for further detail). Examples of the results for four possible configurations of m,
n, with α = 0.87 are shown in Fig. 1, where the subsets Eqx∗, Disx∗, Enx∗, and
Ienx∗ are rendered in white, black, light grey, and dark grey respectively.
A particular case of Eq. (1) is when m = 2 and n = 0. In this situation the PSA
measure has some interesting properties w.r.t. plain Cosine similarity, which
will be illustrated by some examples in this Section. Empirical investigations
demonstrated the effectiveness of choosing S2,0(·) as the affinity function for
real-world clustering problems (see §4). Let us consider vectors a, b and c = a/2,
for which it results S1,1(a, b) = cos(a, b) ≈ 0.77, S2,0(a, b) = 1, S2,0(b, a) = 0.6,
S1,1(c, b) = cos(c, b) ≈ 0.77, S2,0(c, b) = 2, S2,0(b, c) = 0.3, S2,0(c, a) = 2 and
S2,0(a, c) = 0.5. It can be observed that S1,1(c, b) = S1,1(a, b), S2,0(a, c) =
S2,0(a, a/2) = (1/2) S2,0(a, a) and S2,0(a, b) = (1/2) S2,0(c, b), as defined by the
analytical properties of the PSA measure.</p>
        <p>These examples highlight the following qualitative properties of the
pseudo-semantic affinity measurement introduced in this research:
– The pseudo-semantic affinity measure accounts compactly for the
asymmetric relationship existing among vectors in terms of angle and size at the
same time;
– as such, and differently from Cosine similarity, the pseudo-semantic
affinity is sensitive to vector scaling, and accounts proportionally for the scaling
coefficient.</p>
        <p>A natural synthesis of the two above examples leads to the statement that vector a is
completely included in vector b, and that vector c is doubly included in vector b.
Furthermore, we can say that vector b is included by 3/5 in vector a. Intuitively, this
means that S2,0(a, b) accounts for the extent to which the vector a is explained
by the vector b.</p>
        <p>As a final example, let us consider a vector e defined as e = a + ε, with ε
orthogonal both to a and b and ρ = ‖ε‖/‖a‖ ≪ 1, i.e. a vector representing an
additive bi-orthogonal noise added to a. This example models situations in which there
are random errors in the calculation of feature vectors due to uncorrelated noisy
input, e.g. spurious terms included in a textual feature vector by an imperfect
HTML text extraction engine, or wrong words in the transcription of spoken
content. The PSA measures between e and b are:</p>
        <p>cos(e, b) = cos(b, e) ≈ cos(a, b) (1 − ρ²/2)</p>
        <p>S2,0(e, b) ≈ S2,0(a, b) (1 − ρ²) and S2,0(b, e) = S2,0(b, a).
These results highlight an important feature distinguishing the pseudo-semantic
affinity measurement from Cosine similarity: the addition of bi-orthogonal noise to
a vector always affects Cosine similarity, while it affects pseudo-semantic affinity
only partially. In other words, the degree of inclusion in another vector b of a
vector a perturbed by bi-orthogonal noise ε is affected, but the opposite
does not hold, i.e. the degree of inclusion of vector b in vector a is not
affected by bi-orthogonal noise added to a.</p>
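        <p>This noise-robustness claim can be checked directly; in the following sketch a, b and ε are hypothetical vectors, chosen so that ε is orthogonal to both:</p>
        <preformat>
```python
import math

def norm(v): return math.sqrt(sum(x * x for x in v))
def dot(u, v): return sum(x * y for x, y in zip(u, v))
def cos(u, v): return dot(u, v) / (norm(u) * norm(v))
def psa(u, v, m, n): return dot(u, v) / (norm(u) ** m * norm(v) ** n)

# a and b live in the x-y plane; eps points along z, hence it is
# orthogonal to both (illustrative vectors, not from the paper)
a = (2.0, 1.0, 0.0)
b = (1.0, 2.0, 0.0)
eps = (0.0, 0.0, 0.2)
e = tuple(x + y for x, y in zip(a, eps))

# Cosine similarity is perturbed in both directions ...
print(cos(a, b), cos(e, b))
# ... while S_{2,0}(b, .) is exactly unchanged: eps is invisible to b
print(psa(b, a, 2, 0), psa(b, e, 2, 0))
assert psa(b, e, 2, 0) == psa(b, a, 2, 0)
```
        </preformat>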
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Co-clustering Algorithm</title>
      <p>Let Π = {π1, . . . , πm} and B = {β1, . . . , βn} be two input spaces of data sets, for which a
metric in the hybrid output space H = Π ∪ B is not defined. Our co-clustering
algorithm uses the PSA measure to aggregate the objects from the input spaces
in the output space H. Figure 2 shows all the procedural blocks involved in
the framework. The data from the ancillary space (B) are used as a dynamic
dictionary with which the data from the primary space (Π) are represented.
Each aggregation γi∗ ∈ D∗(α) consists of two components:
– A subset of items of Π which share common attributes in B, as given by
a PSA measurement Sm,n(πa, πb), πa, πb ∈ Π, which operates on the
cross-relevance between items of Π and items of B condensed by the
transformation T into the space A. It is possible to demonstrate that the PSA is
a pseudo-quasi-metric and thus can be used to compare the heterogeneous
data objects in A;
– The attributes of the dictionary B which are relevant to the aggregated
elements of γi, either to any or to all of them depending on the chosen criterion
for multimodal aggregation (i.e. union or intersection, respectively).
The following subsections describe all the constituent parts of the framework.</p>
      <sec id="sec-3-1">
        <title>Relevance Cross-Projection</title>
        <p>Let T : Π × B → [0, 1] be a linking function such that T(πi, βj) tends to 1 if
βj ∈ B is somewhat relevant to πi ∈ Π, and T(πi, βj) tends to 0 if βj ∈ B is
not relevant to πi ∈ Π. As an example, T(·) could be the score function that
projects the data from the primary space Π onto the data of the ancillary space B.
The relations between items from the two spaces can thus be expressed by the
cross-projection matrix T, whose elements are tij = T(πi, βj).
From the matrix T we evaluate the affinity between all the couples of primary
space objects. Formally, let πa, πb be two objects from the primary space,
represented by the cross-projection vectors ta = (T(πa, β1), . . . , T(πa, βn)) and
tb = (T(πb, β1), . . . , T(πb, βn)), ta, tb ∈ T. Then the PSA of order m, n from ta to
tb is evaluated according to Eq. (1), and its value is stored in the affinity matrix
E = (eab), where</p>
        <p>eab = 1, if a = b;
eab = S(ta, tb), if a ≠ b and S(ta, tb) ≥ α;
eab = 0, otherwise.   (14)</p>
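        <p>The construction of the affinity matrix E of Eq. (14) can be sketched as follows; the toy cross-projection matrix is a hypothetical example, not data from the paper:</p>
        <preformat>
```python
import math

def psa(u, v, m, n):
    dot = sum(x * y for x, y in zip(u, v))
    nu = math.sqrt(sum(x * x for x in u))
    nv = math.sqrt(sum(x * x for x in v))
    return dot / (nu ** m * nv ** n)

def affinity_matrix(T, m, n, alpha):
    """Build E = (e_ab) from the cross-projection matrix T, Eq. (14):
    each row of T is the vector t_a of relevance scores over B."""
    M = len(T)
    E = [[0.0] * M for _ in range(M)]
    for a in range(M):
        for b in range(M):
            if a == b:
                E[a][b] = 1.0
            else:
                s = psa(T[a], T[b], m, n)
                E[a][b] = s if s >= alpha else 0.0
    return E

# Toy cross-projection matrix: 3 primary objects x 4 ancillary attributes
T = [[0.9, 0.8, 0.0, 0.1],
     [0.8, 0.9, 0.1, 0.0],
     [0.0, 0.1, 0.9, 0.8]]
E = affinity_matrix(T, m=2, n=0, alpha=0.5)
print(E)
```
        </preformat>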
      </sec>
      <sec id="sec-3-2">
        <title>Graph Visit and Multimodal Aggregation</title>
        <p>The affinity matrix E can be interpreted as the adjacency matrix of a directed
disconnected graph G = (V, E) whose vertices are the primary space objects
and whose edges are the affinities among them. The co-clustering problem can
thus be reduced to a graph visiting problem. For that, we first symmetrise E by
considering each couple of nodes as either connected or not; then we use the depth
first search (DFS) algorithm, obtaining the set D(α) = {g1, g2, . . . , gK} of
disconnected sub-graphs included in G, so that ⋃i gi = G and ∀i ≠ j, gi ∩ gj = ∅.
Since the DFS algorithm is valid for undirected graphs, we first generate an
undirected representation of the original oriented graph using one of the following
criteria:
– Maximum affinity: vertices a, b are linked if max(S(ta, tb), S(tb, ta)) &gt; α
– Average affinity: vertices a, b are linked if (1/2)(S(ta, tb) + S(tb, ta)) &gt; α
– Minimum affinity: vertices a, b are linked if min(S(ta, tb), S(tb, ta)) &gt; α
Given D(α) we can finally build the set of multimodal aggregations D(α)∗ =
{γ1∗, . . . , γ∗|D(α)|} ⊆ 2^H by collecting the elements of B which are relevant to
each γi ∈ D(α). Letting K = |D(α)|, this can be done according to one of the
following criteria.</p>
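        <p>The symmetrisation criteria and the DFS-based extraction of D(α) can be sketched as follows; the function names and the toy matrix are illustrative assumptions, not the paper's implementation:</p>
        <preformat>
```python
# Symmetrise the directed affinity matrix E with one of the three
# criteria, then extract connected components with a depth-first search.
def symmetrise(E, alpha, criterion="max"):
    M = len(E)
    combine = {"max": max, "min": min,
               "avg": lambda x, y: 0.5 * (x + y)}[criterion]
    return [[1 if a != b and combine(E[a][b], E[b][a]) > alpha else 0
             for b in range(M)] for a in range(M)]

def components(adj):
    """DFS over the undirected graph; returns the sub-graphs D(alpha)."""
    M, seen, out = len(adj), set(), []
    for start in range(M):
        if start in seen:
            continue
        stack, comp = [start], []
        while stack:
            v = stack.pop()
            if v in seen:
                continue
            seen.add(v)
            comp.append(v)
            stack.extend(w for w in range(M) if adj[v][w])
        out.append(sorted(comp))
    return out

E = [[1.0, 0.9, 0.0],
     [0.2, 1.0, 0.0],
     [0.0, 0.0, 1.0]]
print(components(symmetrise(E, alpha=0.5, criterion="max")))  # [[0, 1], [2]]
```
        </preformat>
<p>Note how the criterion changes the result: under minimum affinity the weak back-edge 0.2 breaks the link between vertices 0 and 1, yielding three singleton components.</p>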
        <p>Union criterion</p>
        <p>∀i : γi∗ = γi ∪ Bi, i = 1, . . . , K   (15)
Bi = ⋃ βij, j = 1, . . . , |γi|   (16)
βij = {b ∈ B : R(πij, b) &gt; η},   (17)
where η is a parametric threshold. Following this criterion, the function of D(α)∗
is that of integrating the partition D(α) with all the relevant elements of B.
Notice that D(α)∗ is not in general a partition of H = Π ∪ B, because elements
of B may be relevant to elements of Π belonging to different elements of D(α),
and because some elements of B may not be relevant to any element of Π.</p>
      </sec>
      <sec id="sec-3-3">
        <title>Intersection criterion</title>
        <p>∀i : γi∗ = γi ∪ Bi, i = 1, . . . , K   (18)
Bi = ⋃ βij, j = 1, . . . , |γi|   (19)
βij = {b ∈ B : ∀k : R(πik, b) &gt; η}.   (20)
Following this criterion, the function of D(α)∗ is that of integrating the partition
D(α) with the elements of B that are relevant to every element of γi. Notice that
in this case too, D(α)∗ is not a partition of H = Π ∪ B.</p>
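        <p>The union and intersection criteria can be sketched together; here the relevance function R is assumed, for illustration only, to coincide with the toy cross-projection scores:</p>
        <preformat>
```python
def aggregate(groups, T, eta, criterion="union"):
    """groups: clusters of primary-object indices, i.e. D(alpha);
    T[i][j]: relevance of ancillary attribute j to primary object i."""
    n_attrs = len(T[0])
    result = []
    for g in groups:
        if criterion == "union":
            # attributes relevant to at least one element of the cluster
            attrs = {j for j in range(n_attrs)
                     if any(T[i][j] > eta for i in g)}
        else:
            # attributes relevant to every element of the cluster
            attrs = {j for j in range(n_attrs)
                     if all(T[i][j] > eta for i in g)}
        result.append((g, sorted(attrs)))
    return result

T = [[0.9, 0.8, 0.0, 0.1],
     [0.8, 0.9, 0.1, 0.0],
     [0.0, 0.1, 0.9, 0.8]]
D = [[0, 1], [2]]
print(aggregate(D, T, eta=0.55, criterion="union"))
print(aggregate(D, T, eta=0.55, criterion="intersection"))
```
        </preformat>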
      </sec>
    </sec>
    <sec id="sec-4">
      <title>Experimental Results</title>
      <p>This section provides empirical evidence to demonstrate the benefits of our
co-clustering framework and algorithm. In particular, we show that the co-clustering
approach based on the PSA measurement outperforms clustering methods based on
symmetric similarity measures. For this purpose, we applied the algorithm to
the real case of aggregating news content from the Web and TV broadcasts, i.e.
the primary space Π and the ancillary space B defined in §3, respectively. For
the evaluations we set up a pool of 25 users, taken from the employees of our
organisation, who were unaware of the rationale of the framework.</p>
      <p>We randomly collected a set of 2,155 news articles taken from 16 different
newspaper and press agency websites. The articles were randomly chosen in
order to cover a broad range of subjects such as politics, sports, entertainment,
current events and foreign affairs. Two different feature extraction and clustering
strategies were applied to these data.</p>
      <p>The first strategy adopted the common term frequency-inverse document
frequency (tf/idf) weighting-based vector model and Cosine similarity. Each Web
article was represented by the tf/idf values of the terms occurring in the
document. Clustering was performed using Eq. (8) as the similarity measure, with
n = 1 and Δ = 0. The minimum affinity criterion for different values of α was
used for determining whether two graph vertices were connected or not.</p>
      <p>In the second strategy we first performed part-of-speech tagging, selecting
for each article the set of words tagged as nouns, proper nouns or adjectives.
These word sets were then matched with the speech content of about 24,000
TV news stories, resulting from the automatic segmentation of 3,670 newscast
programmes. Each Web article was thus represented by a cross-projection
vector containing the relevance scores, w.r.t. the set of words extracted from the
article, of the text documents obtained by the automatic transcription of each
TV story's speech content. This procedure practically implements the relevance
cross-projection described in §3.1. Further detail on the overall process can be
found in [6]. For each couple of Web articles a, b the pseudo-semantic affinities
Sm,n(a, b) and Sm,n(b, a) were calculated across several combinations of m, n.
Clustering was performed applying the maximum affinity criterion for different
values of α and using the union criterion with η = 0.55.</p>
      <p>We evaluated the effectiveness of our framework by measuring the clustering
cohesion (Ω), the clustering coverage (Ψ) and the clustering efficiency (Λ =
2ΩΨ / (Ω + Ψ)). The clustering cohesion can be related to the precision of the
algorithm and is calculated as the ratio between the number of objects in a cluster
that are relevant to the concept expressed by the cluster and the total number of
objects included in the same cluster, averaged over the set of detected clusters.
The concept expressed by the cluster is the concept that the user assumed as such
during the assessment of a specific cluster. This implicitly means that performance
should not empirically be lower than 0.5, since it is expected that, in a situation in
which the two halves of a cluster are equally distributed between two concepts, the
user would choose one of the two as the reference concept and discard the other.
The clustering coverage can be related to the recall of the algorithm and is
calculated as the ratio between the number of objects included in clusters with
cardinality greater than 1 (Avg. Card. × Tot. Clusters) and the total number of
objects to be clustered. The clustering efficiency can be related to the F-measure,
accounting for the balance between the cohesion of the detected clusters and
their expected cardinality.</p>
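      <p>The three evaluation measures can be sketched as follows; the cluster assessments in the example are hypothetical numbers, not the paper's results:</p>
      <preformat>
```python
# Cohesion (precision-like), coverage (recall-like) and efficiency
# (their harmonic mean), as described above.
def cohesion(clusters_relevant, clusters_size):
    # ratio of relevant objects per cluster, averaged over clusters
    ratios = [r / s for r, s in zip(clusters_relevant, clusters_size)]
    return sum(ratios) / len(ratios)

def coverage(clusters_size, total_objects):
    # objects in clusters with cardinality > 1, over all objects
    clustered = sum(s for s in clusters_size if s > 1)
    return clustered / total_objects

def efficiency(omega, psi):
    return 2 * omega * psi / (omega + psi)

sizes = [10, 8, 5, 1]       # detected cluster cardinalities (hypothetical)
relevant = [9, 7, 4, 1]     # objects judged relevant per cluster
omega = cohesion(relevant, sizes)
psi = coverage(sizes, total_objects=40)
print(omega, psi, efficiency(omega, psi))
```
      </preformat>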
      <p>Table 1 shows the performance results obtained for the tf/idf-based clustering
strategy. Table 2 shows the performance results obtained for the relevance
cross-projection-based co-clustering strategy using different symmetric combinations
of the PSA measure. Table 3 compares these two clustering approaches in terms
of clustering efficiency. Observe that the cross-projection-based strategy always
outperforms the tf/idf-based strategy. Table 4 shows the performance results
obtained for the relevance cross-projection-based co-clustering strategy using
generic combinations of the asymmetric PSA measure. Results indicate that the
best performers are not always the same. For example, for α = 0.5 the best cohesion
is achieved for m = 7 and n = 3, while for α = 0.95 it is m = 5 and n = 0.
However, it can be observed that under some combinations of m and n, the
algorithm always achieves good results independently of the chosen value of
α. In the absence of the concrete possibility of full ground-truth testing of all
possible combinations, these or similar global observations may help in finding
the proper choice of PSA parameters for a given information retrieval system.
For example, Table 4 shows that the combination (m, n, α) = (2, 0, 0.85) provides
the better balance between clustering cohesion and coverage for the studied case.
Though these findings are not enough to identify clear conditions and criteria
under which to select the proper parameter combination for an information
retrieval system, we think that at least they clearly show that the generalisation
towards the PSA domain is neither useless nor trivial to resolve, and as such needs
further thorough investigation.</p>
      <p>In order to provide interactive visualisation of the generated multimodal
aggregations for news retrieval, browsing and editing, we developed a browsing
interface that enables users to track down the structural properties of the selected
aggregation and, at the same time, navigate and manage its contents. This
functionality is inspired by hyperlink networks, e.g. social and citation networks and
Internet communications, and by graph theory. The structure of each multimodal
aggregation is represented as an oriented graph where the included Web articles are
the nodes and the relationships among them are the edges, reflecting the
asymmetric structure obtained by applying the PSA-based co-clustering algorithm.
The graphs are browsable through a Java Web interface. Users can select the
visualisation layout to apply for graph display (the default is the Fruchterman-Reingold
layout, FRL). The nodes are labelled according to the titles of the Web articles,
and are sized according to their representativeness (i.e., in-degree) w.r.t. the main
topic of the aggregation, so that more representative nodes are drawn with larger
shapes and bolder colours. Moving over a node shows the information related to
the node, in terms of the node's title, in-degree and out-degree. Similarly, moving
over an edge shows the edge weight. Starting from one node, the user can select,
traverse and manipulate all its context/detail nodes. Context nodes of a node
v are those nodes z for which S(v, z) ≥ α, while detail nodes are those nodes
w for which S(w, v) ≥ α. This functionality allows users to find semantically
meaningful paths between the news items of the aggregation.</p>
      <p>The use of the FRL graph layout algorithm allows for intuitive visual-aided
data navigation and manipulation. The assumption is that the graph's topology
can be used to describe the underlying concepts of the aggregation. Cohesive
aggregations are modelled by dense graphs; less cohesive aggregations are
modelled by sparse graphs. Figures 3 and 4 show two examples. The former
illustrates how the system is able to visually distinguish two subtopics in the
context of a larger topic. The overall topic is the summer traffic jams. The two
subtopics (one of which is highlighted in yellow) are aggregations reporting on the
same main topic (traffic jams) over two consecutive weeks, marked respectively
as "red" and "black" by the Italian Transport Ministry. The latter expresses the
same concept in the case of the January 2009 bad weather period. However, in
this case the highlighted part refers to the disasters that happened in France and
Spain (foreign countries), while the remaining part refers to Italian casualties.</p>
      <p>Fig. 4. Visual-aided graph-based news navigation - example 2.</p>
    </sec>
    <sec id="sec-5">
      <title>Conclusions and Future Work</title>
      <p>This paper presented a family of pseudo-semantic affinity (PSA) functions for
modelling the relationships between heterogeneous data objects. A co-clustering
framework based on these functions was also presented. The motivation behind
the use of this framework is its capability of representing hierarchical
relationships among data of different nature, which are not directly derivable by using
one-way clustering techniques (e.g. tf/idf vector space model and Cosine
similarity). The experiments demonstrate that the PSA measure outperforms Cosine
similarity in terms of cluster cohesion and coverage, thus providing better
precision and recall. It was also demonstrated that the PSA is only partially affected
by noisy data. Future work will explore how to adaptively select the best
combination of PSA parameters for the optimisation of information retrieval processes.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <given-names>I. S.</given-names>
            <surname>Dhillon</surname>
          </string-name>
          <article-title>Co-clustering documents and words using bipartite spectral graph partitioning</article-title>
          .
          <source>In KDD '01: Proc. of the 7th ACM SIGKDD Intl. Conf. on Knowledge discovery and data mining</source>
          , pages
          <fpage>269</fpage>
          -
          <lpage>274</lpage>
          ,
          <year>2001</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <given-names>C.</given-names>
            <surname>Domeniconi</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Gunopulos</surname>
          </string-name>
          and
          <string-name>
            <given-names>S.</given-names>
            <surname>Ma</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Yan</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Al-Razgan</surname>
          </string-name>
          and
          <string-name>
            <given-names>D.</given-names>
            <surname>Papadopoulos</surname>
          </string-name>
          .
          <article-title>Locally Adaptive Metrics for Clustering High Dimensional Data</article-title>
          .
          <source>Data Mining and Knowledge Discovery Journal</source>
          ,
          <volume>14</volume>
          (
          <issue>1</issue>
          ):
          <fpage>63</fpage>
          -
          <lpage>97</lpage>
          ,
          <year>2007</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <given-names>C.</given-names>
            <surname>Domeniconi</surname>
          </string-name>
          and
          <string-name>
            <given-names>J.</given-names>
            <surname>Peng</surname>
          </string-name>
          and
          <string-name>
            <given-names>B.</given-names>
            <surname>Yan</surname>
          </string-name>
          .
          <article-title>Composite Kernels for Semi-supervised Clustering</article-title>
          .
          <source>Knowledge and Information Systems</source>
          ,
          <volume>1</volume>
          -
          <fpage>18</fpage>
          ,
          <year>2010</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <given-names>V.</given-names>
            <surname>Kashyap</surname>
          </string-name>
          and
          <string-name>
            <given-names>A.</given-names>
            <surname>Sheth</surname>
          </string-name>
          .
          <article-title>Semantic and schematic similarities between database objects: a context-based approach</article-title>
          .
          <source>The VLDB Journal</source>
          ,
          <volume>5</volume>
          (
          <issue>4</issue>
          ):
          <fpage>276</fpage>
          -
          <lpage>304</lpage>
          ,
          <year>1996</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <given-names>B.</given-names>
            <surname>Long</surname>
          </string-name>
          ,
          <string-name>
            <given-names>Z.</given-names>
            <surname>Zhang</surname>
          </string-name>
          , and
          <string-name>
            <given-names>P. S.</given-names>
            <surname>Yu</surname>
          </string-name>
          .
          <article-title>Co-clustering by block value decomposition</article-title>
          .
          <source>In KDD '05: Proc. of the 11th ACM SIGKDD Intl. Conf. on Knowledge discovery and data mining</source>
          , pages
          <fpage>635</fpage>
          -
          <lpage>640</lpage>
          ,
          <year>2005</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <given-names>A.</given-names>
            <surname>Messina</surname>
          </string-name>
          and
          <string-name>
            <given-names>M.</given-names>
            <surname>Montagnuolo</surname>
          </string-name>
          .
          <article-title>A generalised cross-modal clustering method applied to multimedia news semantic indexing and retrieval</article-title>
          .
          <source>In WWW '09: Proc. of the 18th Intl. Conf. on World wide web</source>
          , pages
          <fpage>321</fpage>
          -
          <lpage>330</lpage>
          ,
          <year>2009</year>
          .
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <given-names>M.</given-names>
            <surname>Rege</surname>
          </string-name>
          ,
          <string-name>
            <given-names>M.</given-names>
            <surname>Dong</surname>
          </string-name>
          , and
          <string-name>
            <given-names>F.</given-names>
            <surname>Fotouhi</surname>
          </string-name>
          .
          <article-title>Co-clustering image features and semantic concepts</article-title>
          .
          <source>In Proc. of the Intl. Conf. on Image Processing</source>
          , pages
          <fpage>137</fpage>
          -
          <lpage>140</lpage>
          ,
          <year>2006</year>
          .
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>