<!DOCTYPE article PUBLIC "-//NLM//DTD JATS (Z39.96) Journal Archiving and Interchange DTD v1.0 20120330//EN" "JATS-archivearticle1.dtd">
<article xmlns:xlink="http://www.w3.org/1999/xlink">
  <front>
    <journal-meta />
    <article-meta>
      <title-group>
        <article-title>Transductive Distributional Correspondence Indexing for Cross-Domain Topic Classification</article-title>
      </title-group>
      <contrib-group>
        <contrib contrib-type="author">
          <string-name>Alejandro Moreo Fernandez</string-name>
          <email>alejandro.moreo@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Andrea Esuli</string-name>
          <email>andrea.esuli@isti.cnr.it</email>
          <xref ref-type="aff" rid="aff0">0</xref>
        </contrib>
        <contrib contrib-type="author">
          <string-name>Fabrizio Sebastiani</string-name>
          <email>fsebastiani@qf.org.qa</email>
          <xref ref-type="aff" rid="aff1">1</xref>
        </contrib>
        <aff id="aff0">
          <label>0</label>
          <institution>Istituto di Scienza e Tecnologie dell'Informazione Consiglio Nazionale delle Ricerche</institution>
          ,
          <addr-line>56124 Pisa, IT</addr-line>
        </aff>
        <aff id="aff1">
          <label>1</label>
          <institution>Qatar Computing Research Institute Qatar Foundation</institution>
          ,
          <addr-line>PO Box 5825, Doha, QA</addr-line>
        </aff>
      </contrib-group>
      <abstract>
        <p>Obtaining high-quality annotated data for training a classifier for a new domain is often costly. Domain Adaptation (DA) aims at leveraging the annotated data available from a different but related source domain in order to deploy a classification model for the target domain of interest, thus alleviating the aforementioned costs. To that aim, the learning model is typically given access to a set of unlabelled documents collected from the target domain. These documents might consist of a representative sample of the target distribution, and they could thus be used to infer a general classification model for the domain (inductive inference). Alternatively, these documents could be the entire set of documents to be classified; this happens when there is only one set of documents we are interested in classifying (transductive inference). Many of the DA methods proposed so far have focused on transductive classification by topic, i.e., the task of assigning class labels to a specific set of documents based on the topics they are about. In this work, we report on new experiments we have conducted in transductive classification by topic using the Distributional Correspondence Indexing (DCI) method, a DA method we have recently developed that delivered state-of-the-art results in inductive classification by sentiment. The results we have obtained on three popular datasets show DCI to be competitive with the state of the art also in this scenario, and to be superior to all compared methods in many cases.</p>
      </abstract>
      <kwd-group>
        <kwd>Transduction</kwd>
        <kwd>Cross-Domain Adaptation</kwd>
        <kwd>Topic Classification</kwd>
        <kwd>Distributional Hypothesis</kwd>
      </kwd-group>
    </article-meta>
  </front>
  <body>
    <sec id="sec-1">
      <title>Introduction</title>
      <p>As a supervised task, automatic Text Classification (TC) depends on the
availability of high-quality corpora of annotated documents to train a classifier
that will then predict the classes of new documents about a given domain of
knowledge. In the absence of any such labelled collection for the domain of
interest, an additional cost, in both money and time, must be incurred in order
to collect and annotate the training examples.</p>
      <p>
        Domain Adaptation (DA) is a special case of Transfer Learning (TL) [
        <xref ref-type="bibr" rid="ref13 ref14">13,14</xref>
        ]
applied to TC, aimed at reducing, or completely avoiding, such costs by leveraging
any different, but related, source of knowledge for which a training corpus
already exists. DA thus challenges one core assumption of machine learning,
usually referred to as the iid assumption, according to which the training and
test examples are believed to be drawn from the same distribution. Traditionally,
two different scenarios are considered in DA: (i) cross-domain adaptation [
        <xref ref-type="bibr" rid="ref2">2</xref>
        ],
where the source and target domains differ in the topics they are about; and (ii)
cross-lingual adaptation [
        <xref ref-type="bibr" rid="ref15">15</xref>
        ], where the source and target domains are expressed
in different languages, although dealing with the same topics. This article focuses
on cross-domain adaptation.
      </p>
      <p>
        The transfer of knowledge is typically attempted by uncovering
regularities in examples that are shared across domains. To that aim, a representative
(unlabelled) sample from the target distribution is collected and passed to the
inference method when learning the decision function. Many of the approaches
to cross-domain adaptation proposed so far, though, have considered this target
sample to be, at the same time, the test set, i.e., the (only) set of documents
one might be interested in classifying (see e.g., [
        <xref ref-type="bibr" rid="ref1 ref10 ref17 ref18 ref4 ref9">4,10,17,18,9,1</xref>
        ]). This fact leads
us to distinguish between inductive and transductive cross-domain approaches,
depending on the type of inference they carry out<sup>3</sup>. Accordingly, inductive
cross-domain approaches might be viewed as those aiming at deploying a classification
model that generalizes adequately on the target domain, whereas transductive
cross-domain approaches are only requested to deliver an accurate classification
of the target set at one's disposal [
        <xref ref-type="bibr" rid="ref16">16</xref>
        ].
      </p>
      <p>
        A general trend one can observe in the related literature on cross-domain
adaptation is that the vast majority of the inductive approaches proposed so
far have been dedicated to sentiment classification (namely, assigning positive or
negative labels to opinion-laden texts), while most of the transductive approaches
have instead been tested in topic classification<sup>4</sup>. Be that as it may, two
well-differentiated families of techniques for cross-domain adaptation exist, and it
remains unclear how much effort it takes to port one of these methods (say,
an inductive one) to the configuration of the other group (say, to the transductive
setting), or to more general TL configurations (e.g., when the source and target
tasks are different).
      </p>
      <p>
        <sup>3</sup> This distinction is surprisingly overlooked in the related literature. This is
probably due to the terminology Pan &amp; Yang used in their popular survey [
        <xref ref-type="bibr" rid="ref13">13</xref>
        ],
where they categorized as transductive all TL approaches in which the source and
target tasks are the same but the source and target domains are different, while the term
inductive was instead given the opposite meaning, i.e., when the domains are
the same but the source and target tasks differ.
      </p>
      <p>
        <sup>4</sup> This seemingly deliberate partition might rather reflect the characteristics of
the most popular benchmark collections available for each problem.
      </p>
      <p>
        This paper is an extension of our former work in [
        <xref ref-type="bibr" rid="ref11 ref5">5,11</xref>
        ], where the
Distributional Correspondence Indexing (DCI) method for cross-domain and
cross-lingual adaptation was proposed. DCI creates word embeddings based on the
distributional hypothesis (words with similar meanings tend to co-occur in
similar contexts [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ]), and recently delivered new state-of-the-art results for inductive
classification by sentiment. We now put DCI to the test in a different problem
setting, i.e., the transductive approach, and report new experiments on a
different task, i.e., cross-domain classification by topic. Results confirm that our
Transductive DCI (hereafter TDCI for short) behaves robustly also in this
scenario, delivering classification accuracies that are comparable to, and in many
cases better than, those of state-of-the-art methods, while still being computationally cheap.
      </p>
      <p>The rest of this paper is organized as follows. Section 2 offers a brief overview
of related work. In Section 3 we describe our proposal. Section 4 reports the
results of the experiments we have conducted, while Section 5 concludes.</p>
    </sec>
    <sec id="sec-2">
      <title>Related work</title>
      <p>
        In this section, we briefly review the main related methods in the literature on domain
adaptation. We will restrict our attention to transductive approaches proposed
for topic classification. The interested reader can check [
        <xref ref-type="bibr" rid="ref11">11</xref>
        ] for a discussion
focusing on inductive methods for sentiment classification, and [
        <xref ref-type="bibr" rid="ref13 ref14">13,14</xref>
        ] for a more
general overview of transfer learning methods.
      </p>
      <p>
        Transductive Support Vector Machines (TSVMs) for text classification were
proposed in [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ] as an extension of Support Vector Machines (SVMs) aiming at
minimizing the misclassification error on a specific test set, assuming it is
accessible when inducing the decision function. Even though TSVMs were not specifically
designed to deal with DA problems, they have often been reported as a baseline in
the related literature. The Co-Clustering approach [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ] uses clusters of words and
documents as a bridge to propagate the class structure from the source domain
to the target domain. The key idea is to use the class labels in the source
domain as a constraint for the word clusters, which are shared between both domains.
The Matrix Tri-factorization [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ] approach follows a somewhat similar
assumption, based on the belief that associations between word clusters and classes
should remain consistent between the source and target domains. The method
thus performs two matrix tri-factorizations, for the source and target domains,
in a joint optimization framework subject to sharing the association between
word clusters and classes. Topic-bridged Probabilistic Latent Semantic
Analysis [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ] is an extension of Probabilistic Latent Semantic Analysis (PLSA) that
models the relations between (observed) documents and terms through a set
of (hidden) latent features, hypothesizing those latent features to be consistent
across domains. Along these lines, Topic Correlation Analysis [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ] establishes a
distinction between latent features that could be shared between domains and
those that are rather domain-specific. A joint mixture model is first used to
cluster word features into shared and domain-specific topics. Then, a mapping
between the domain-specific topics from both domains is induced from a
correlation analysis, which serves to derive a shared feature space where the transfer
of supervised knowledge is facilitated. Finally, the Cross Domain Spectral
Classification [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ] approach formulates the knowledge transfer through spectral
classification, by optimizing an objective function aimed at regularizing the
supervised information contained in the source domain in order to improve
consistency with respect to the target domain structure. In [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ] a
probabilistic method based on Latent Dirichlet Allocation (LDA) is proposed. The
method jointly optimizes the marginal and conditional distributions following
an EM algorithm, while also differentiating between the domain-dependent and
domain-independent latent features.
      </p>
      <p><bold>Transductive Distributional Correspondence Indexing</bold></p>
      <p>Loosely speaking, the main challenge one has to face in domain adaptation is to
deal with the discrepancy in word relevance that arises from a word's particular
role in the source domain and that does not generalize to the target domain.
That is to say, the most important words for the source domain, on which the decision
surface is likely to hinge, are likely not helpful enough in discriminating
the positive and negative regions in the target domain.</p>
      <p>
        DCI builds upon (i) the concept of pivot terms [
        <xref ref-type="bibr" rid="ref3">3</xref>
        ], namely, frequent and
discriminant words which are expected to behave similarly in the source and
target domains; and (ii) the distributional hypothesis [
        <xref ref-type="bibr" rid="ref6">6</xref>
        ], which states that terms
with similar meanings tend to co-occur in similar contexts. Our idea is to model
each term as a word embedding where each dimension quantifies its relative
semantic similarity to a fixed set of pivots. The expectation is that words with
equivalent roles across domains end up lying close to each other in the new
embedding space, as they are expected to present similar distributions with respect to the
pivots in their respective knowledge domains. Take as an example a classifier by
genre (sci-fi, drama, horror, romantic, ...) that is trained with documents from a
source domain of films, but intended to classify documents from a target domain
of books. Role equivalences between, e.g., 'director'-'writer',
'duration'-'length', or 'film'-'book' might be uncovered by inspecting their co-occurrence
distribution with respect to some pivots like 'plot', 'character', or 'story', which are expected
to be approximately invariant across domains. As a result, the decision
boundary found for the source domain will likely generalize well to the target
domain. DCI is an instantiation of this model that implements a pivot selection
strategy (Section 3.2), and quantifies the similarity of meaning of two words
through a Distributional Correspondence Function (DCF, Section 3.3).
      </p>
      <sec id="sec-2-1">
        <title>Preliminaries</title>
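        <p>Given a source (<inline-formula><tex-math>S</tex-math></inline-formula>) and a target (<inline-formula><tex-math>T</tex-math></inline-formula>) domain of documents, with different marginal
distributions, for which a training set of annotated documents <inline-formula><tex-math>Tr_S</tex-math></inline-formula> exists
exclusively for <inline-formula><tex-math>S</tex-math></inline-formula>, cross-domain classification by topic can be formalized as the task
of assigning class labels <inline-formula><tex-math>C = \{c_1, \ldots, c_{|C|}\}</tex-math></inline-formula> to target documents in a test set <inline-formula><tex-math>Te_T</tex-math></inline-formula>
by means of a classifier trained on <inline-formula><tex-math>Tr_S</tex-math></inline-formula> which is also given access to a sample
of (non-annotated) documents <inline-formula><tex-math>U_T</tex-math></inline-formula> from <inline-formula><tex-math>T</tex-math></inline-formula> (and, optionally, to a sample <inline-formula><tex-math>U_S</tex-math></inline-formula> from
<inline-formula><tex-math>S</tex-math></inline-formula>), where classes in <inline-formula><tex-math>C</tex-math></inline-formula> represent predefined topics of discussion, such as
"politics", "economics", or "computers".</p>
        <p>We will here restrict our attention to the binary case <inline-formula><tex-math>C = \{c, \bar{c}\}</tex-math></inline-formula>, that is,
deciding whether a document discusses a given topic <inline-formula><tex-math>c</tex-math></inline-formula> or not. We will also
adhere to the aforementioned "transductive setting", in which the sample of
target documents given to the learner is also the only set of documents we
are interested in classifying, i.e., <inline-formula><tex-math>U_T = Te_T</tex-math></inline-formula>, and there is no sample
<inline-formula><tex-math>U_S</tex-math></inline-formula> from the source collection other than the training set <inline-formula><tex-math>Tr_S</tex-math></inline-formula>.</p>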
      </sec>
      <sec id="sec-2-2">
        <title>Pivot Selection</title>
        <p>
          According to [
          <xref ref-type="bibr" rid="ref3">3</xref>
          ], pivots are frequent and discriminant terms that behave similarly
in both the source and target domains. Regarding frequency, and as was done in
[
          <xref ref-type="bibr" rid="ref15">15</xref>
          ], we restrict the set of pivot candidates to those which occur in at least
30 documents in the source and target corpora. Following [
          <xref ref-type="bibr" rid="ref15 ref2">2,15</xref>
          ], we use the
mutual information between the term and the classes <inline-formula><tex-math>\{c, \bar{c}\}</tex-math></inline-formula> to assess the degree
of discrimination of a given feature in the training set (i.e., exclusively in the
source domain). Finally, we apply the cross-consistency heuristic defined in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ],
which allows the model to be aware of the prevalence<sup>5</sup> drift across the source
and target domains.
        </p>
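        <p>The frequency filter and the mutual-information ranking described above can be sketched as follows. This is only a minimal illustration under stated assumptions, not the authors' implementation: documents are modelled as sets of terms, the function names are hypothetical, and the cross-consistency heuristic of [11] is deliberately left out because it is not specified here.</p>
        <preformat>
```python
import math

def mutual_information(docs, labels, term):
    """Mutual information (in bits) between term presence and a binary label."""
    n = len(docs)
    mi = 0.0
    for t_val in (True, False):
        p_t = sum(1 for d in docs if (term in d) == t_val) / n
        for c_val in (True, False):
            p_c = sum(1 for y in labels if y == c_val) / n
            joint = sum(1 for d, y in zip(docs, labels)
                        if (term in d) == t_val and y == c_val) / n
            if joint > 0:
                mi += joint * math.log2(joint / (p_t * p_c))
    return mi

def select_pivots(src_docs, src_labels, tgt_docs, m, min_df=30):
    """Keep terms frequent in both domains, rank them by MI on the source only."""
    def df(docs, term):
        return sum(1 for d in docs if term in d)

    vocab = set().union(*src_docs).intersection(set().union(*tgt_docs))
    candidates = [t for t in vocab
                  if df(src_docs, t) >= min_df and df(tgt_docs, t) >= min_df]
    # Highest MI first; ties are broken alphabetically to keep the result stable.
    ranked = sorted(
        candidates,
        key=lambda t: (-mutual_information(src_docs, src_labels, t), t))
    return ranked[:m]

# Toy data (made up for illustration): labels are only needed for the source.
src_docs = [{"plot", "good"}, {"plot", "bad"}, {"good", "story"}, {"bad", "story"}]
src_labels = [True, False, True, False]
tgt_docs = [{"plot", "good"}, {"good", "story"}, {"bad", "plot"}, {"bad", "story"}]
pivots = select_pivots(src_docs, src_labels, tgt_docs, m=2, min_df=2)
```
        </preformat>
        <p>Note that the labels of the target documents are never consulted: discrimination is measured on the source side, while the target side contributes only the frequency constraint.</p>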
      </sec>
      <sec id="sec-2-3">
        <title>Distributional Correspondence Functions</title>
        <p>
          DCFs are a family of real-valued functions that quantify the deviation of
correspondence between two terms with respect to the expected correspondence
due to chance. Different interpretations of correspondence could be plugged into
the definition, leading to different implementations of DCF. In this work, we
will restrict our attention to the cases in which correspondence is measured as
the cosine similarity (Eq. 1), the Asymmetric Mutual Information (AMI, Eq.
2), the Pointwise Mutual Information (PMI, Eq. 3), and linear (Eq. 4), as
discussed in [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ].
        </p>
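        <p>Correspondence between two terms <inline-formula><tex-math>f_i</tex-math></inline-formula> and <inline-formula><tex-math>f_j</tex-math></inline-formula> in a given domain is measured
by comparing their context distribution vectors <inline-formula><tex-math>\mathbf{f}_i</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{f}_j</tex-math></inline-formula>. Context distribution
vectors are extracted from the co-occurrence matrix of the domain, and model
how a term relates to a set of contexts (e.g., documents).</p>
        <disp-formula><label>(1)</label><tex-math>\mathrm{Cosine}(\mathbf{f}_i, \mathbf{f}_j) = \frac{\langle \mathbf{f}_i, \mathbf{f}_j \rangle}{\|\mathbf{f}_i\| \, \|\mathbf{f}_j\|} - \sqrt{p_i p_j}</tex-math></disp-formula>
        <disp-formula><label>(2)</label><tex-math>\mathrm{AMI}(\mathbf{f}_i, \mathbf{f}_j) = \sigma(f_i, f_j) \sum_{x \in \{f_i, \bar{f}_i\}} \sum_{y \in \{f_j, \bar{f}_j\}} P(x, y) \log_2 \frac{P(x, y)}{P(x) P(y)}</tex-math></disp-formula>
        <p>[Equations (3) and (4), which define the PMI and Linear DCFs, are not legible in this copy; only the term <inline-formula><tex-math>P(f_i \mid f_j)</tex-math></inline-formula> survives. See [
          <xref ref-type="bibr" rid="ref11">11</xref>
          ] for their definitions.]</p>
        <p>Here <inline-formula><tex-math>p_i</tex-math></inline-formula> denotes the prevalence (proportion of occurrences in the total
number of contexts) of feature <inline-formula><tex-math>f_i</tex-math></inline-formula>, <inline-formula><tex-math>P(x)</tex-math></inline-formula> denotes the probability that feature <inline-formula><tex-math>x</tex-math></inline-formula> occurs
in a random context, <inline-formula><tex-math>P(\bar{x})</tex-math></inline-formula> is the probability that <inline-formula><tex-math>x</tex-math></inline-formula> does not occur in a
random context, and <inline-formula><tex-math>\sigma(x, y)</tex-math></inline-formula> is a function that changes the sign when <inline-formula><tex-math>x</tex-math></inline-formula> and <inline-formula><tex-math>y</tex-math></inline-formula> are
negatively correlated<sup>6</sup>.</p>
        <p><sup>5</sup> The prevalence of a term is typically defined as the proportion of documents in which
a term appears in a corpus.</p>
      </sec>
      <sec id="sec-2-4">
        <title>Word Embeddings and Document Representation</title>
        <p>
          The feature representations of DCI might be thought of as a generalization of
Co-Occurrence vectors (see, e.g., [
          <xref ref-type="bibr" rid="ref12">12</xref>
          ]), where the co-occurrence metric is any of the
DCFs, and the context window is set to the document length. Once a set of <inline-formula><tex-math>m</tex-math></inline-formula>
pivots <inline-formula><tex-math>P = \{p_1, p_2, \ldots, p_m\}</tex-math></inline-formula> and a DCF have been selected, each term <inline-formula><tex-math>f</tex-math></inline-formula> in
the source and target domains is modeled as an <inline-formula><tex-math>m</tex-math></inline-formula>-dimensional vector
        </p>
        <disp-formula><label>(5)</label><tex-math>\vec{f} = \left( \eta(\mathbf{f}, \mathbf{p}_1), \eta(\mathbf{f}, \mathbf{p}_2), \ldots, \eta(\mathbf{f}, \mathbf{p}_m) \right)</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>\eta</tex-math></inline-formula> is the selected DCF, and <inline-formula><tex-math>\mathbf{f}</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{p}_i</tex-math></inline-formula> are the context distribution vectors of the term <inline-formula><tex-math>f</tex-math></inline-formula> and the
<inline-formula><tex-math>i</tex-math></inline-formula>-th pivot, respectively. Note that, because we are operating in the
transductive regime, the context distribution vectors <inline-formula><tex-math>\mathbf{f}</tex-math></inline-formula> and <inline-formula><tex-math>\mathbf{p}_i</tex-math></inline-formula> are taken from the
co-occurrence matrix of the training set when modeling the source terms, and from
the co-occurrence matrix of the test set when modeling the target terms<sup>7</sup>.</p>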
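        <p>As a concrete illustration of how a term is turned into a pivot-based embedding, the sketch below uses a Cosine DCF. It assumes that the Cosine DCF is the cosine similarity of two binary context vectors minus its chance-expected value, the square root of the product of the two prevalences; the function names and the toy vectors are hypothetical and do not come from the authors' code. Each domain uses its own co-occurrence data, as the transductive regime requires.</p>
        <preformat>
```python
import math

def prevalence(vec):
    """Fraction of contexts (documents) in which the term occurs."""
    return sum(1 for v in vec if v) / len(vec)

def cosine_dcf(f_i, f_j):
    """Cosine similarity minus its chance-expected value sqrt(p_i * p_j)."""
    dot = sum(a * b for a, b in zip(f_i, f_j))
    norm = math.sqrt(sum(a * a for a in f_i)) * math.sqrt(sum(b * b for b in f_j))
    if norm == 0:
        return 0.0  # a term that never occurs carries no evidence
    return dot / norm - math.sqrt(prevalence(f_i) * prevalence(f_j))

def embed(term_vec, pivot_vecs, dcf=cosine_dcf):
    """m-dimensional profile of a term with respect to m pivots."""
    return [dcf(term_vec, p) for p in pivot_vecs]

# Toy binary context vectors, one entry per document (values are made up).
source = {"film": [1, 1, 0, 0], "plot": [1, 1, 0, 0], "story": [0, 0, 1, 1]}
target = {"book": [1, 0, 1, 0], "plot": [1, 0, 1, 0], "story": [0, 1, 0, 1]}
pivots = ["plot", "story"]

# Source terms are profiled on the training matrix, target terms on the test matrix.
film = embed(source["film"], [source[p] for p in pivots])
book = embed(target["book"], [target[p] for p in pivots])
```
        </preformat>
        <p>In this toy case 'film' and 'book' never co-occur, yet they receive the same profile because they relate to the pivots in the same way within their own domains.</p>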
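        <p>Finally, training and test documents are indexed in the embedding space via a
weighted sum of the word embeddings of the terms composing the documents.
That is, document <inline-formula><tex-math>d_i</tex-math></inline-formula> is represented as the <inline-formula><tex-math>m</tex-math></inline-formula>-dimensional vector</p>
        <disp-formula><label>(6)</label><tex-math>\vec{d}_i = \sum_{f_j \in d_i} w_{ij} \, \vec{f}_j</tex-math></disp-formula>
        <p>where <inline-formula><tex-math>w_{ij}</tex-math></inline-formula> is the weight of term <inline-formula><tex-math>f_j</tex-math></inline-formula> in document <inline-formula><tex-math>d_i</tex-math></inline-formula> (we used the standard
cosine-normalized tf-idf), and <inline-formula><tex-math>\vec{f}_j</tex-math></inline-formula> is the word embedding for term <inline-formula><tex-math>f_j</tex-math></inline-formula>.</p>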
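        <p>The weighted sum that maps a document into the embedding space can be sketched as below. The data layout (dictionaries keyed by term) and the function name are assumptions made for illustration, and the weights are simply given rather than computed from a corpus.</p>
        <preformat>
```python
def document_vector(doc_weights, embeddings):
    """Weighted sum of the embeddings of the terms occurring in a document.

    doc_weights: dict mapping term to its weight in the document
                 (for example, cosine-normalized tf-idf)
    embeddings:  dict mapping term to a list of m floats
    """
    m = len(next(iter(embeddings.values())))
    vec = [0.0] * m
    for term, weight in doc_weights.items():
        emb = embeddings.get(term)
        if emb is None:
            continue  # terms without an embedding do not contribute
        for k in range(m):
            vec[k] += weight * emb[k]
    return vec

# Toy example with m = 2 pivots; all numbers are made up.
embeddings = {"film": [1.0, 0.0], "plot": [0.5, 0.5]}
weights = {"film": 0.6, "plot": 0.8, "unseen": 0.3}
d = document_vector(weights, embeddings)
```
        </preformat>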
        <p>
          Once the training and test matrices have been represented in the embedding
space, the classifier is learned. As the classifier we adopt the Transductive SVM
[
          <xref ref-type="bibr" rid="ref8">8</xref>
          ], which also takes into account the structure of the test data while modeling
the decision function. We used the linear kernel, which has consistently delivered
good accuracy in text classification so far [
          <xref ref-type="bibr" rid="ref7">7</xref>
          ].
        </p>
        <p>
          <sup>6</sup> That is, when the true positive rate plus the true negative rate as obtained from the
4-cell contingency table of x and y is lower than 0.
        </p>
        <p>
          <sup>7</sup> In this case, and differently from [
          <xref ref-type="bibr" rid="ref11 ref5">5,11</xref>
          ], we do not apply unification to the common
features, because during preliminary tests we observed most of the features to appear
simultaneously in the source and target domains, thus causing most of the words in
the vocabulary to get unified. This contradicts the rationale behind the unification
process, originally proposed to consolidate the representations of shared words across
languages, such as proper nouns in cross-lingual adaptation.
        </p>
      </sec>
    </sec>
    <sec id="sec-3">
      <title>Experiments</title>
      <p>In this section, we report on the experiments we ran to test the effectiveness of
our TDCI method in cross-domain topic classification.</p>
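      <p>As the evaluation measure we adopt standard accuracy, i.e., the ratio between
the number of correctly labeled documents and the total number of documents
issued to the classifier, i.e.,</p>
      <disp-formula><label>(7)</label><tex-math>\mathrm{Acc} = \frac{TP + TN}{TP + FP + FN + TN}</tex-math></disp-formula>
      <p>where <inline-formula><tex-math>TP</tex-math></inline-formula>, <inline-formula><tex-math>TN</tex-math></inline-formula>, <inline-formula><tex-math>FP</tex-math></inline-formula>, and <inline-formula><tex-math>FN</tex-math></inline-formula> stand for the numbers of true positives, true
negatives, false positives, and false negatives, respectively. Note that this choice is
valid given that all datasets are approximately balanced with respect
to the positive and negative classes.</p>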
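      <p>For reference, the accuracy measure can be computed from the four cells of the contingency table as in the following small sketch. The function name and the toy label vectors are illustrative assumptions.</p>
      <preformat>
```python
def accuracy(y_true, y_pred):
    """Accuracy = (TP + TN) / (TP + FP + FN + TN) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t and p)
    tn = sum(1 for t, p in pairs if not t and not p)
    fp = sum(1 for t, p in pairs if not t and p)
    fn = sum(1 for t, p in pairs if t and not p)
    total = tp + tn + fp + fn
    return (tp + tn) / total if total else 0.0

# Toy example: 2 true positives, 1 true negative, 1 false positive, 1 false negative.
acc = accuracy([1, 1, 0, 0, 1], [1, 0, 0, 1, 1])
```
      </preformat>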
      <p>In order to improve reproducibility and to facilitate a comparison of
performance with other methods, we consider the most commonly used benchmarks in
the related literature, including the Reuters-21578, SRAA, and 20 Newsgroups
collections. Aside from being well-known benchmark collections in the realm of
topic classification, their class codes are organized hierarchically, and some
representative subsets could thus be taken in order to generate new benchmarks
that are well-suited for domain adaptation as well<sup>8</sup>.</p>
      <p>Reuters-21578 is one of the most used collections in TC research.
It is a set of 21,578 news stories that appeared in the Reuters newswire in 1987.
Documents in the collection are assigned to 5 top classes, among which the orgs,
people, and places classes have commonly been selected in other works for
experimenting in domain adaptation, leading to three datasets: orgs vs people, orgs vs
places, and people vs places; a preprocessed version is available online<sup>9</sup>.</p>
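      <p>SRAA consists of 73,218 Usenet posts about simulated autos, simulated
aviation, real autos, and real aviation, and is available online<sup>10</sup>. In this dataset, the pairs
of classes real vs simulated and auto vs aviation have been used to instantiate
two different domain adaptation problems. For example, in real vs simulated,
documents about aviation are used as the source domain, while documents
about autos constitute the target domain; the binary decision problem thus consists
in discerning between real and simulated topics. In a similar vein, auto
vs aviation is created, where documents about simulated vehicles act as source
domain examples, and documents about real vehicles as the target ones.</p>
      <p><sup>8</sup> This procedure consists in taking two top classes, say, <inline-formula><tex-math>A</tex-math></inline-formula> and <inline-formula><tex-math>B</tex-math></inline-formula>, with subclasses
<inline-formula><tex-math>A.1, \ldots, A.x</tex-math></inline-formula> and <inline-formula><tex-math>B.1, \ldots, B.y</tex-math></inline-formula>, respectively. Then, two disjoint folds are taken for the
source (<inline-formula><tex-math>S</tex-math></inline-formula>) and target (<inline-formula><tex-math>T</tex-math></inline-formula>) sides in each class; e.g., <inline-formula><tex-math>A_S = \bigcup_{1 \le i &lt; l} A.i</tex-math></inline-formula> and <inline-formula><tex-math>A_T = \bigcup_{l \le i \le x} A.i</tex-math></inline-formula>
represent the source and target splits for the class <inline-formula><tex-math>A</tex-math></inline-formula>. Finally, the training
and test sets are defined as <inline-formula><tex-math>Tr_S = A_S \cup B_S</tex-math></inline-formula> and <inline-formula><tex-math>Te_T = A_T \cup B_T</tex-math></inline-formula>, where documents
in <inline-formula><tex-math>A</tex-math></inline-formula> are labeled as positives, and documents in <inline-formula><tex-math>B</tex-math></inline-formula> are labeled as negatives.</p>
      <p><sup>9</sup> http://www.cse.ust.hk/TL/dataset/Reuters.zip</p>
      <p><sup>10</sup> http://people.cs.umass.edu/~mccallum/data/sraa.tar.gz</p>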
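      <p>The split procedure described in footnote 8 can be sketched as follows. This is an illustrative reading of the procedure, not the code used to build the benchmarks: the function names, the dictionary layout, and the toy subclass names are assumptions, and the split index is taken to be 1-based as in the footnote.</p>
      <preformat>
```python
def split_subclasses(subclasses, l):
    """Subclasses 1..l-1 go to the source side, subclasses l..x to the target side."""
    return subclasses[:l - 1], subclasses[l - 1:]

def build_problem(docs, subs_a, l_a, subs_b, l_b):
    """Return (train, test) as lists of (document, is_positive) pairs.

    docs maps a subclass name to its list of documents; documents under
    top class A are positives, documents under top class B are negatives.
    """
    a_src, a_tgt = split_subclasses(subs_a, l_a)
    b_src, b_tgt = split_subclasses(subs_b, l_b)
    train = ([(d, True) for s in a_src for d in docs[s]] +
             [(d, False) for s in b_src for d in docs[s]])
    test = ([(d, True) for s in a_tgt for d in docs[s]] +
            [(d, False) for s in b_tgt for d in docs[s]])
    return train, test

# Toy hierarchy with made-up documents.
docs = {"A.1": ["a1"], "A.2": ["a2"], "A.3": ["a3"], "B.1": ["b1"], "B.2": ["b2"]}
train, test = build_problem(docs, ["A.1", "A.2", "A.3"], 3, ["B.1", "B.2"], 2)
```
      </preformat>
      <p>Because source and target documents come from disjoint subclasses of the same top classes, the labels are shared across domains while the word distributions differ.</p>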
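      <table-wrap>
        <label>Table 1</label>
        <caption><p>Accuracy on the Reuters-21578, SRAA, and 20 Newsgroups datasets.</p></caption>
        <table>
          <thead>
            <tr><th>Dataset</th><th>ISVM</th><th>TSVM</th><th>CoCC</th><th>TPLSA</th><th>CDSC</th><th>MTrick</th><th>TCA</th><th>PSCC</th><th>Linear</th><th>AMI</th><th>PMI</th><th>Cosine</th></tr>
          </thead>
          <tbody>
            <tr><td>orgs vs places</td><td>0.721</td><td>0.740</td><td>0.680</td><td>0.653</td><td>0.682</td><td>0.768</td><td>0.730</td><td>0.742</td><td>0.792</td><td>0.787</td><td>0.797</td><td>0.793</td></tr>
            <tr><td>orgs vs people</td><td>0.737</td><td>0.793</td><td>0.764</td><td>0.763</td><td>0.768</td><td>0.808</td><td>0.792</td><td>0.807</td><td>0.782</td><td>0.810</td><td>0.799</td><td>0.805</td></tr>
            <tr><td>people vs places</td><td>0.595</td><td>0.614</td><td>0.826</td><td>0.805</td><td>0.798</td><td>0.690</td><td>0.626</td><td>0.690</td><td>0.700</td><td>0.703</td><td>0.676</td><td>0.700</td></tr>
            <tr><td>real vs simulated</td><td>0.737</td><td>0.920</td><td>0.880</td><td>0.889</td><td>0.812</td><td>-</td><td>-</td><td>-</td><td>0.962</td><td>0.966</td><td>0.967</td><td>0.958</td></tr>
            <tr><td>auto vs aviation</td><td>0.799</td><td>0.949</td><td>0.932</td><td>0.947</td><td>0.880</td><td>-</td><td>-</td><td>-</td><td>0.974</td><td>0.972</td><td>0.978</td><td>0.976</td></tr>
            <tr><td>comp vs sci</td><td>0.699</td><td>0.842</td><td>0.870</td><td>0.989</td><td>0.902</td><td>-</td><td>0.891</td><td>0.900</td><td>0.906</td><td>0.879</td><td>0.910</td><td>0.858</td></tr>
            <tr><td>rec vs talk</td><td>0.722</td><td>0.971</td><td>0.965</td><td>0.977</td><td>0.908</td><td>0.950</td><td>0.962</td><td>0.962</td><td>0.978</td><td>0.981</td><td>0.978</td><td>0.959</td></tr>
            <tr><td>rec vs sci</td><td>0.803</td><td>0.945</td><td>0.945</td><td>0.951</td><td>0.876</td><td>0.955</td><td>0.879</td><td>0.955</td><td>0.974</td><td>0.969</td><td>0.974</td><td>0.959</td></tr>
            <tr><td>sci vs talk</td><td>0.783</td><td>0.913</td><td>0.946</td><td>0.962</td><td>0.956</td><td>0.937</td><td>0.940</td><td>0.947</td><td>0.946</td><td>0.943</td><td>0.943</td><td>0.931</td></tr>
            <tr><td>comp vs rec</td><td>0.782</td><td>0.903</td><td>0.958</td><td>0.951</td><td>0.958</td><td>-</td><td>0.940</td><td>0.958</td><td>0.913</td><td>0.909</td><td>0.914</td><td>0.909</td></tr>
            <tr><td>comp vs talk</td><td>0.955</td><td>0.909</td><td>0.980</td><td>0.977</td><td>0.976</td><td>-</td><td>0.967</td><td>0.967</td><td>0.932</td><td>0.935</td><td>0.944</td><td>0.913</td></tr>
            <tr><td>Reuters (ave.)</td><td>0.684</td><td>0.716</td><td>0.757</td><td>0.740</td><td>0.749</td><td>0.755</td><td>0.716</td><td>0.746</td><td>0.758</td><td>0.767</td><td>0.757</td><td>0.766</td></tr>
            <tr><td>SRAA (ave.)</td><td>0.768</td><td>0.935</td><td>0.906</td><td>0.918</td><td>0.846</td><td>-</td><td>-</td><td>-</td><td>0.968</td><td>0.969</td><td>0.973</td><td>0.967</td></tr>
            <tr><td>20News (ave.)</td><td>0.799</td><td>0.914</td><td>0.944</td><td>0.968</td><td>0.929</td><td>-</td><td>0.930</td><td>0.948</td><td>0.942</td><td>0.936</td><td>0.944</td><td>0.922</td></tr>
          </tbody>
        </table>
      </table-wrap>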
      <p>
        20 Newsgroups is a publicly available<sup>11</sup> text collection of approximately
20,000 Usenet posts, nearly evenly partitioned across 20
different newsgroups. Following the common practice in the related literature,
we restrict our attention to the 4 most frequent top classes in the dataset (comp,
sci, rec, and talk). The data is then split by sub-class for certain pairs of
top classes. We generated 6 different domain adaptation problems by following
the same class split as defined in [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ].
      </p>
      <p>
        We compare the performance of TDCI<sup>12</sup> with the following baselines
(discussed in Section 2): Co-Clustering (CoCC, [
        <xref ref-type="bibr" rid="ref4">4</xref>
        ]), Topic-bridged PLSA
(TPLSA, [
        <xref ref-type="bibr" rid="ref17">17</xref>
        ]), Cross Domain Spectral Classification (CDSC, [
        <xref ref-type="bibr" rid="ref10">10</xref>
        ]), Matrix Tri-factorization
(MTrick, [
        <xref ref-type="bibr" rid="ref18">18</xref>
        ]), Topic Correlation Analysis (TCA, [
        <xref ref-type="bibr" rid="ref9">9</xref>
        ]), and Partially Supervised
Cross-Collection LDA (PSCC, [
        <xref ref-type="bibr" rid="ref1">1</xref>
        ]). Additionally, we report experiments on a
lower-bound baseline that simply classifies the target documents using an
Inductive SVM<sup>13</sup> trained on the source domain without carrying out any sort of
adaptation (ISVM, [
        <xref ref-type="bibr" rid="ref7">7</xref>
        ]), and on its transductive version (TSVM, [
        <xref ref-type="bibr" rid="ref8">8</xref>
        ]).
      </p>
      <p>
        Table 1 reports the results of our experiments in terms of accuracy for the
Reuters-21578, SRAA, and 20 Newsgroups datasets. Reported scores for
CoCC, TPLSA, CDSC, MTrick, TCA, and PSCC were taken from the original
papers. Columns Linear, AMI, PMI, and Cosine correspond to our TDCI with
different DCFs (see Section 3.3); in this experiment, we set the number of pivots
to 100, following [
        <xref ref-type="bibr" rid="ref11 ref5">5,11</xref>
        ].
      </p>
      <p><sup>11</sup> http://qwone.com/~jason/20Newsgroups/</p>
      <p><sup>12</sup> The code implementing our method is integrated in JaTeCS and available at https://github.com/jatecs/jatecs. A stand-alone version can also be accessed at http://hlt.isti.cnr.it/dciext/</p>
      <p>
        <sup>13</sup> We used Joachims' popular implementation, available at http://svmlight.joachims.org/
      </p>
      <p>Overall, the results of these experiments indicate that TDCI is competitive
in transductive cross-domain classification by topic. On average, all
configurations of TDCI outperform all baselines in terms of accuracy on the Reuters-21578
and SRAA datasets. On 20 Newsgroups, TDCI performs comparably on average,
without surpassing, though, the best averaged score obtained by TPLSA. When
AMI is used as the DCF, TDCI beats all baselines in 6 out of 11 cases, obtaining
two best global results (orgs vs people, and rec vs talk), and the best average on
Reuters-21578. The PMI function also obtained promising results, surpassing all
baselines in 5 out of 11 cases, with four best global results, and the best average
on SRAA. It is also noticeable that TDCI outperforms all comparison methods
on SRAA in all cases, irrespective of the DCF, and by a significant margin.</p>
      <p>When TDCI did not achieve the best score, its performance could still be
considered in line with that of the baselines, with the sole exceptions of the comp
vs rec and comp vs talk cases, where TDCI performed comparatively worse. This
could be explained by observing the relative performance of ISVM and TSVM.
In both cases the improvements in accuracy brought about by transduction
are the lowest ones in the entire 20 Newsgroups benchmark; note that accuracy even
degrades significantly in comp vs talk. This observation prompted us to compare
TDCI with its inductive version (denoted IDCI for consistency). IDCI outperforms
TDCI only in these two cases (and performed significantly worse in the rest of the
cases, which we omit for the sake of brevity). More precisely, accuracy scores
delivered by IDCI ranged from 0.943 (Cosine) to 0.961 (Linear) in comp vs
rec (which is now comparable to the performance of the baselines), and from 0.983
(PMI) to 0.992 (Linear) in comp vs talk (which surpasses the best accuracy score
of 0.980 attributed to CoCC). This sizeable variation in performance suggests that
further investigation is needed in order to shed light on whether it
is advisable to maintain an inductive strategy even when the structure of the
test set is observable.</p>
      <p>We also investigated the impact on performance of variations in the
number of pivots, i.e., the dimensionality of the embedding space. Figure 1 shows
two representative plots that summarize well the range of cases we found in our
experiments.</p>
      <p>The plot for rec vs talk exemplifies the most frequent case in our experiments,
in which accuracy improves smoothly as more pivots are selected. The case of
comp vs talk exemplifies a less frequent case in which the performance is somewhat
unstable. These fluctuations seem to depend on the number of pivots, and on the
DCF at hand. For example, the performance trend is increasing for the AMI and
decreasing for the Cosine DCFs, while PMI and Linear delivered competitive
results even with only 30 pivots. This result seems to indicate that the order in which
pivots should be selected could, in some cases, depend also on the DCF under
consideration, something we plan to clarify in future research.</p>
      <p>[Figure 1: accuracy as a function of the number of pivots; panels: rec vs talk, comp vs talk.]</p>
      <p><bold>Conclusions</bold></p>
      <p>In this article, we have explored the performance of Distributional
Correspondence Indexing, a method originally proposed for cross-domain and
cross-lingual inductive classification by sentiment, in a different problem setting,
i.e., by considering the topic classification task and transductive inference.
Results show our transductive version, dubbed TDCI, to be comparable to, and in many
cases better than, the state of the art on three extensively used datasets.
Our experiments also revealed that more investigation is still required in order to
automatically determine the optimal number of pivots to select, so as to find
more stable distributional correspondence functions.</p>
    </sec>
  </body>
  <back>
    <ref-list>
      <ref id="ref1">
        <mixed-citation>
          1.
          <string-name>
            <surname>Bao</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Collier</surname>
            ,
            <given-names>N.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Datta</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>A partially supervised cross-collection topic model for cross-domain text classification</article-title>
          .
          <source>In: Proceedings of the 22nd ACM international conference on Conference on information &amp; knowledge management</source>
          . pp.
          <fpage>239</fpage>
          –
          <lpage>248</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2013</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref2">
        <mixed-citation>
          2.
          <string-name>
            <surname>Blitzer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dredze</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Biographies, Bollywood, boom-boxes and blenders: Domain adaptation for sentiment classification</article-title>
          .
          <source>In: Proceedings of the 45th Annual Meeting of the Association for Computational Linguistics (ACL 2007)</source>
          . pp.
          <fpage>440</fpage>
          –
          <lpage>447</lpage>
          . Prague, CZ (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref3">
        <mixed-citation>
          3.
          <string-name>
            <surname>Blitzer</surname>
            ,
            <given-names>J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>McDonald</surname>
            ,
            <given-names>R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Pereira</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Domain adaptation with structural correspondence learning</article-title>
          .
          <source>In: Proceedings of the 4th Conference on Empirical Methods in Natural Language Processing (EMNLP 2006)</source>
          . pp.
          <fpage>120</fpage>
          –
          <lpage>128</lpage>
          . Sydney, AU
          (
          <year>2006</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref4">
        <mixed-citation>
          4.
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Co-clustering based classification for out-of-domain documents</article-title>
          .
          <source>In: Proceedings of the 13th ACM SIGKDD international conference on Knowledge discovery and data mining</source>
          . pp.
          <fpage>210</fpage>
          –
          <lpage>219</lpage>
          .
          <publisher-name>ACM</publisher-name>
          (
          <year>2007</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref5">
        <mixed-citation>
          5.
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Moreo Fernandez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          :
          <article-title>Distributional correspondence indexing for cross-language text categorization</article-title>
          .
          <source>In: Proceedings of the 37th European Conference on Information Retrieval (ECIR 2015)</source>
          . pp.
          <fpage>104</fpage>
          –
          <lpage>109</lpage>
          . Wien
          , AT (
          <year>2015</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref6">
        <mixed-citation>
          6.
          <string-name>
            <surname>Harris</surname>
            ,
            <given-names>Z.S.</given-names>
          </string-name>
          :
          <article-title>Distributional structure</article-title>
          .
          <source>Word</source>
          <volume>10</volume>
          (
          <issue>2–3</issue>
          ),
          <fpage>146</fpage>
          –
          <lpage>162</lpage>
          (
          <year>1954</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref7">
        <mixed-citation>
          7.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Text categorization with support vector machines: Learning with many relevant features</article-title>
          .
          <source>In: Proceedings of the 10th European Conference on Machine Learning (ECML 1998)</source>
          . pp.
          <fpage>137</fpage>
          –
          <lpage>142</lpage>
          . Chemnitz, DE (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref8">
        <mixed-citation>
          8.
          <string-name>
            <surname>Joachims</surname>
            ,
            <given-names>T.</given-names>
          </string-name>
          :
          <article-title>Transductive inference for text classification using support vector machines</article-title>
          .
          <source>In: Proceedings of the 16th International Conference on Machine Learning (ICML 1999)</source>
          . pp.
          <fpage>200</fpage>
          –
          <lpage>209</lpage>
          . Bled, SL
          (
          <year>1999</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref9">
        <mixed-citation>
          9.
          <string-name>
            <surname>Li</surname>
            ,
            <given-names>L.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Jin</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Long</surname>
            ,
            <given-names>M.</given-names>
          </string-name>
          :
          <article-title>Topic correlation analysis for cross-domain text classification</article-title>
          .
          <source>In: Proceedings of the 26th AAAI Conference on Artificial Intelligence (AAAI 2012)</source>
          . pp.
          <fpage>998</fpage>
          –
          <lpage>1004</lpage>
          . Toronto, CA (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref10">
        <mixed-citation>
          10.
          <string-name>
            <surname>Ling</surname>
            ,
            <given-names>X.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Spectral-domain transfer learning</article-title>
          .
          <source>In: Proceedings of the 14th ACM International Conference on Knowledge Discovery and Data Mining (KDD 2008)</source>
          . pp.
          <fpage>488</fpage>
          –
          <lpage>496</lpage>
          . Las Vegas, US
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref11">
        <mixed-citation>
          11.
          <string-name>
            <surname>Moreo Fernandez</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Esuli</surname>
            ,
            <given-names>A.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Sebastiani</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          :
          <article-title>Distributional correspondence indexing for cross-lingual and cross-domain sentiment classification</article-title>
          .
          <source>Journal of Artificial Intelligence Research</source>
          <volume>55</volume>
          ,
          <fpage>131</fpage>
          –
          <lpage>163</lpage>
          (
          <year>2016</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref12">
        <mixed-citation>
          12.
          <string-name>
            <surname>Niwa</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Nitta</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Co-occurrence vectors from corpora vs. distance vectors from dictionaries</article-title>
          .
          <source>In: Proceedings of the 15th conference on Computational linguistics, Volume 1</source>
          . pp.
          <fpage>304</fpage>
          –
          <lpage>309</lpage>
          .
          <publisher-name>Association for Computational Linguistics</publisher-name>
          (
          <year>1994</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref13">
        <mixed-citation>
          13.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>S.J.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>A survey on transfer learning</article-title>
          .
          <source>IEEE Transactions on Knowledge and Data Engineering</source>
          <volume>22</volume>
          (
          <issue>10</issue>
          ),
          <fpage>1345</fpage>
          –
          <lpage>1359</lpage>
          (
          <year>2010</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref14">
        <mixed-citation>
          14.
          <string-name>
            <surname>Pan</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhong</surname>
            ,
            <given-names>E.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          :
          <article-title>Transfer learning for text mining</article-title>
          . In:
          <string-name>
            <surname>Aggarwal</surname>
            ,
            <given-names>C.C.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Zhai</surname>
            ,
            <given-names>C.</given-names>
          </string-name>
          (eds.)
          <source>Mining Text Data</source>
          , pp.
          <fpage>223</fpage>
          –
          <lpage>258</lpage>
          . Springer, Heidelberg, DE (
          <year>2012</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref15">
        <mixed-citation>
          15.
          <string-name>
            <surname>Prettenhofer</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Stein</surname>
            ,
            <given-names>B.</given-names>
          </string-name>
          :
          <article-title>Cross-lingual adaptation using structural correspondence learning</article-title>
          .
          <source>ACM Transactions on Intelligent Systems and Technology</source>
          <volume>3</volume>
          (
          <issue>1</issue>
          ),
          Article
          <elocation-id>13</elocation-id>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref16">
        <mixed-citation>
          16.
          <string-name>
            <surname>Vapnik</surname>
            ,
            <given-names>V.</given-names>
          </string-name>
          :
          <source>Statistical Learning Theory</source>
          . Wiley, New York, US (
          <year>1998</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref17">
        <mixed-citation>
          17.
          <string-name>
            <surname>Xue</surname>
            ,
            <given-names>G.R.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Dai</surname>
            ,
            <given-names>W.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yang</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Yu</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          :
          <article-title>Topic-bridged PLSA for cross-domain text classification</article-title>
          .
          <source>In: Proceedings of the 31st ACM International Conference on Research and Development in Information Retrieval (SIGIR 2008)</source>
          . pp.
          <fpage>627</fpage>
          –
          <lpage>634</lpage>
          . Singapore, SN
          (
          <year>2008</year>
          )
        </mixed-citation>
      </ref>
      <ref id="ref18">
        <mixed-citation>
          18.
          <string-name>
            <surname>Zhuang</surname>
            ,
            <given-names>F.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Luo</surname>
            ,
            <given-names>P.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>H.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>He</surname>
            ,
            <given-names>Q.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Xiong</surname>
            ,
            <given-names>Y.</given-names>
          </string-name>
          ,
          <string-name>
            <surname>Shi</surname>
            ,
            <given-names>Z.</given-names>
          </string-name>
          :
          <article-title>Exploiting associations between word clusters and document classes for cross-domain text categorization</article-title>
          .
          <source>Statistical Analysis and Data Mining</source>
          <volume>4</volume>
          (
          <issue>1</issue>
          ),
          <fpage>100</fpage>
          –
          <lpage>114</lpage>
          (
          <year>2011</year>
          )
        </mixed-citation>
      </ref>
    </ref-list>
  </back>
</article>